mostovik-backend/docs/superpowers/specs/2026-05-06-dashboard-registry-enrichment-analytics-design.md
Aleksandr Meshchriakov 0f17ff6773
Add organizations v2 API and registry enrichment
2026-05-06 19:04:46 +02:00


Dashboard Registry Enrichment Analytics Design

Date: 2026-05-06

Goal

Rework the dashboard analytics tab so it treats active registry organizations as the primary population and parser/enrichment jobs as the operational process that fills data for those organizations.

The existing analytics page is source-centric: it shows total records, source counts, and load quality. That view remains useful but becomes secondary. The new first screen must answer:

  • How many active registry organizations are under control?
  • How many have additional data from enrichment sources?
  • Which registries are under-covered by source?
  • Which enrichment jobs are scheduled, running, successful, failed, or stale?
  • What actions should the operator take next?

Scope

In scope:

  • Rebuild only the analyticsPanel dashboard tab.
  • Keep current navigation and other dashboard tabs unchanged.
  • Add dashboard API aggregate data under /api/v1/parsers/dashboard/.
  • Use active RegistryMembershipPeriod rows as the population.
  • Exclude unfair_suppliers from completeness scoring, so that a missing match never degrades completeness. It is a risk signal, not a required enrichment source.
  • Keep source record totals available, but move them below the primary registry analytics.

Out of scope:

  • Changing v2 organization API contracts.
  • Changing parser execution behavior.
  • Changing Celery scheduling semantics.
  • Adding external chart dependencies.

UX Structure

The analytics tab becomes a hybrid of "coverage center" and "enrichment pipeline".

Top section: Registry Organization Coverage

  • KPI cards:
    • Active registry organizations.
    • Organizations with at least one enrichment source.
    • Organizations with core profile coverage.
    • Organizations requiring attention.
  • Coverage by source:
    • Bar rows for FNS reports, industrial certificates, products, manufacturers, inspections, procurements, arbitration, bankruptcy, FSTEC, vacancies, etc.
    • Each row shows matched organization count and percent of active registry organizations.
    • unfair_suppliers is not included here.
  • Registry × source matrix:
    • Rows are registries.
    • Columns are important enrichment sources.
    • Cells show percent coverage for organizations in that registry.
    • This gives a fast view of which registry/source pair needs work.
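
One way the matrix rows could be assembled on the backend is sketched below. It is a minimal illustration, not the actual implementation: the function name, the (registry_id, org_id) pair input, and the per-source sets of matched organization ids are all assumptions made for the example.

```python
from collections import defaultdict


def build_registry_source_matrix(memberships, matches, sources):
    """Build one matrix row per registry (hypothetical helper).

    memberships: iterable of (registry_id, org_id) pairs for active memberships.
    matches: dict mapping a source key to the set of matched org ids.
    sources: ordered list of source keys used as matrix columns.
    """
    orgs_by_registry = defaultdict(set)
    for registry_id, org_id in memberships:
        orgs_by_registry[registry_id].add(org_id)

    rows = []
    for registry_id in sorted(orgs_by_registry):
        org_ids = orgs_by_registry[registry_id]
        cells = {}
        for source in sources:
            # Only organizations that belong to this registry count for its cell.
            count = len(org_ids & matches.get(source, set()))
            cells[source] = {
                "organizations_count": count,
                # A registry appears here only if it has at least one member,
                # so the denominator is never zero.
                "coverage_percent": round(count * 100 / len(org_ids), 1),
            }
        rows.append(
            {
                "registry_id": registry_id,
                "active_organizations": len(org_ids),
                "sources": cells,
            }
        )
    return rows
```

An organization that belongs to several registries is counted once in each of its registry rows, which is why the per-registry denominators can sum to more than the overall population.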

Second section: Enrichment Pipeline

  • Job KPI cards:
    • Active schedules.
    • Running jobs.
    • Recent successes.
    • Recent failures.
  • Recent job quality meter:
    • Reuse existing load log status data, but frame it as enrichment pipeline health.
  • Action queue:
    • Organizations without enrichment data.
    • Organizations with identifier/matching problems.
    • Snapshots older than latest parser batches.
    • Risk signals such as unfair suppliers, bankruptcy, and GOZ evasion, shown separately from coverage.
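
The job KPI counts above could be derived from recent load log statuses roughly as follows. This is only a sketch: the status strings "running", "success", and "failed" and the function name are assumptions, and the real load log may use different values.

```python
from collections import Counter


def summarize_jobs(statuses):
    """Bucket recent job statuses into the pipeline KPI counters.

    The status names are hypothetical; anything that is not running,
    successful, or failed lands in recent_other.
    """
    counts = Counter(statuses)
    known = counts["running"] + counts["success"] + counts["failed"]
    return {
        "running_jobs": counts["running"],
        "recent_success": counts["success"],
        "recent_failed": counts["failed"],
        "recent_other": sum(counts.values()) - known,
    }
```

Keeping an explicit recent_other bucket means an unexpected status still shows up in the quality meter instead of silently disappearing from the totals.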

Third section: Secondary Technical Counters

  • Current source record totals and source mode breakdown move below the registry-focused blocks.
  • These remain useful for diagnostics, but no longer dominate the page.

Backend Data Contract

Extend the /api/v1/parsers/dashboard/ response with a registry_enrichment_analytics object:

{
  "registry_enrichment_analytics": {
    "population": {
      "active_registry_organizations": 252,
      "active_memberships": 647,
      "registries_with_data_percent": 100.0
    },
    "coverage_summary": {
      "with_any_enrichment": 68,
      "with_any_enrichment_percent": 27.0,
      "core_profile_complete": 21,
      "core_profile_complete_percent": 8.3,
      "requires_attention": 184
    },
    "source_coverage": [
      {
        "source": "fns_reports",
        "label": "ФНС отчетность",
        "organizations_count": 45,
        "coverage_percent": 17.9,
        "required_for_core_profile": true,
        "risk_signal": false
      }
    ],
    "registry_source_matrix": [
      {
        "registry_id": "uuid",
        "registry_name": "Реестр ГК Росатом ГОЗ",
        "active_organizations": 139,
        "sources": {
          "fns_reports": {
            "organizations_count": 20,
            "coverage_percent": 14.4
          }
        }
      }
    ],
    "risk_signals": [
      {
        "source": "unfair_suppliers",
        "label": "Недобросовестные поставщики",
        "organizations_count": 3,
        "coverage_percent": 1.2
      }
    ],
    "pipeline": {
      "active_schedules": 15,
      "running_jobs": 0,
      "recent_success": 13,
      "recent_failed": 0,
      "recent_other": 1
    }
  }
}

The existing registry_data_coverage key can remain temporarily for compatibility with the current dashboard JS, but the new UI should read registry_enrichment_analytics.

Aggregation Rules

  • The population is the set of distinct organizations with an active registry membership, that is, RegistryMembershipPeriod rows where ended_at IS NULL.
  • Source coverage matches parser records to registry organizations by INN or OGRN.
  • FinancialReport matches by OGRN.
  • Legacy ProcurementRecord matches by customer_inn and customer_ogrn.
  • unfair_suppliers is excluded from completeness and shown as a risk signal.
  • Percent values are rounded to one decimal place.
  • If a source has records but no identifiers, it does not count as organization coverage.
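
The matching and rounding rules above can be illustrated with a small, framework-free sketch. The Org class, the (inn, ogrn) record pairs, and both function names are hypothetical stand-ins for the example; the real aggregation would run against the Django models, ideally with grouped SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Org:
    """Hypothetical stand-in for an active registry organization."""

    org_id: int
    inn: str = ""
    ogrn: str = ""


def percent(part, total):
    """Return part/total as a percentage with one decimal place (0.0 if total is 0)."""
    return round(part * 100 / total, 1) if total else 0.0


def source_coverage(orgs, records):
    """Count distinct organizations matched to one source by INN or OGRN.

    records: iterable of (inn, ogrn) pairs taken from the source's rows.
    Rows with no identifiers never match, so they add no coverage.
    """
    records = list(records)  # allow generators; we iterate twice
    inns = {inn for inn, _ in records if inn}
    ogrns = {ogrn for _, ogrn in records if ogrn}
    population = {o.org_id for o in orgs}
    matched = {
        o.org_id
        for o in orgs
        if (o.inn and o.inn in inns) or (o.ogrn and o.ogrn in ogrns)
    }
    return {
        "organizations_count": len(matched),
        "coverage_percent": percent(len(matched), len(population)),
    }
```

With the figures from the contract example, percent(45, 252) evaluates to 17.9, and an empty population yields 0.0 rather than a division error.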

Core profile completeness for the first version:

  • Organization has FNS reports.
  • Organization has data from at least one of the industrial certificate, manufacturer, or product sources.

This is intentionally conservative and can become configurable later.
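
As a minimal sketch, the first-version rule could be expressed as a pure function over the set of sources matched for one organization. Only fns_reports and unfair_suppliers are source keys named in this spec; the other keys and the function name are assumptions for illustration.

```python
# Hypothetical source keys for the industrial/manufacturer/product group.
PRODUCTION_SOURCES = frozenset(
    {"industrial_certificates", "manufacturers", "products"}
)


def is_core_profile_complete(matched_sources):
    """Return True when the organization meets the first-version rule.

    matched_sources: set of source keys that have data for the organization.
    Risk signals such as unfair_suppliers play no part in the decision.
    """
    has_fns = "fns_reports" in matched_sources
    has_production = bool(PRODUCTION_SOURCES & set(matched_sources))
    return has_fns and has_production
```

Keeping the required groups in a module-level constant leaves room to move them into configuration later without touching the callers.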

Frontend Design

Implementation remains in src/templates/dashboard.html for now, following the current dashboard pattern.

New/updated DOM blocks:

  • analyticsRegistryKpis
  • registrySourceCoverageChart
  • registrySourceMatrix
  • enrichmentPipelinePanel
  • analyticsActionQueue
  • technicalSourceCounters

No chart dependency is added. Use CSS bars, compact matrix cells, and existing badges/cards. This keeps the dashboard self-contained.

Error Handling

  • If the registry_enrichment_analytics aggregate is missing from the response, show empty states instead of crashing.
  • If registries are unavailable, keep the pipeline and technical counters visible.
  • If coverage has zero population, render zeroed KPIs and explanatory empty states.

Testing

Add/update tests:

  • Dashboard API returns registry_enrichment_analytics.
  • unfair_suppliers appears in risk_signals, not source_coverage.
  • Matrix counts source coverage per registry.
  • Template contains new analytics sections and still includes secondary source counters.
  • Existing parser/dashboard tests continue to pass.
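
The risk-signal separation check from the list above could be written as a small helper that a test calls on the parsed API response. This is a sketch under assumptions: the helper name is hypothetical, and it inspects only the registry_enrichment_analytics object, not the full response.

```python
def check_risk_separation(analytics):
    """Return a list of problems found in a registry_enrichment_analytics dict.

    An empty list means unfair_suppliers is reported as a risk signal
    and is absent from source coverage.
    """
    problems = []
    coverage = {row["source"] for row in analytics.get("source_coverage", [])}
    risks = {row["source"] for row in analytics.get("risk_signals", [])}
    if "unfair_suppliers" in coverage:
        problems.append("unfair_suppliers must not appear in source_coverage")
    if "unfair_suppliers" not in risks:
        problems.append("unfair_suppliers must appear in risk_signals")
    return problems
```

A test would then assert that the helper returns an empty list for the payload produced by the dashboard endpoint.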

Manual validation:

  • Open /dashboard.
  • Confirm first visible analytics content is registry organization coverage.
  • Confirm source record totals are below primary registry analytics.
  • Confirm FNS table and existing organization drill-down are unaffected.

Risks

  • Matching by INN/OGRN can undercount sources with incomplete identifiers.
  • The dashboard API response may become heavier once matrix aggregation is added. Keep queries bounded and use grouped SQL where practical.
  • Core completeness definition is a business rule; first implementation uses a conservative default and should be easy to adjust.