avm/mostovik-backend

Aleksandr Meshchriakov 0f17ff6773

Add organizations v2 API and registry enrichment

2026-05-06 19:04:46 +02:00

Dashboard Registry Enrichment Analytics Design

Date: 2026-05-06

Goal

Rework the dashboard analytics tab so it treats active registry organizations as the primary population and parser/enrichment jobs as the operational process that fills data for those organizations.

The existing analytics page is source-centric: total records, source counts, and load quality. That remains useful, but secondary. The new first screen must answer:

- How many active registry organizations are under control?
- How many have additional data from enrichment sources?
- Which registries are under-covered by source?
- Which enrichment jobs are scheduled, running, successful, failed, or stale?
- What actions should the operator take next?

Scope

In scope:

- Rebuild only the analyticsPanel dashboard tab.
- Keep current navigation and other dashboard tabs unchanged.
- Add aggregate dashboard data to the API under /api/v1/parsers/dashboard/.
- Use active RegistryMembershipPeriod rows as the population.
- Exclude unfair_suppliers from completeness scoring: it is a risk signal, not a required enrichment source, so its absence must not lower an organization's completeness.
- Keep source record totals available, but move them below the primary registry analytics.

Out of scope:

- Changing v2 organization API contracts.
- Changing parser execution behavior.
- Changing Celery scheduling semantics.
- Adding external chart dependencies.

UX Structure

The analytics tab becomes a hybrid of "coverage center" and "enrichment pipeline".

Top section: Registry Organization Coverage

KPI cards:
- Active registry organizations.
- Organizations with at least one enrichment source.
- Organizations with core profile coverage.
- Organizations requiring attention.
Coverage by source:
- Bar rows for FNS reports, industrial certificates, products, manufacturers, inspections, procurements, arbitration, bankruptcy, FSTEC, vacancies, etc.
- Each row shows matched organization count and percent of active registry organizations.
- unfair_suppliers is not included here.
Registry × source matrix:
- Rows are registries.
- Columns are important enrichment sources.
- Cells show percent coverage for organizations in that registry.
- This gives a fast view of which registry/source pair needs work.
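The matrix cells described above could be computed along these lines. This is a minimal sketch in plain Python rather than the project's actual query code: the function name `registry_source_matrix` and the input shapes (pairs of `(registry_id, org_id)` and `(org_id, source)`) are assumptions made for illustration; only the output keys follow the data contract.

```python
from collections import defaultdict


def registry_source_matrix(memberships, matches):
    """Build registry x source rows.

    memberships: iterable of (registry_id, org_id) pairs for active memberships.
    matches: iterable of (org_id, source) pairs for organizations matched to a source.
    Both input shapes are hypothetical; real data would come from the database.
    """
    orgs_by_registry = defaultdict(set)
    for registry_id, org_id in memberships:
        orgs_by_registry[registry_id].add(org_id)

    orgs_by_source = defaultdict(set)
    for org_id, source in matches:
        orgs_by_source[source].add(org_id)

    rows = []
    for registry_id, orgs in sorted(orgs_by_registry.items()):
        sources = {}
        for source, covered in sorted(orgs_by_source.items()):
            count = len(orgs & covered)  # organizations of this registry seen in the source
            sources[source] = {
                "organizations_count": count,
                # len(orgs) is never zero: a registry only appears if it has a membership
                "coverage_percent": round(100.0 * count / len(orgs), 1),
            }
        rows.append({
            "registry_id": registry_id,
            "active_organizations": len(orgs),
            "sources": sources,
        })
    return rows
```

An organization that belongs to several registries counts once in each registry's row, so row totals can add up to more than the overall population.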

Second section: Enrichment Pipeline

Job KPI cards:
- Active schedules.
- Running jobs.
- Recent successes.
- Recent failures.
Recent job quality meter:
- Reuse existing load log status data, but frame it as enrichment pipeline health.
Action queue:
- Organizations without enrichment data.
- Organizations with identifier/matching problems.
- Snapshots older than latest parser batches.
- Risk signals such as unfair suppliers, bankruptcy, and GOZ evasion are shown separately from coverage.
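The job KPI cards can be derived from recent load log statuses with a simple tally. The sketch below is illustrative only: the status strings (`"success"`, `"failed"`) and the function name `pipeline_summary` are assumptions, since the actual load log status values are not listed in this document. Anything that is neither a success nor a failure lands in `recent_other`.

```python
from collections import Counter

# Hypothetical status values; the real load log may use different names.
SUCCESS_STATUSES = {"success"}
FAILED_STATUSES = {"failed"}


def pipeline_summary(recent_statuses, active_schedules, running_jobs):
    """Summarize recent job statuses into the pipeline KPI counters."""
    counts = Counter(recent_statuses)
    success = sum(counts[s] for s in SUCCESS_STATUSES)
    failed = sum(counts[s] for s in FAILED_STATUSES)
    return {
        "active_schedules": active_schedules,
        "running_jobs": running_jobs,
        "recent_success": success,
        "recent_failed": failed,
        # everything else: partial loads, cancelled runs, unknown states
        "recent_other": sum(counts.values()) - success - failed,
    }
```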

Third section: Secondary Technical Counters

Current source record totals and source mode breakdown move below the registry-focused blocks.
These remain useful for diagnostics, but no longer dominate the page.

Backend Data Contract

Extend the /api/v1/parsers/dashboard/ response with a registry_enrichment_analytics object:

{
  "registry_enrichment_analytics": {
    "population": {
      "active_registry_organizations": 252,
      "active_memberships": 647,
      "registries_with_data_percent": 100
    },
    "coverage_summary": {
      "with_any_enrichment": 68,
      "with_any_enrichment_percent": 27.0,
      "core_profile_complete": 21,
      "core_profile_complete_percent": 8.3,
      "requires_attention": 184
    },
    "source_coverage": [
      {
        "source": "fns_reports",
        "label": "ФНС отчетность",
        "organizations_count": 45,
        "coverage_percent": 17.9,
        "required_for_core_profile": true,
        "risk_signal": false
      }
    ],
    "registry_source_matrix": [
      {
        "registry_id": "uuid",
        "registry_name": "Реестр ГК Росатом ГОЗ",
        "active_organizations": 139,
        "sources": {
          "fns_reports": {
            "organizations_count": 20,
            "coverage_percent": 14.4
          }
        }
      }
    ],
    "risk_signals": [
      {
        "source": "unfair_suppliers",
        "label": "Недобросовестные поставщики",
        "organizations_count": 3,
        "coverage_percent": 1.2
      }
    ],
    "pipeline": {
      "active_schedules": 15,
      "running_jobs": 0,
      "recent_success": 13,
      "recent_failed": 0,
      "recent_other": 1
    }
  }
}

The existing registry_data_coverage field can remain temporarily so that the current dashboard JS keeps working, but the new UI should read registry_enrichment_analytics.

Aggregation Rules

- The population is the set of distinct organizations with active registry memberships, that is, RegistryMembershipPeriod rows where ended_at IS NULL.
- Source coverage matches parser records to registry organizations by INN or OGRN.
- FinancialReport matches by OGRN.
- The legacy ProcurementRecord matches by customer_inn and customer_ogrn.
- unfair_suppliers is excluded from completeness and shown as a risk signal.
- Percent values use one decimal place.
- If a source has records but no identifiers, it does not count as organization coverage.
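The rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: `Org`, `percent`, and `source_coverage` are hypothetical names, and source records are reduced to `(inn, ogrn)` pairs. A source that matches only by OGRN, such as FinancialReport, would pass `None` as the INN.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Org:
    """Hypothetical stand-in for an organization with an active membership."""
    inn: Optional[str]
    ogrn: Optional[str]


def percent(part: int, whole: int) -> float:
    """Percent rounded to one decimal place; 0.0 for an empty population."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def source_coverage(orgs: Iterable[Org], records: Iterable[tuple]) -> dict:
    """Count organizations matched to at least one record by INN or OGRN.

    records: (inn, ogrn) pairs from a single source. Records without any
    identifier add nothing to either lookup set, so they never count.
    """
    records = list(records)
    inns = {inn for inn, _ in records if inn}
    ogrns = {ogrn for _, ogrn in records if ogrn}
    population = set(orgs)  # distinct organizations
    matched = sum(
        1 for org in population
        if (org.inn and org.inn in inns) or (org.ogrn and org.ogrn in ogrns)
    )
    return {
        "organizations_count": matched,
        "coverage_percent": percent(matched, len(population)),
    }
```

With the figures from the data contract example, `percent(45, 252)` gives 17.9 and `percent(68, 252)` gives 27.0.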

For the first version, an organization's core profile is complete when both conditions hold:

- The organization has FNS reports.
- The organization has at least one industrial/manufacturer/product source.

This is intentionally conservative and can become configurable later.
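A minimal sketch of this rule, reading the two conditions as both required. Only `fns_reports` is a source key taken from the data contract; the keys in `CORE_PRODUCTION_SOURCES` and the function name are assumptions made for illustration. Keeping the source set in one constant is one way to make the rule easy to adjust later.

```python
# Hypothetical source keys for the industrial/manufacturer/product group.
CORE_PRODUCTION_SOURCES = frozenset({"industrial_certificates", "manufacturers", "products"})


def is_core_profile_complete(sources: set) -> bool:
    """True if the organization has FNS reports and at least one production source.

    sources: keys of the enrichment sources matched to one organization.
    """
    has_fns = "fns_reports" in sources
    has_production = bool(CORE_PRODUCTION_SOURCES & set(sources))
    return has_fns and has_production
```

Note that `unfair_suppliers` plays no role here, so an organization with no match in that source can still be complete.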

Frontend Design

Implementation remains in src/templates/dashboard.html for now, following the current dashboard pattern.

New/updated DOM blocks:

- analyticsRegistryKpis
- registrySourceCoverageChart
- registrySourceMatrix
- enrichmentPipelinePanel
- analyticsActionQueue
- technicalSourceCounters

No chart dependency is added. Use CSS bars, compact matrix cells, and existing badges/cards. This keeps the dashboard self-contained.

Error Handling

- If the analytics aggregate is missing, show empty states instead of crashing.
- If registries are unavailable, keep the pipeline and technical counters visible.
- If coverage has zero population, render zeroed KPIs and explanatory empty states.
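The fallbacks above live in the dashboard template, but the same idea can be illustrated in Python as a normalization step: fill every missing section with a zeroed default so the renderer never has to handle an absent key. This is a sketch under that assumption; `EMPTY_ANALYTICS` and `analytics_or_empty` are hypothetical names, and the design does not say whether normalization happens on the backend or in the browser.

```python
import copy

# Zeroed defaults that mirror the keys of the data contract.
EMPTY_ANALYTICS = {
    "population": {
        "active_registry_organizations": 0,
        "active_memberships": 0,
        "registries_with_data_percent": 0,
    },
    "coverage_summary": {
        "with_any_enrichment": 0,
        "with_any_enrichment_percent": 0.0,
        "core_profile_complete": 0,
        "core_profile_complete_percent": 0.0,
        "requires_attention": 0,
    },
    "source_coverage": [],
    "registry_source_matrix": [],
    "risk_signals": [],
    "pipeline": {
        "active_schedules": 0,
        "running_jobs": 0,
        "recent_success": 0,
        "recent_failed": 0,
        "recent_other": 0,
    },
}


def analytics_or_empty(dashboard: dict) -> dict:
    """Return the analytics object with every section present."""
    raw = dashboard.get("registry_enrichment_analytics")
    raw = raw if isinstance(raw, dict) else {}
    result = copy.deepcopy(EMPTY_ANALYTICS)
    for key, default in EMPTY_ANALYTICS.items():
        value = raw.get(key)
        if isinstance(default, dict) and isinstance(value, dict):
            result[key].update(value)  # keep provided counters, default the rest
        elif isinstance(default, list) and isinstance(value, list):
            result[key] = list(value)
    return result
```

If only the pipeline section is present, its counters survive while the coverage sections fall back to empty lists and zeroed KPIs.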

Testing

Add/update tests:

- Dashboard API returns registry_enrichment_analytics.
- unfair_suppliers appears in risk_signals, not source_coverage.
- Matrix counts source coverage per registry.
- Template contains new analytics sections and still includes secondary source counters.
- Existing parser/dashboard tests continue to pass.
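The risk-signal check could be expressed as a small payload validator that a test calls on the API response. This is a sketch, not the project's test code: `check_risk_signal_separation` is a hypothetical helper, and a real test would obtain the payload through the project's test client rather than a literal dictionary.

```python
def check_risk_signal_separation(analytics: dict) -> list:
    """Return a list of violations; an empty list means the payload is consistent."""
    coverage_sources = {row["source"] for row in analytics.get("source_coverage", [])}
    risk_sources = {row["source"] for row in analytics.get("risk_signals", [])}

    problems = []
    if "unfair_suppliers" in coverage_sources:
        problems.append("unfair_suppliers must not appear in source_coverage")
    if "unfair_suppliers" not in risk_sources:
        problems.append("unfair_suppliers must appear in risk_signals")
    return problems


# Example payload trimmed to the two relevant sections of the data contract.
sample = {
    "source_coverage": [{"source": "fns_reports", "organizations_count": 45}],
    "risk_signals": [{"source": "unfair_suppliers", "organizations_count": 3}],
}
assert check_risk_signal_separation(sample) == []
```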

Manual validation:

- Open /dashboard.
- Confirm the first visible analytics content is registry organization coverage.
- Confirm source record totals are below the primary registry analytics.
- Confirm the FNS table and existing organization drill-down are unaffected.

Risks

- Matching by INN/OGRN can undercount sources with incomplete identifiers.
- The current dashboard API may become heavier with matrix aggregation. Keep queries bounded and use grouped SQL where practical.
- The core completeness definition is a business rule; the first implementation uses a conservative default and should be easy to adjust.
