Dashboard Registry Enrichment Analytics Design
Date: 2026-05-06
Goal
Rework the dashboard analytics tab so that it treats active registry organizations as the primary population, and parser/enrichment jobs as the operational process that fills in data for those organizations.
The existing analytics page is source-centric: total records, source counts, and load quality. That view remains useful but becomes secondary. The new first screen must answer:
- How many active registry organizations are under control?
- How many have additional data from enrichment sources?
- Which registries are under-covered by source?
- Which enrichment jobs are scheduled, running, successful, failed, or stale?
- What actions should the operator take next?
Scope
In scope:
- Rebuild only the analyticsPanel dashboard tab.
- Keep current navigation and other dashboard tabs unchanged.
- Add dashboard API aggregate data under /api/v1/parsers/dashboard/.
- Use active RegistryMembershipPeriod rows as the population.
- Exclude unfair_suppliers from completeness scoring, so its absence never lowers an organization's completeness. It is a risk signal, not a required enrichment source.
- Keep source record totals available, but move them below the primary registry analytics.
Out of scope:
- Changing v2 organization API contracts.
- Changing parser execution behavior.
- Changing Celery scheduling semantics.
- Adding external chart dependencies.
UX Structure
The analytics tab becomes a hybrid of "coverage center" and "enrichment pipeline".
Top section: Registry Organization Coverage
- KPI cards:
  - Active registry organizations.
  - Organizations with at least one enrichment source.
  - Organizations with core profile coverage.
  - Organizations requiring attention.
- Coverage by source:
  - Bar rows for FNS reports, industrial certificates, products, manufacturers, inspections, procurements, arbitration, bankruptcy, FSTEC, vacancies, etc.
  - Each row shows the matched organization count and its percentage of active registry organizations.
  - unfair_suppliers is not included here.
- Registry × source matrix:
  - Rows are registries.
  - Columns are the important enrichment sources.
  - Cells show percent coverage for organizations in that registry.
  - This gives a fast view of which registry/source pairs need work.
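To make the matrix cells concrete, here is a minimal Python sketch. It assumes active memberships arrive as (registry_id, organization_id) pairs and that each source has already been resolved to a set of matched organization ids; the function name build_registry_source_matrix and its inputs are hypothetical. The output follows the registry_source_matrix shape from the backend contract.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_registry_source_matrix(
    memberships: Iterable[Tuple[Hashable, Hashable]],
    source_matches: Dict[str, Set[Hashable]],
    registry_names: Dict[Hashable, str],
) -> List[dict]:
    """Build registry_source_matrix rows (hypothetical helper).

    memberships: (registry_id, organization_id) pairs for active memberships only.
    source_matches: source key -> ids of organizations matched by that source.
    registry_names: registry_id -> display name.
    """
    # Distinct active organizations per registry; duplicate membership rows collapse.
    orgs_by_registry: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for registry_id, org_id in memberships:
        orgs_by_registry[registry_id].add(org_id)

    rows = []
    for registry_id, org_ids in orgs_by_registry.items():
        total = len(org_ids)  # always >= 1: a registry appears only if it has a member
        cells = {}
        for source, matched in source_matches.items():
            count = len(org_ids & matched)
            cells[source] = {
                "organizations_count": count,
                "coverage_percent": round(100 * count / total, 1),
            }
        rows.append(
            {
                "registry_id": registry_id,
                "registry_name": registry_names.get(registry_id, str(registry_id)),
                "active_organizations": total,
                "sources": cells,
            }
        )
    # Stable ordering for the UI.
    return sorted(rows, key=lambda row: row["registry_name"])
```

Because each registry's denominator is a set, an organization with several membership rows in the same registry is counted once, and one organization in two registries contributes to both rows.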
Second section: Enrichment Pipeline
- Job KPI cards:
  - Active schedules.
  - Running jobs.
  - Recent successes.
  - Recent failures.
- Recent job quality meter:
  - Reuse existing load log status data, but frame it as enrichment pipeline health.
- Action queue:
  - Organizations without enrichment data.
  - Organizations with identifier/matching problems.
  - Snapshots older than the latest parser batches.
  - Risk signals such as unfair suppliers, bankruptcy, and GOZ evasion, shown separately from coverage.
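A minimal sketch of the pipeline counters and the stale-snapshot check, assuming load-log entries carry a status string and a start timestamp. The status values ("success", "failed", "running"), the 24-hour "recent" window, the LoadLog class, and the function names are all hypothetical placeholders, not the existing load-log schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Hypothetical status strings; the real load-log values may differ.
SUCCESS, FAILED, RUNNING = "success", "failed", "running"


@dataclass
class LoadLog:
    source: str
    status: str
    started_at: datetime


def summarize_pipeline(
    logs: Iterable[LoadLog],
    active_schedules: int,
    now: datetime,
    window: timedelta = timedelta(hours=24),  # assumed meaning of "recent"
) -> Dict[str, int]:
    logs = list(logs)
    running = sum(1 for log in logs if log.status == RUNNING)
    # Finished (non-running) jobs that started inside the window.
    recent = [log for log in logs if log.status != RUNNING and now - log.started_at <= window]
    success = sum(1 for log in recent if log.status == SUCCESS)
    failed = sum(1 for log in recent if log.status == FAILED)
    return {
        "active_schedules": active_schedules,
        "running_jobs": running,
        "recent_success": success,
        "recent_failed": failed,
        "recent_other": len(recent) - success - failed,  # e.g. partial or cancelled runs
    }


def stale_snapshots(
    snapshot_at: Dict[str, datetime], latest_batch_at: Dict[str, datetime]
) -> List[str]:
    """Keys whose snapshot is older than the latest parser batch for the same key."""
    return sorted(
        key
        for key, taken_at in snapshot_at.items()
        if key in latest_batch_at and taken_at < latest_batch_at[key]
    )
```

The output keys of summarize_pipeline match the pipeline object in the backend contract. Any status that is neither success nor failure lands in recent_other instead of being dropped.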
Third section: Secondary Technical Counters
- Current source record totals and source mode breakdown move below the registry-focused blocks.
- These remain useful for diagnostics, but no longer dominate the page.
Backend Data Contract
Extend the /api/v1/parsers/dashboard/ response with an analytics object:
{
"registry_enrichment_analytics": {
"population": {
"active_registry_organizations": 252,
"active_memberships": 647,
"registries_with_data_percent": 100
},
"coverage_summary": {
"with_any_enrichment": 68,
"with_any_enrichment_percent": 27.0,
"core_profile_complete": 21,
"core_profile_complete_percent": 8.3,
"requires_attention": 184
},
"source_coverage": [
{
"source": "fns_reports",
"label": "ФНС отчетность",
"organizations_count": 45,
"coverage_percent": 17.9,
"required_for_core_profile": true,
"risk_signal": false
}
],
"registry_source_matrix": [
{
"registry_id": "uuid",
"registry_name": "Реестр ГК Росатом ГОЗ",
"active_organizations": 139,
"sources": {
"fns_reports": {
"organizations_count": 20,
"coverage_percent": 14.4
}
}
}
],
"risk_signals": [
{
"source": "unfair_suppliers",
"label": "Недобросовестные поставщики",
"organizations_count": 3,
"coverage_percent": 1.2
}
],
"pipeline": {
"active_schedules": 15,
"running_jobs": 0,
"recent_success": 13,
"recent_failed": 0,
"recent_other": 1
}
}
}
The existing registry_data_coverage field can remain temporarily for compatibility with the dashboard JS, but the new UI should read registry_enrichment_analytics.
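The contract above can be pinned down as Python TypedDict classes so that a type checker flags a view that drops or misspells a key. This is a sketch that assumes a Python backend (the design mentions Celery). The field types are inferred from the sample values, and the class names are hypothetical.

```python
from typing import Dict, List, TypedDict


class Population(TypedDict):
    active_registry_organizations: int
    active_memberships: int
    registries_with_data_percent: float


class CoverageSummary(TypedDict):
    with_any_enrichment: int
    with_any_enrichment_percent: float
    core_profile_complete: int
    core_profile_complete_percent: float
    requires_attention: int


class SourceCoverageRow(TypedDict):
    source: str
    label: str
    organizations_count: int
    coverage_percent: float
    required_for_core_profile: bool
    risk_signal: bool


class RiskSignalRow(TypedDict):
    source: str
    label: str
    organizations_count: int
    coverage_percent: float


class MatrixCell(TypedDict):
    organizations_count: int
    coverage_percent: float


class MatrixRow(TypedDict):
    registry_id: str
    registry_name: str
    active_organizations: int
    sources: Dict[str, MatrixCell]  # keyed by source, e.g. "fns_reports"


class Pipeline(TypedDict):
    active_schedules: int
    running_jobs: int
    recent_success: int
    recent_failed: int
    recent_other: int


# Value of the "registry_enrichment_analytics" key in the dashboard response.
class RegistryEnrichmentAnalytics(TypedDict):
    population: Population
    coverage_summary: CoverageSummary
    source_coverage: List[SourceCoverageRow]
    registry_source_matrix: List[MatrixRow]
    risk_signals: List[RiskSignalRow]
    pipeline: Pipeline
```

TypedDicts are plain dicts at runtime, so they add no serialization cost. They only document the shape and let static checkers verify it.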
Aggregation Rules
- The population is the set of distinct organizations with active registry memberships (ended_at IS NULL).
- Source coverage matches parser records to registry organizations by INN or OGRN.
- FinancialReport matches by OGRN.
- Legacy ProcurementRecord matches by customer_inn and customer_ogrn.
- unfair_suppliers is excluded from completeness and shown as a risk signal.
- Percent values are rounded to one decimal place.
- If a source has records but no identifiers, those records do not count toward organization coverage.
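The rules above can be sketched in memory as follows. This assumes organizations and parser records are plain dicts and that INN and OGRN are unique per organization. The helper names and the record field names (for example "ogrn" on a FinancialReport) are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Shown as risk signals only; never counted as enrichment coverage.
RISK_SOURCES = {"unfair_suppliers"}


def active_population(memberships: Iterable[dict]) -> Set[int]:
    """Distinct organization ids from memberships with ended_at IS NULL."""
    return {m["organization_id"] for m in memberships if m["ended_at"] is None}


def matched_org_ids(
    orgs: List[dict], records: Iterable[dict], id_fields: Optional[Dict[str, str]] = None
) -> Set[int]:
    """Organization ids matched by at least one parser record.

    id_fields maps identifier kind ("inn"/"ogrn") to the record field holding it.
    Assumes INN and OGRN are unique per organization.
    """
    id_fields = id_fields or {"inn": "inn", "ogrn": "ogrn"}
    index = {kind: {o[kind]: o["id"] for o in orgs if o.get(kind)} for kind in id_fields}
    matched: Set[int] = set()
    for record in records:
        for kind, field in id_fields.items():
            value = record.get(field)
            # Records without identifiers never count as coverage.
            if value and value in index[kind]:
                matched.add(index[kind][value])
    return matched


def split_coverage(
    population: Set[int], matches: Dict[str, Set[int]], labels: Dict[str, str]
) -> Tuple[List[dict], List[dict]]:
    """Return (source_coverage, risk_signals) rows with one-decimal percents."""
    total = len(population)
    coverage: List[dict] = []
    risks: List[dict] = []
    for source, org_ids in matches.items():
        count = len(org_ids & population)  # ignore matches outside the population
        row = {
            "source": source,
            "label": labels.get(source, source),
            "organizations_count": count,
            "coverage_percent": round(100 * count / total, 1) if total else 0.0,
        }
        (risks if source in RISK_SOURCES else coverage).append(row)
    return coverage, risks
```

Per-source rules become different id_fields: {"ogrn": "ogrn"} for FinancialReport and {"inn": "customer_inn", "ogrn": "customer_ogrn"} for legacy ProcurementRecord. Following the general INN-or-OGRN rule, the sketch counts a match on either field. The rows omit the required_for_core_profile and risk_signal flags for brevity.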
Core profile completeness for the first version:
- Organization has FNS reports.
- Organization has at least one industrial/manufacturer/product source.
This is intentionally conservative and can become configurable later.
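The first-version completeness rule and the coverage_summary counters could look like the sketch below. Only "fns_reports" appears in the sample payload, so the industrial/manufacturer/product source keys are hypothetical. The requires_attention definition (organizations with no enrichment at all) is an assumption that happens to match the sample numbers (252 - 68 = 184).

```python
from typing import Dict, Set

# Hypothetical source keys; only "fns_reports" appears in the sample payload.
FNS_SOURCE = "fns_reports"
INDUSTRIAL_SOURCES = frozenset({"industrial_certificates", "manufacturers", "products"})
RISK_SOURCES = frozenset({"unfair_suppliers"})


def is_core_profile_complete(sources: Set[str]) -> bool:
    """First-version rule: FNS reports plus any industrial/manufacturer/product source."""
    return FNS_SOURCE in sources and bool(sources & INDUSTRIAL_SOURCES)


def coverage_summary(population: Set[int], sources_by_org: Dict[int, Set[str]]) -> dict:
    total = len(population)

    def pct(n: int) -> float:
        return round(100 * n / total, 1) if total else 0.0

    # Risk sources never count as enrichment.
    enrichment = {org: sources_by_org.get(org, set()) - RISK_SOURCES for org in population}
    with_any = sum(1 for sources in enrichment.values() if sources)
    complete = sum(1 for sources in enrichment.values() if is_core_profile_complete(sources))
    return {
        "with_any_enrichment": with_any,
        "with_any_enrichment_percent": pct(with_any),
        "core_profile_complete": complete,
        "core_profile_complete_percent": pct(complete),
        # Assumption: organizations with no enrichment at all.
        "requires_attention": total - with_any,
    }
```

Keeping the rule in a single predicate makes it the one place to change when completeness becomes configurable.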
Frontend Design
Implementation remains in src/templates/dashboard.html for now, following the current dashboard pattern.
New/updated DOM blocks:
- analyticsRegistryKpis
- registrySourceCoverageChart
- registrySourceMatrix
- enrichmentPipelinePanel
- analyticsActionQueue
- technicalSourceCounters
No chart dependency is added. Use CSS bars, compact matrix cells, and existing badges/cards. This keeps the dashboard self-contained.
Error Handling
- If the analytics aggregate is missing, show empty states instead of crashing.
- If registries are unavailable, keep the pipeline and technical counters visible.
- If the population is zero, render zeroed KPIs and explanatory empty states.
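The same degradation can be made explicit on the backend. The sketch below assumes the aggregate is assembled from independent builder callables (hypothetical names). It returns None for a failed registry part, so the frontend can tell "aggregate missing" apart from a genuine zero population, and the pipeline counters are still filled.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Registry-derived keys of registry_enrichment_analytics; "pipeline" is built separately.
REGISTRY_KEYS = (
    "population",
    "coverage_summary",
    "source_coverage",
    "registry_source_matrix",
    "risk_signals",
)


def safe_section(name: str, build: Callable[[], Any]) -> Optional[Any]:
    """Run one aggregate builder; on failure log it and return None."""
    try:
        return build()
    except Exception:
        logger.exception("dashboard analytics section %r failed", name)
        return None


def assemble_analytics(
    build_registry_part: Callable[[], Dict[str, Any]],
    build_pipeline: Callable[[], Dict[str, int]],
) -> Dict[str, Any]:
    registry_part = safe_section("registry", build_registry_part)
    if registry_part is None:
        # None means "missing": the UI shows empty states, not misleading zeros.
        registry_part = {key: None for key in REGISTRY_KEYS}
    # The pipeline is computed independently so it stays visible either way.
    return {**registry_part, "pipeline": safe_section("pipeline", build_pipeline)}
```

Catching broad exceptions here is deliberate: one failing aggregate should not take down the whole dashboard response. The logged traceback still surfaces the bug.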
Testing
Add/update tests:
- The dashboard API returns registry_enrichment_analytics.
- unfair_suppliers appears in risk_signals, not in source_coverage.
- The matrix counts source coverage per registry.
- The template contains the new analytics sections and still includes the secondary source counters.
- Existing parser/dashboard tests continue to pass.
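The API-level checks in this section can share one assertion helper. The sketch below is hypothetical: it assumes a test fixture with at least one unfair-supplier match, and that the test has already decoded the JSON returned by /api/v1/parsers/dashboard/ (for example through a test client). It also recomputes every percent, which enforces the one-decimal rule.

```python
def expected_percent(count: int, total: int) -> float:
    # One decimal place; zero population yields 0.0 instead of dividing by zero.
    return round(100 * count / total, 1) if total else 0.0


def check_registry_enrichment_analytics(payload: dict) -> None:
    """Assert the automated-test invariants on a decoded dashboard payload."""
    analytics = payload["registry_enrichment_analytics"]
    population = analytics["population"]["active_registry_organizations"]

    coverage_sources = {row["source"] for row in analytics["source_coverage"]}
    risk_sources = {row["source"] for row in analytics["risk_signals"]}
    assert "unfair_suppliers" not in coverage_sources
    # Requires a fixture with at least one unfair supplier match.
    assert "unfair_suppliers" in risk_sources

    for row in analytics["source_coverage"] + analytics["risk_signals"]:
        assert 0 <= row["organizations_count"] <= population
        assert row["coverage_percent"] == expected_percent(row["organizations_count"], population)

    for registry in analytics["registry_source_matrix"]:
        total = registry["active_organizations"]
        for cell in registry["sources"].values():
            assert cell["organizations_count"] <= total
            assert cell["coverage_percent"] == expected_percent(cell["organizations_count"], total)
```

Run against the sample payload from the backend contract, the helper passes: 45 of 252 gives 17.9, 3 of 252 gives 1.2, and 20 of 139 gives 14.4.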
Manual validation:
- Open /dashboard.
- Confirm the first visible analytics content is registry organization coverage.
- Confirm source record totals appear below the primary registry analytics.
- Confirm the FNS table and the existing organization drill-down are unaffected.
Risks
- Matching by INN/OGRN can undercount sources with incomplete identifiers.
- The dashboard API may become heavier with matrix aggregation. Keep queries bounded and use grouped SQL where practical.
- The core completeness definition is a business rule; the first implementation uses a conservative default that should be easy to adjust.
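The grouped-SQL mitigation can be illustrated with two aggregate queries: one covers every matrix cell, and one gives the per-registry denominators. This avoids a query per registry/source pair. The sketch runs on an in-memory SQLite database with hypothetical, simplified tables. The real schema (RegistryMembershipPeriod, FinancialReport, ProcurementRecord, etc.) will differ, and this version ignores per-source rules such as OGRN-only matching.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical, simplified tables; the real models and column names may differ.
SCHEMA = """
CREATE TABLE registry_membership (registry_id TEXT, organization_id INTEGER, ended_at TEXT);
CREATE TABLE organization (id INTEGER PRIMARY KEY, inn TEXT, ogrn TEXT);
CREATE TABLE source_record (source TEXT, inn TEXT, ogrn TEXT);
"""

# One grouped query for every matrix cell. A comparison with NULL is never true,
# so records without identifiers cannot match (assumes missing values are NULL, not '').
MATRIX_SQL = """
SELECT m.registry_id, r.source, COUNT(DISTINCT m.organization_id)
FROM registry_membership AS m
JOIN organization AS o ON o.id = m.organization_id
JOIN source_record AS r ON (r.inn = o.inn OR r.ogrn = o.ogrn)
WHERE m.ended_at IS NULL
GROUP BY m.registry_id, r.source
ORDER BY m.registry_id, r.source
"""

# Denominators: distinct active organizations per registry.
ACTIVE_SQL = """
SELECT registry_id, COUNT(DISTINCT organization_id)
FROM registry_membership
WHERE ended_at IS NULL
GROUP BY registry_id
"""


def matrix_counts(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    return conn.execute(MATRIX_SQL).fetchall()


def active_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute(ACTIVE_SQL).fetchall())


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO registry_membership VALUES (?, ?, ?)",
        [("r1", 1, None), ("r1", 2, None), ("r2", 2, None), ("r2", 3, "2025-01-01")],
    )
    conn.executemany(
        "INSERT INTO organization VALUES (?, ?, ?)",
        [(1, "7701", "1027"), (2, "7702", None), (3, "7703", "1030")],
    )
    conn.executemany(
        "INSERT INTO source_record VALUES (?, ?, ?)",
        [
            ("fns_reports", None, "1027"),
            ("fns_reports", "7702", None),
            ("procurements", "7703", None),  # org 3's membership has ended
            ("procurements", None, None),  # no identifiers: never matches
        ],
    )
    return conn
```

In some databases, an OR inside a join predicate can prevent index use on either column. If this query turns out to be slow, a common alternative is to split the INN and OGRN matches into two joins combined with UNION before counting.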