mostovik-backend/docs/superpowers/specs/2026-05-06-dashboard-registry-enrichment-analytics-design.md
Aleksandr Meshchriakov 0f17ff6773
Add organizations v2 API and registry enrichment
2026-05-06 19:04:46 +02:00

# Dashboard Registry Enrichment Analytics Design
Date: 2026-05-06
## Goal
Rework the dashboard analytics tab so it treats active registry organizations as the primary population and parser/enrichment jobs as the operational process that fills data for those organizations.
The existing analytics page is source-centric: total records, source counts, and load quality. That remains useful, but secondary. The new first screen must answer:
- How many active registry organizations are under control?
- How many have additional data from enrichment sources?
- Which registries are under-covered by source?
- Which enrichment jobs are scheduled, running, successful, failed, or stale?
- What actions should the operator take next?
## Scope
In scope:
- Rebuild only the `analyticsPanel` dashboard tab.
- Keep current navigation and other dashboard tabs unchanged.
- Add dashboard API aggregate data under `/api/v1/parsers/dashboard/`.
- Use active `RegistryMembershipPeriod` rows as the population.
- Exclude `unfair_suppliers` from completeness degradation. It is a risk signal, not a required enrichment source.
- Keep source record totals available, but move them below the primary registry analytics.
Out of scope:
- Changing v2 organization API contracts.
- Changing parser execution behavior.
- Changing Celery scheduling semantics.
- Adding external chart dependencies.
## UX Structure
The analytics tab becomes a hybrid of "coverage center" and "enrichment pipeline".
Top section: Registry Organization Coverage
- KPI cards:
  - Active registry organizations.
  - Organizations with at least one enrichment source.
  - Organizations with core profile coverage.
  - Organizations requiring attention.
- Coverage by source:
  - Bar rows for FNS reports, industrial certificates, products, manufacturers, inspections, procurements, arbitration, bankruptcy, FSTEC, vacancies, etc.
  - Each row shows matched organization count and percent of active registry organizations.
  - `unfair_suppliers` is not included here.
- Registry × source matrix:
  - Rows are registries.
  - Columns are important enrichment sources.
  - Cells show percent coverage for organizations in that registry.
  - This gives a fast view of which registry/source pair needs work.
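
The registry × source matrix could be computed roughly as follows. This is a minimal sketch over in-memory data, not the actual implementation: the function name `build_registry_source_matrix` and both input shapes are assumptions, and the real aggregation would likely run as grouped SQL.

```python
from collections import defaultdict


def build_registry_source_matrix(memberships, source_matches):
    """Build one matrix row per registry.

    memberships: iterable of (registry_name, organization_id) pairs
        for active memberships (hypothetical input shape).
    source_matches: dict of source key -> set of matched organization ids
        (hypothetical input shape).
    """
    orgs_by_registry = defaultdict(set)
    for registry_name, org_id in memberships:
        orgs_by_registry[registry_name].add(org_id)

    rows = []
    for registry_name, org_ids in sorted(orgs_by_registry.items()):
        sources = {}
        for source, matched in source_matches.items():
            # Count only organizations that belong to this registry.
            count = len(org_ids & matched)
            sources[source] = {
                "organizations_count": count,
                "coverage_percent": round(count * 100 / len(org_ids), 1),
            }
        rows.append(
            {
                "registry_name": registry_name,
                "active_organizations": len(org_ids),
                "sources": sources,
            }
        )
    return rows
```

An organization that belongs to several registries is counted once in each registry's row, so row totals can sum to more than the overall population.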
Second section: Enrichment Pipeline
- Job KPI cards:
  - Active schedules.
  - Running jobs.
  - Recent successes.
  - Recent failures.
- Recent job quality meter:
  - Reuse existing load log status data, but frame it as enrichment pipeline health.
- Action queue:
  - Organizations without enrichment data.
  - Organizations with identifier/matching problems.
  - Snapshots older than latest parser batches.
  - Risk signals such as unfair suppliers, bankruptcy, GOZ evasion shown separately from coverage.
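
As an illustration of the job quality meter, recent load log statuses could be tallied into the `recent_success`, `recent_failed`, and `recent_other` counters used by the data contract below. This is a sketch only: the status strings `"success"` and `"failed"` and the function name are assumptions, since the spec does not list the actual load log status values.

```python
from collections import Counter

# Hypothetical status names; the real load log statuses may differ.
SUCCESS_STATUS = "success"
FAILED_STATUS = "failed"


def summarize_recent_jobs(statuses):
    """Tally recent job statuses into pipeline health counters."""
    counts = Counter(statuses)
    success = counts[SUCCESS_STATUS]
    failed = counts[FAILED_STATUS]
    # Everything that is neither success nor failure lands in "other".
    other = sum(counts.values()) - success - failed
    return {
        "recent_success": success,
        "recent_failed": failed,
        "recent_other": other,
    }
```

Keeping an explicit "other" bucket means unknown or new statuses stay visible instead of silently disappearing from the meter.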
Third section: Secondary Technical Counters
- Current source record totals and source mode breakdown move below the registry-focused blocks.
- These remain useful for diagnostics, but no longer dominate the page.
## Backend Data Contract
Extend `/api/v1/parsers/dashboard/` with an analytics object:
```json
{
  "registry_enrichment_analytics": {
    "population": {
      "active_registry_organizations": 252,
      "active_memberships": 647,
      "registries_with_data_percent": 100
    },
    "coverage_summary": {
      "with_any_enrichment": 68,
      "with_any_enrichment_percent": 27.0,
      "core_profile_complete": 21,
      "core_profile_complete_percent": 8.3,
      "requires_attention": 184
    },
    "source_coverage": [
      {
        "source": "fns_reports",
        "label": "ФНС отчетность",
        "organizations_count": 45,
        "coverage_percent": 17.9,
        "required_for_core_profile": true,
        "risk_signal": false
      }
    ],
    "registry_source_matrix": [
      {
        "registry_id": "uuid",
        "registry_name": "Реестр ГК Росатом ГОЗ",
        "active_organizations": 139,
        "sources": {
          "fns_reports": {
            "organizations_count": 20,
            "coverage_percent": 14.4
          }
        }
      }
    ],
    "risk_signals": [
      {
        "source": "unfair_suppliers",
        "label": "Недобросовестные поставщики",
        "organizations_count": 3,
        "coverage_percent": 1.2
      }
    ],
    "pipeline": {
      "active_schedules": 15,
      "running_jobs": 0,
      "recent_success": 13,
      "recent_failed": 0,
      "recent_other": 1
    }
  }
}
```
The existing `registry_data_coverage` object can remain temporarily for compatibility with the current dashboard JS, but the new UI should read `registry_enrichment_analytics`.
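
On the backend, part of this contract could be written down as typed dictionaries so the shape is checked by a type checker. The class names below are hypothetical and only cover the coverage rows and matrix rows; this is a sketch of one way to document the shape, not a required part of the design.

```python
from typing import Dict, TypedDict


class SourceCell(TypedDict):
    """Coverage numbers for one source (matrix cell or summary row)."""
    organizations_count: int
    coverage_percent: float


class SourceCoverageRow(SourceCell):
    """One entry of `source_coverage`."""
    source: str
    label: str
    required_for_core_profile: bool
    risk_signal: bool


class RegistryMatrixRow(TypedDict):
    """One entry of `registry_source_matrix`."""
    registry_id: str
    registry_name: str
    active_organizations: int
    sources: Dict[str, SourceCell]


# Example values copied from the contract above.
fns_row: SourceCoverageRow = {
    "source": "fns_reports",
    "label": "ФНС отчетность",
    "organizations_count": 45,
    "coverage_percent": 17.9,
    "required_for_core_profile": True,
    "risk_signal": False,
}

matrix_row: RegistryMatrixRow = {
    "registry_id": "uuid",
    "registry_name": "Реестр ГК Росатом ГОЗ",
    "active_organizations": 139,
    "sources": {"fns_reports": {"organizations_count": 20, "coverage_percent": 14.4}},
}
```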
## Aggregation Rules
- The population is the set of distinct organizations with an active registry membership (`ended_at IS NULL`).
- Source coverage matches parser records to registry organizations by INN or OGRN.
  - `FinancialReport` matches by OGRN.
  - Legacy `ProcurementRecord` matches by `customer_inn` and `customer_ogrn`.
- `unfair_suppliers` is excluded from completeness and shown as a risk signal.
- Percent values use one decimal place.
- If a source has records but no identifiers, it does not count as organization coverage.
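
The matching and rounding rules above can be sketched over plain dictionaries. The function name and the `inn`/`ogrn` keys are assumptions for illustration; the real models use their own field names (for example `customer_inn`), and the real aggregation would likely be done in SQL.

```python
def source_coverage(organizations, records):
    """Count organizations matched by at least one record of a source.

    organizations: distinct active registry organizations, each a dict
        with "inn" and/or "ogrn" (hypothetical shape).
    records: parser records of a single source, each a dict that may
        carry "inn" and/or "ogrn" (hypothetical shape).
    """
    # Records without identifiers contribute nothing to coverage.
    inns = {r["inn"] for r in records if r.get("inn")}
    ogrns = {r["ogrn"] for r in records if r.get("ogrn")}

    matched = sum(
        1
        for org in organizations
        if (org.get("inn") and org["inn"] in inns)
        or (org.get("ogrn") and org["ogrn"] in ogrns)
    )
    total = len(organizations)
    percent = round(matched * 100 / total, 1) if total else 0.0
    return {"organizations_count": matched, "coverage_percent": percent}
```

Counting organizations rather than records means that several records for the same organization raise coverage only once.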
Core profile completeness for the first version:
- Organization has FNS reports.
- Organization has at least one industrial/manufacturer/product source.
This is intentionally conservative and can become configurable later.
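
A minimal sketch of this rule, assuming both conditions must hold at once. Apart from `fns_reports`, the source keys below are hypothetical names for the industrial, manufacturer, and product sources.

```python
# "fns_reports" appears in the data contract; the other keys are
# hypothetical placeholders for the industrial/manufacturer/product sources.
FNS_SOURCE = "fns_reports"
INDUSTRIAL_SOURCES = frozenset(
    {"industrial_certificates", "manufacturers", "products"}
)


def is_core_profile_complete(org_sources):
    """Return True when an organization meets the first-version core rule.

    org_sources: iterable of source keys that matched the organization.
    Assumes both conditions are required (FNS reports AND at least one
    industrial/manufacturer/product source).
    """
    sources = set(org_sources)
    return FNS_SOURCE in sources and bool(sources & INDUSTRIAL_SOURCES)
```

Keeping the source keys in module-level constants leaves room to move them into configuration later.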
## Frontend Design
Implementation remains in `src/templates/dashboard.html` for now, following the current dashboard pattern.
New/updated DOM blocks:
- `analyticsRegistryKpis`
- `registrySourceCoverageChart`
- `registrySourceMatrix`
- `enrichmentPipelinePanel`
- `analyticsActionQueue`
- `technicalSourceCounters`
No chart dependency is added. Use CSS bars, compact matrix cells, and existing badges/cards. This keeps the dashboard self-contained.
## Error Handling
- If analytics aggregate is missing, show empty states instead of crashing.
- If registries are unavailable, keep the pipeline and technical counters visible.
- If coverage has zero population, render zeroed KPIs and explanatory empty states.
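
On the backend side, the zero-population case can be handled by guarding the percent calculation so the summary is always well-formed. This is a sketch under assumptions: the function name is hypothetical, and `requires_attention` is taken to be the population minus organizations with any enrichment, which matches the example numbers in the contract (252 − 68 = 184) but is not stated as a rule.

```python
def build_coverage_summary(population, with_any, core_complete):
    """Build `coverage_summary`, returning zeroed KPIs for an empty population."""

    def percent(count):
        # Avoid division by zero when there are no active organizations.
        return round(count * 100 / population, 1) if population else 0.0

    return {
        "with_any_enrichment": with_any,
        "with_any_enrichment_percent": percent(with_any),
        "core_profile_complete": core_complete,
        "core_profile_complete_percent": percent(core_complete),
        # Assumption: organizations without any enrichment need attention.
        "requires_attention": max(population - with_any, 0),
    }
```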
## Testing
Add/update tests:
- Dashboard API returns `registry_enrichment_analytics`.
- `unfair_suppliers` appears in `risk_signals`, not `source_coverage`.
- Matrix counts source coverage per registry.
- Template contains new analytics sections and still includes secondary source counters.
- Existing parser/dashboard tests continue to pass.
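
The `unfair_suppliers` check could be expressed as a small helper that a test calls on the `registry_enrichment_analytics` object. The helper name is hypothetical, and how the payload is fetched (test client, fixture) is left out on purpose.

```python
def find_risk_separation_violations(analytics):
    """Return a list of problems with how `unfair_suppliers` is reported."""
    coverage_sources = {
        row["source"] for row in analytics.get("source_coverage", [])
    }
    risk_sources = {
        row["source"] for row in analytics.get("risk_signals", [])
    }

    problems = []
    if "unfair_suppliers" in coverage_sources:
        problems.append("unfair_suppliers must not appear in source_coverage")
    if "unfair_suppliers" not in risk_sources:
        problems.append("unfair_suppliers must appear in risk_signals")
    return problems
```

A test would then assert that the returned list is empty for the dashboard API response.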
Manual validation:
- Open `/dashboard`.
- Confirm first visible analytics content is registry organization coverage.
- Confirm source record totals are below primary registry analytics.
- Confirm FNS table and existing organization drill-down are unaffected.
## Risks
- Matching by INN/OGRN can undercount sources with incomplete identifiers.
- Current dashboard API may become heavier with matrix aggregation. Keep queries bounded and use grouped SQL where practical.
- Core completeness definition is a business rule; first implementation uses a conservative default and should be easy to adjust.