Add organizations v2 API and registry enrichment
@@ -0,0 +1,198 @@
# Dashboard Registry Enrichment Analytics Design

Date: 2026-05-06

## Goal

Rework the dashboard analytics tab so it treats active registry organizations as the primary population and parser/enrichment jobs as the operational process that fills data for those organizations.

The existing analytics page is source-centric: total records, source counts, and load quality. That remains useful, but secondary. The new first screen must answer:

- How many active registry organizations are under control?
- How many have additional data from enrichment sources?
- Which registries are under-covered by source?
- Which enrichment jobs are scheduled, running, successful, failed, or stale?
- What actions should the operator take next?

## Scope

In scope:

- Rebuild only the `analyticsPanel` dashboard tab.
- Keep current navigation and other dashboard tabs unchanged.
- Add dashboard API aggregate data under `/api/v1/parsers/dashboard/`.
- Use active `RegistryMembershipPeriod` rows as the population.
- Exclude `unfair_suppliers` from completeness degradation. It is a risk signal, not a required enrichment source.
- Keep source record totals available, but move them below the primary registry analytics.

Out of scope:

- Changing v2 organization API contracts.
- Changing parser execution behavior.
- Changing Celery scheduling semantics.
- Adding external chart dependencies.

## UX Structure

The analytics tab becomes a hybrid of "coverage center" and "enrichment pipeline".

Top section: Registry Organization Coverage

- KPI cards:
  - Active registry organizations.
  - Organizations with at least one enrichment source.
  - Organizations with core profile coverage.
  - Organizations requiring attention.
- Coverage by source:
  - Bar rows for FNS reports, industrial certificates, products, manufacturers, inspections, procurements, arbitration, bankruptcy, FSTEC, vacancies, etc.
  - Each row shows matched organization count and percent of active registry organizations.
  - `unfair_suppliers` is not included here.
- Registry × source matrix:
  - Rows are registries.
  - Columns are important enrichment sources.
  - Cells show percent coverage for organizations in that registry.
  - This gives a fast view of which registry/source pair needs work.

Second section: Enrichment Pipeline

- Job KPI cards:
  - Active schedules.
  - Running jobs.
  - Recent successes.
  - Recent failures.
- Recent job quality meter:
  - Reuse existing load log status data, but frame it as enrichment pipeline health.
- Action queue:
  - Organizations without enrichment data.
  - Organizations with identifier/matching problems.
  - Snapshots older than the latest parser batches.
  - Risk signals such as unfair suppliers, bankruptcy, and GOZ evasion, shown separately from coverage.

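
The job KPI cards and the quality meter both reduce to bucketing recent load log statuses. A minimal sketch of that bucketing follows. The status strings (`"success"`, `"failed"`) and the function name `summarize_recent_jobs` are assumptions for illustration; the real load log may use different values. The output keys mirror the `pipeline` object in the data contract.

```python
from collections import Counter
from typing import Iterable

# Assumed status names; adjust to whatever the load log actually stores.
SUCCESS_STATUSES = {"success"}
FAILED_STATUSES = {"failed"}


def summarize_recent_jobs(statuses: Iterable[str]) -> dict[str, int]:
    """Bucket recent job statuses into success / failed / other counts."""
    counts: Counter = Counter()
    for status in statuses:
        if status in SUCCESS_STATUSES:
            counts["recent_success"] += 1
        elif status in FAILED_STATUSES:
            counts["recent_failed"] += 1
        else:
            # Anything else (partial, skipped, unknown) is counted as "other".
            counts["recent_other"] += 1
    # Always emit all three keys so the UI never has to handle a missing one.
    return {
        "recent_success": counts["recent_success"],
        "recent_failed": counts["recent_failed"],
        "recent_other": counts["recent_other"],
    }
```

Emitting all three keys even when a bucket is empty keeps the KPI cards free of missing-key checks.
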
Third section: Secondary Technical Counters

- Current source record totals and source mode breakdown move below the registry-focused blocks.
- These remain useful for diagnostics, but no longer dominate the page.

## Backend Data Contract

Extend `/api/v1/parsers/dashboard/` with an analytics object:

```json
{
  "registry_enrichment_analytics": {
    "population": {
      "active_registry_organizations": 252,
      "active_memberships": 647,
      "registries_with_data_percent": 100
    },
    "coverage_summary": {
      "with_any_enrichment": 68,
      "with_any_enrichment_percent": 27.0,
      "core_profile_complete": 21,
      "core_profile_complete_percent": 8.3,
      "requires_attention": 184
    },
    "source_coverage": [
      {
        "source": "fns_reports",
        "label": "ФНС отчетность",
        "organizations_count": 45,
        "coverage_percent": 17.9,
        "required_for_core_profile": true,
        "risk_signal": false
      }
    ],
    "registry_source_matrix": [
      {
        "registry_id": "uuid",
        "registry_name": "Реестр ГК Росатом ГОЗ",
        "active_organizations": 139,
        "sources": {
          "fns_reports": {
            "organizations_count": 20,
            "coverage_percent": 14.4
          }
        }
      }
    ],
    "risk_signals": [
      {
        "source": "unfair_suppliers",
        "label": "Недобросовестные поставщики",
        "organizations_count": 3,
        "coverage_percent": 1.2
      }
    ],
    "pipeline": {
      "active_schedules": 15,
      "running_jobs": 0,
      "recent_success": 13,
      "recent_failed": 0,
      "recent_other": 1
    }
  }
}
```

The existing `registry_data_coverage` object can remain temporarily so that current dashboard JS keeps working, but the new UI should read `registry_enrichment_analytics`.

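
As an illustration of how the `registry_source_matrix` rows could be assembled, here is a minimal in-memory sketch. The function names and input shapes (membership pairs, a per-source set of matched organization ids) are assumptions for this example; a real implementation would more likely use grouped SQL than Python sets.

```python
from collections import defaultdict
from typing import Iterable


def percent(count: int, total: int) -> float:
    """Percentage with one decimal place; 0.0 when the population is empty."""
    return round(100 * count / total, 1) if total else 0.0


def build_registry_source_matrix(
    memberships: Iterable[tuple[str, int]],   # (registry_id, organization_id), active only
    matched_by_source: dict[str, set[int]],   # source -> organization ids with records
    registry_names: dict[str, str],           # registry_id -> display name
) -> list[dict]:
    # Group distinct organizations per registry.
    orgs_by_registry: dict[str, set[int]] = defaultdict(set)
    for registry_id, org_id in memberships:
        orgs_by_registry[registry_id].add(org_id)

    rows = []
    for registry_id in sorted(orgs_by_registry):
        orgs = orgs_by_registry[registry_id]
        sources = {}
        for source in sorted(matched_by_source):
            # Coverage is measured against this registry's own organizations.
            count = len(orgs & matched_by_source[source])
            sources[source] = {
                "organizations_count": count,
                "coverage_percent": percent(count, len(orgs)),
            }
        rows.append({
            "registry_id": registry_id,
            "registry_name": registry_names.get(registry_id, ""),
            "active_organizations": len(orgs),
            "sources": sources,
        })
    return rows
```

Each cell's denominator is the registry's own active organization count, not the global population, which is why 20 of 139 organizations yields 14.4 in the sample payload.
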
## Aggregation Rules

- The population is the set of distinct organizations with an active registry membership: `ended_at IS NULL`.
- Source coverage matches parser records to registry organizations by INN or OGRN.
- `FinancialReport` matches by OGRN.
- Legacy `ProcurementRecord` matches by `customer_inn` and `customer_ogrn`.
- `unfair_suppliers` is excluded from completeness and shown as a risk signal.
- Percent values use one decimal place.
- If a source has records but no identifiers, it does not count as organization coverage.

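
The rules above can be sketched in plain Python as follows. The `Org` and `SourceRecord` dataclasses and the function name `split_coverage` are hypothetical stand-ins, not the project's models, and the sketch assumes INN and OGRN values are unique across registry organizations.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Sources reported as risk signals instead of coverage.
RISK_SIGNAL_SOURCES = {"unfair_suppliers"}


@dataclass(frozen=True)
class Org:  # hypothetical stand-in for a registry organization
    id: int
    inn: Optional[str] = None
    ogrn: Optional[str] = None


@dataclass(frozen=True)
class SourceRecord:  # hypothetical stand-in for any parser record
    source: str
    inn: Optional[str] = None
    ogrn: Optional[str] = None


def split_coverage(orgs: Iterable[Org], records: Iterable[SourceRecord]):
    """Return (source_coverage, risk_signals) rows for the given population."""
    orgs = list(orgs)
    by_inn = {o.inn: o.id for o in orgs if o.inn}
    by_ogrn = {o.ogrn: o.id for o in orgs if o.ogrn}
    total = len({o.id for o in orgs})

    matched: dict[str, set[int]] = {}
    for rec in records:
        # Try INN first, then OGRN; records without identifiers never match.
        org_id = by_inn.get(rec.inn) if rec.inn else None
        if org_id is None and rec.ogrn:
            org_id = by_ogrn.get(rec.ogrn)
        if org_id is not None:
            matched.setdefault(rec.source, set()).add(org_id)

    coverage, risk = [], []
    for source in sorted(matched):
        count = len(matched[source])  # distinct organizations, not records
        row = {
            "source": source,
            "organizations_count": count,
            "coverage_percent": round(100 * count / total, 1) if total else 0.0,
        }
        (risk if source in RISK_SIGNAL_SOURCES else coverage).append(row)
    return coverage, risk
```

In this sketch a source with no matched organizations produces no row at all; whether such sources should instead appear with a zero count is left open here.
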
Core profile completeness for the first version:

- Organization has FNS reports.
- Organization has at least one industrial/manufacturer/product source.

This is intentionally conservative and can become configurable later.

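
A minimal sketch of this two-part rule, kept as data so it is easy to adjust later. The source keys in `PRODUCTION_SOURCES` and the function names are assumed for illustration; only `fns_reports` appears in the data contract.

```python
FNS_SOURCE = "fns_reports"

# Assumed keys for the industrial/manufacturer/product group.
PRODUCTION_SOURCES = frozenset({"industrial_certificates", "manufacturers", "products"})


def is_core_profile_complete(org_sources: set[str]) -> bool:
    """True when the organization has FNS reports AND any production source."""
    return FNS_SOURCE in org_sources and bool(PRODUCTION_SOURCES & org_sources)


def core_profile_summary(sources_by_org: dict[int, set[str]]) -> dict:
    """Aggregate completeness over the whole population (org id -> matched sources)."""
    total = len(sources_by_org)
    complete = sum(
        1 for sources in sources_by_org.values() if is_core_profile_complete(sources)
    )
    return {
        "core_profile_complete": complete,
        "core_profile_complete_percent": round(100 * complete / total, 1) if total else 0.0,
    }
```

Because `unfair_suppliers` is in neither set, its presence or absence cannot change an organization's completeness.
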
## Frontend Design

Implementation remains in `src/templates/dashboard.html` for now, following the current dashboard pattern.

New/updated DOM blocks:

- `analyticsRegistryKpis`
- `registrySourceCoverageChart`
- `registrySourceMatrix`
- `enrichmentPipelinePanel`
- `analyticsActionQueue`
- `technicalSourceCounters`

No chart dependency is added. Use CSS bars, compact matrix cells, and existing badges/cards. This keeps the dashboard self-contained.

## Error Handling

- If the analytics aggregate is missing, show empty states instead of crashing.
- If registries are unavailable, keep the pipeline and technical counters visible.
- If coverage has zero population, render zeroed KPIs and explanatory empty states.

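
The defaulting behind these rules can be sketched as a single normalization step. It is written in Python here for consistency with the backend; the dashboard template would do the equivalent in its own JS. The function name `normalize_analytics` and the `is_empty` flag are assumptions for illustration.

```python
from typing import Any, Optional


def normalize_analytics(payload: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Fill safe defaults so rendering never depends on a missing key."""
    analytics = (payload or {}).get("registry_enrichment_analytics") or {}
    population = analytics.get("population") or {}
    summary = analytics.get("coverage_summary") or {}
    total = population.get("active_registry_organizations") or 0
    return {
        # Hypothetical flag: tells the UI to show explanatory empty states.
        "is_empty": total == 0,
        "active_registry_organizations": total,
        "with_any_enrichment": summary.get("with_any_enrichment") or 0,
        "core_profile_complete": summary.get("core_profile_complete") or 0,
        "requires_attention": summary.get("requires_attention") or 0,
        "source_coverage": analytics.get("source_coverage") or [],
        "registry_source_matrix": analytics.get("registry_source_matrix") or [],
        # The pipeline block is passed through even when coverage is empty.
        "pipeline": analytics.get("pipeline") or {},
    }
```

Passing `pipeline` through independently of the coverage fields is what keeps the pipeline panel visible when registry data is unavailable.
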
## Testing

Add/update tests:

- Dashboard API returns `registry_enrichment_analytics`.
- `unfair_suppliers` appears in `risk_signals`, not `source_coverage`.
- Matrix counts source coverage per registry.
- Template contains new analytics sections and still includes secondary source counters.
- Existing parser/dashboard tests continue to pass.

Manual validation:

- Open `/dashboard`.
- Confirm first visible analytics content is registry organization coverage.
- Confirm source record totals are below primary registry analytics.
- Confirm FNS table and existing organization drill-down are unaffected.

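
As one example, the `unfair_suppliers` placement check could look like the sketch below. It runs against a hand-written payload, not the real API client, and the helper name `find_risk_signal_violations` is hypothetical; an actual test would fetch the payload from `/api/v1/parsers/dashboard/` with the project's test client.

```python
def find_risk_signal_violations(analytics: dict) -> list[str]:
    """Return human-readable contract violations; an empty list means OK."""
    coverage_sources = {row["source"] for row in analytics.get("source_coverage", [])}
    risk_sources = {row["source"] for row in analytics.get("risk_signals", [])}
    problems = []
    if "unfair_suppliers" in coverage_sources:
        problems.append("unfair_suppliers must not appear in source_coverage")
    if "unfair_suppliers" not in risk_sources:
        problems.append("unfair_suppliers must appear in risk_signals")
    return problems


def test_unfair_suppliers_is_only_a_risk_signal():
    # Hand-written sample in the shape of the data contract.
    analytics = {
        "source_coverage": [{"source": "fns_reports", "organizations_count": 45}],
        "risk_signals": [{"source": "unfair_suppliers", "organizations_count": 3}],
    }
    assert find_risk_signal_violations(analytics) == []
```

Returning a list of violations instead of asserting inside the helper makes failures easier to read when both conditions break at once.
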
## Risks

- Matching by INN/OGRN can undercount sources with incomplete identifiers.
- The current dashboard API may become heavier with matrix aggregation. Keep queries bounded and use grouped SQL where practical.
- The core completeness definition is a business rule; the first implementation uses a conservative default and should be easy to adjust.