mostovik-backend/docs/superpowers/specs/2026-05-18-direct-parser-source-ingestion-design.md

# Direct Parser Source Ingestion Design
## Goal
Parser runtime must write parsed source records directly into the organization-centric
polymorphic storage:
- `organizations_organization`
- `organizations_source_extension`
- source extension subclass tables
- `organizations_source_record`
- `organizations_source_financial_line`

Legacy parser record tables remain only as migration/audit inputs until a later
destructive cleanup. They must not be part of the parser runtime write path or the
runtime read path used by the application.
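
The storage layout above can be pictured as a small object graph. The following is a minimal sketch that uses plain Python dataclasses instead of the project's Django models. Every field name, the `FnsSourceExtension` subclass, and the one-to-many links are assumptions for illustration only.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Any, Optional


@dataclass
class Organization:  # mirrors organizations_organization
    inn: str  # assumed canonical identity field
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class SourceExtension:  # mirrors organizations_source_extension (polymorphic base)
    organization: Organization
    source_group: str
    record_count: int = 0  # counter refreshed by the ingestion service


@dataclass
class FnsSourceExtension(SourceExtension):  # hypothetical subclass table
    last_report_year: Optional[int] = None


@dataclass
class SourceRecord:  # mirrors organizations_source_record
    extension: SourceExtension
    source: str
    external_id: str  # unique together with `source`
    payload: dict[str, Any]
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class SourceFinancialLine:  # mirrors organizations_source_financial_line
    record: SourceRecord
    line_code: str
    value: Decimal
```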
## Current Runtime Problem
Current parser tasks write source rows into legacy parser tables such as
`GenericParserRecord`, `InspectionRecord`, `ProcurementRecord`,
`IndustrialProductRecord`, and `FinancialReport`, and then enqueue a backfill of those
rows into the new organization storage. This keeps the old tables in the hot path and
lets the new storage diverge from them until the asynchronous backfill has run.
## Target Runtime
Parser tasks keep using `ParserLoadLog`, `ParserBatchSequence`, and `BackgroundJob`
as operational metadata. Parsed records are converted into normalized source-record
inputs and persisted through one ingestion service.
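
As a rough illustration of the conversion step, the sketch below defines a hypothetical `SourceRecordInput` and one per-parser converter for an FNS report. The raw keys (`report_id`, `inn`, `org_name`, `lines`), the source names, and the use of the INN as the identity field are assumptions, not the project's actual schema. Identity normalization is deliberately left to the ingestion service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal
from typing import Any, Mapping


@dataclass(frozen=True)
class SourceRecordInput:
    """Hypothetical normalized input accepted by the ingestion service."""

    source: str  # e.g. "fns_reports"; naming is an assumption
    source_group: str  # selects the polymorphic extension, e.g. "fns"
    external_id: str  # stable id from the upstream source
    inn: str  # raw identity value; normalized later by the service
    name: str
    payload: Mapping[str, Any] = field(default_factory=dict)
    financial_lines: tuple[tuple[str, Decimal], ...] = ()


def from_parsed_fns_report(raw: Mapping[str, Any]) -> SourceRecordInput:
    """Map one raw parser dict (hypothetical keys) to a SourceRecordInput."""
    lines = tuple(
        (str(code), Decimal(str(value)))  # str() avoids float rounding artifacts
        for code, value in sorted(raw.get("lines", {}).items())
    )
    return SourceRecordInput(
        source="fns_reports",
        source_group="fns",
        external_id=str(raw["report_id"]),
        inn=str(raw["inn"]),
        name=str(raw.get("org_name", "")),
        payload={k: v for k, v in raw.items() if k != "lines"},
        financial_lines=lines,
    )
```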

The ingestion service is responsible for:
- normalizing identity fields before writing canonical organizations;
- resolving or creating `Organization`;
- creating or updating the source-group polymorphic extension;
- creating or updating `OrganizationSourceRecord` by `(source, external_id)`;
- writing structured financial lines for FNS reports;
- refreshing extension counters in the same transaction.

Parser save services return the number of inserted or updated source records. They no
longer create or query legacy parser record models for runtime decisions.
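
The responsibilities listed above can be sketched as one function over an in-memory stand-in for the tables. This is a minimal sketch under stated assumptions: `Store`, `ingest`, and `normalize_inn` are hypothetical names, a deep copy emulates the single database transaction, and identity normalization is reduced to keeping digits. A record counts toward the return value only when it is inserted or its stored content changes.

```python
from __future__ import annotations

import copy
import re
from collections import Counter
from dataclasses import dataclass, field, fields
from decimal import Decimal
from typing import Any


@dataclass(frozen=True)
class SourceRecordInput:
    source: str
    source_group: str
    external_id: str
    inn: str
    name: str
    payload: dict[str, Any] = field(default_factory=dict)
    financial_lines: tuple[tuple[str, Decimal], ...] = ()


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the organization tables."""

    organizations: dict[str, str] = field(default_factory=dict)  # inn -> name
    extensions: dict[tuple[str, str], int] = field(default_factory=dict)  # (inn, group) -> count
    records: dict[tuple[str, str], dict[str, Any]] = field(default_factory=dict)  # (source, external_id) -> row
    financial_lines: dict[tuple[str, str], list[tuple[str, Decimal]]] = field(default_factory=dict)


def normalize_inn(raw: str) -> str:
    """Keep digits only; the real normalization rules are an assumption."""
    return re.sub(r"\D", "", raw)


def ingest(store: Store, inputs: list[SourceRecordInput]) -> int:
    """Upsert all inputs atomically; return the number inserted or changed."""
    staged = copy.deepcopy(store)  # emulates one transaction: commit only at the end
    touched = 0
    for item in inputs:
        inn = normalize_inn(item.inn)
        if not inn:
            raise ValueError(f"no identity for {item.source}/{item.external_id}")
        staged.organizations.setdefault(inn, item.name)  # resolve or create organization
        staged.extensions.setdefault((inn, item.source_group), 0)  # resolve or create extension
        key = (item.source, item.external_id)  # upsert key
        row = {"inn": inn, "group": item.source_group, "payload": dict(item.payload)}
        lines = list(item.financial_lines)  # non-empty only for FNS reports
        if staged.records.get(key) != row or staged.financial_lines.get(key, []) != lines:
            touched += 1
        staged.records[key] = row
        if lines:
            staged.financial_lines[key] = lines
        else:
            staged.financial_lines.pop(key, None)
    # Refresh extension counters from the staged rows before committing.
    counts = Counter((r["inn"], r["group"]) for r in staged.records.values())
    for ext_key in staged.extensions:
        staged.extensions[ext_key] = counts[ext_key]
    for f in fields(Store):  # commit: publish the staged state in one step
        setattr(store, f.name, getattr(staged, f.name))
    return touched
```

Re-running the same inputs returns 0 in this sketch, so repeated parser batches stay idempotent. In the real service the counter refresh would presumably be an aggregate `COUNT` per touched extension inside the same database transaction rather than a Python loop.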
## Runtime Read Scope
The following runtime reads must use organization source storage:
- parser source cards and source item counters;
- parser log organization counts;
- source detail lists;
- source record detail reads;
- frontend-facing parser result compatibility endpoints while they remain exposed;
- admin/dashboard/export paths that are used by the app during normal operation.

Legacy parser tables may be read only by explicit migration/backfill tooling.
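
A sketch of the counter reads, assuming rows projected from `organizations_source_record`. The `load_log_id` column linking a record to the `ParserLoadLog` that last wrote it is a hypothetical field. The point is only that both counters are computed from source-record rows, with no legacy table involved.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecordRow:
    """Hypothetical projection of organizations_source_record columns."""

    source: str
    external_id: str
    organization_id: str
    load_log_id: int  # assumed link to the ParserLoadLog that last wrote the row


def source_item_counts(rows: list[SourceRecordRow]) -> dict[str, int]:
    """Items per source, the in-memory analogue of GROUP BY source."""
    return dict(Counter(r.source for r in rows))


def log_organization_count(rows: list[SourceRecordRow], load_log_id: int) -> int:
    """Distinct organizations written by one parser load log."""
    return len({r.organization_id for r in rows if r.load_log_id == load_log_id})
```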
## Compatibility
Existing v1 parser-result URLs can remain during the transition, but their data source
must be `OrganizationSourceRecord`, not the legacy parser models. Serializers/adapters
that read source-record payloads keep the response shape on a best-effort basis.
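
A best-effort adapter might look like the following sketch. `V1_INSPECTION_FIELDS`, `to_v1_inspection`, and `parse_record_id` are hypothetical names, and the field list does not reflect any real v1 response. Keys missing from the payload are returned as `None` rather than failing, and the legacy integer `id` is replaced by the source-record UUID.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SourceRecord:
    id: uuid.UUID
    source: str
    external_id: str
    payload: dict[str, Any] = field(default_factory=dict)


# Hypothetical v1 field list for one legacy endpoint; missing keys become None.
V1_INSPECTION_FIELDS = ("inn", "org_name", "inspection_date", "result")


def to_v1_inspection(record: SourceRecord) -> dict[str, Any]:
    """Best-effort legacy response shape built from the source-record payload."""
    body = {name: record.payload.get(name) for name in V1_INSPECTION_FIELDS}
    body["id"] = str(record.id)  # UUID replaces the legacy integer primary key
    return body


def parse_record_id(raw: str) -> Optional[uuid.UUID]:
    """Accept a source-record UUID from a v1 URL; return None for anything else."""
    try:
        return uuid.UUID(raw)
    except ValueError:
        return None
```

Returning `None` from `parse_record_id` lets a view answer with 404 when an old integer id reaches a v1 URL, instead of raising a server error.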
## Non-Goals
- Do not drop legacy parser tables in this phase.
- Do not rewrite parser clients.
- Do not remove parser load logs or background jobs.
- Do not make every payload strongly typed immediately.
## Risks
- Industrial product ingestion is high-volume; the writer must avoid per-record table scans.
- Existing tests assert legacy model counts and must be updated to assert source-record
behavior.
- Some compatibility endpoints expose legacy primary keys. New records use UUIDs, so
compatibility adapters must accept source-record UUIDs where needed.
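
One way to keep the industrial product writer away from per-record lookups is to resolve existing rows in chunks. The sketch below assumes a hypothetical `fetch_existing` callback that performs one bulk query per chunk against the unique `(source, external_id)` key; `plan_upserts` and the chunk size of 1000 are illustrative choices, not measured values.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set, Tuple

Key = Tuple[str, str]  # (source, external_id)


def chunked(items: Iterable[Key], size: int) -> Iterator[List[Key]]:
    """Yield consecutive lists of at most `size` keys."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def plan_upserts(
    keys: Iterable[Key],
    fetch_existing: Callable[[List[Key]], Set[Key]],
    chunk_size: int = 1000,
) -> Tuple[List[Key], List[Key]]:
    """Split keys into (to_insert, to_update) with one bulk lookup per chunk."""
    unique = list(dict.fromkeys(keys))  # drop in-batch duplicates, keep order
    to_insert: List[Key] = []
    to_update: List[Key] = []
    for chunk in chunked(unique, chunk_size):
        existing = fetch_existing(chunk)  # one query per chunk, never one per record
        for key in chunk:
            (to_update if key in existing else to_insert).append(key)
    return to_insert, to_update
```

Duplicate keys inside a batch are collapsed first, so the same record is never planned for two inserts.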