# Technical reference: EIS procurement parser (zakupki.gov.ru)

**Version:** 1.0
**Date:** March 26, 2026
**Status:** Production-ready

---

## Contents

1. [System overview](#1-system-overview)
2. [Parsed resource](#2-parsed-resource)
3. [Architecture](#3-architecture)
4. [Data loading process](#4-data-loading-process)
5. [Data structures](#5-data-structures)
6. [Database storage](#6-database-storage)
7. [API](#7-api)
8. [Background tasks](#8-background-tasks)
9. [Configuration](#9-configuration)
10. [Examples](#10-examples)

---

## 1. System overview

### Purpose

The service parses public procurement data from the **Unified Information System in the Field of Procurement (EIS, ЕИС)** at zakupki.gov.ru.

### Supported laws

- **44-FZ**: Federal Law "On the Contract System in the Sphere of Procurement"
- **223-FZ**: Federal Law "On Procurement of Goods, Works, and Services by Certain Types of Legal Entities"

### Features

- SOAP API (int44.zakupki.gov.ru)
- Parsing of XML archives
- Support for 80+ regions of the Russian Federation
- Incremental synchronization
- Targeted lookup by procurement number
- Linking to organizations
- Progress tracking (BackgroundJob)
- Logging (ParserLoadLog)

---

## 2. Parsed resource

### Data source

| Parameter | Value |
|-----------|-------|
| **Name** | Unified Information System in the Field of Procurement (EIS) |
| **Domain** | zakupki.gov.ru |
| **SOAP API** | https://int44.zakupki.gov.ru/eis-integration/services/getDocsIP |
| **Protocol** | SOAP 1.2 over HTTPS |
| **Format** | XML in ZIP archives |
| **Authorization** | Token issued via Gosuslugi |

### Obtaining a token

```
URL: https://zakupki.gov.ru/pmd/auth/welcome
Required: a Gosuslugi (ESIA) account
Token: individualPerson_token (in the SOAP header)
```

### SOAP API methods

#### getDocsByOrgRegionRequest

Request by region and period.

**Parameters:**
- `orgRegion`: region code ("77" = Moscow)
- `subsystemType`: "PRIZ" (44-FZ), "OOS223" (223-FZ)
- `documentType44`: document type:
  - `epNotificationEF2020`: electronic auction
  - `epNotificationOK2020`: open tender
  - `epNotificationZK2020`: request for quotations
- `periodInfo/exactDate`: date (YYYY-MM-DD)

**Response:**
```xml
<dataInfo>
  <archiveUrl>https://zakupki.gov.ru/opendata/download/...</archiveUrl>
</dataInfo>
```

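The response carries the archive location in `archiveUrl`, so the client has to pull that element out of the SOAP body before it can download anything. A minimal sketch of that step, using only the standard library. The helper names `local_name` and `extract_archive_urls`, the sample envelope, and the `urn:example` namespace are assumptions made for illustration; the real `_parse_soap_response()` may work differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_archive_urls(soap_xml: str) -> list[str]:
    """Collect the text of every archiveUrl element, whatever its namespace."""
    root = ET.fromstring(soap_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "archiveUrl" and el.text and el.text.strip()
    ]


# Hypothetical response body: the payload namespace is illustrative only.
sample = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <dataInfo xmlns="urn:example">
      <archiveUrl>https://zakupki.gov.ru/opendata/download/example.zip</archiveUrl>
    </dataInfo>
  </soap:Body>
</soap:Envelope>"""

urls = extract_archive_urls(sample)
print(urls)
```

Matching on the local name keeps the lookup independent of namespace prefixes, which tend to vary between document versions.
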
#### getDocsByReestrNumberRequest

Targeted request by procurement number.

**Parameters:**
- `reestrNumber`: registry number (for example, "0888200000224000038")

---

## 3. Architecture

### Components

```
┌─────────────────────────────────────────────────────┐
│                  Celery Task Layer                  │
│  parse_procurements   │   sync_procurements         │
│   └──────┬────────────┴────────────┬────────────────┤
│          ▼                         ▼                │
│  BackgroundJob (progress)          │                │
│  ParserLoadLog (audit)             │                │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                    Service Layer                    │
│  ProcurementService                                 │
│    - save_procurements() (bulk upsert)              │
│    - get_last_loaded_period()                       │
│    - find_by_inn(), find_by_purchase_number()       │
│                                                     │
│  RegistryOrganizationResolver                       │
│    - build_lookup() (INN/OGRN → Org ID)             │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                    Client Layer                     │
│  ZakupkiClient                                      │
│    - fetch_procurements()                           │
│    - _fetch_via_soap()                              │
│    - _build_soap_request_*()                        │
│    - _parse_soap_response()                         │
│    - _parse_archive_content()                       │
│    - _parse_xml_record()                            │
│                                                     │
│  BaseHTTPClient                                     │
│    - get(), post(), download_file()                 │
│    - Proxy support                                  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                     Data Layer                      │
│  ProcurementRecord (Django Model)                   │
│    - 30+ fields                                     │
│    - Indexes & constraints                          │
│                                                     │
│  Procurement (dataclass DTO)                        │
└─────────────────────────────────────────────────────┘
```

### File structure

```
src/apps/parsers/
├── clients/
│   ├── base.py                  # Base HTTP client
│   └── zakupki/
│       ├── __init__.py          # ZakupkiClient (888 lines)
│       └── schemas.py           # Dataclass schemas
├── models.py                    # Django models
├── services.py                  # Business logic
├── tasks.py                     # Celery tasks
├── views.py                     # DRF ViewSet
├── serializers.py               # DRF serializers
├── admin.py                     # Django Admin
└── migrations/
    ├── 0006_add_procurement_model.py
    ├── 0010_link_registry_organizations.py
    └── 0011_add_normalized_date_and_amount_fields.py
```

---

## 4. Data loading process

### 4.1. Full cycle (parse_procurements)

```python
# Simplified walkthrough of the task body. Names such as task_id and proxies
# are provided by the surrounding Celery task.

# 1. Create the load log
load_log, batch_id = ParserLoadLogService.create_load_log_with_next_batch_id(
    source=ParserLoadLog.Source.PROCUREMENTS,
    status="in_progress"
)

# 2. BackgroundJob for progress tracking
job = BackgroundJob.objects.create(
    task_id=task_id,
    task_name="apps.parsers.tasks.parse_procurements",
    status="in_progress"
)

# 3. Client
client = ZakupkiClient(
    token=settings.ZAKUPKI_TOKEN,
    proxies=proxies
)

# 4. SOAP request
soap_request = client._build_soap_request_by_region(
    region_code="77",
    law_type="44",
    year=2025,
    month=3
)

response = requests.post(
    "https://int44.zakupki.gov.ru/eis-integration/services/getDocsIP",
    data=soap_request,
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "individualPerson_token": settings.ZAKUPKI_TOKEN
    }
)

# 5. Parse the response → archive_url
archive_url = client._parse_soap_response(response)

# 6. Download the ZIP archive
archive_content = client.http_client.download_file(
    archive_url,
    headers={"individualPerson_token": settings.ZAKUPKI_TOKEN}
)

# 7. Unpack the archive and parse the XML
procurements = client._parse_archive_content(archive_content, archive_url)

# 8. Save (bulk upsert)
saved_count = ProcurementService.save_procurements(
    procurements,
    batch_id=batch_id,
    region_code="77",
    data_year=2025,
    data_month=3,
    chunk_size=500
)

# 9. Update the status
ParserLoadLogService.update(load_log, status="success", records_count=saved_count)
job.complete(result={"batch_id": batch_id, "saved": saved_count})
```

### 4.2. Incremental synchronization (sync_procurements)

```python
# 1. Last loaded period
last_year, last_month = ProcurementService.get_last_loaded_period(
    region_code="77",
    law_type="44-FZ"
)

# 2. Starting point
if last_year and last_month:
    start_year, start_month = last_year, last_month + 1
    if start_month > 12:  # December rolls over to January of the next year
        start_year, start_month = start_year + 1, 1
else:
    start_year, start_month = 2025, 1  # default

# 3. Load month by month
empty_months_count = 0
year, month = start_year, start_month

while year < current_year or (year == current_year and month <= current_month):
    procurements = client.fetch_procurements(
        region_code=region_code,
        year=year,
        month=month
    )

    if procurements:
        ProcurementService.save_procurements(...)
        empty_months_count = 0
    else:
        empty_months_count += 1

    if empty_months_count >= 2:  # stop after two consecutive empty months
        break

    # next month
    month += 1
    if month > 12:
        year += 1
        month = 1
```

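The month arithmetic in the loop above is easy to get wrong at year boundaries, so it can help to isolate it. A small sketch under that assumption; `next_period` and `iter_periods` are hypothetical helper names and are not part of the documented code.

```python
from collections.abc import Iterator


def next_period(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def iter_periods(
    start: tuple[int, int], end: tuple[int, int]
) -> Iterator[tuple[int, int]]:
    """Yield (year, month) pairs from start to end, both inclusive."""
    current = start
    while current <= end:  # tuples compare by year first, then by month
        yield current
        current = next_period(*current)


print(next_period(2024, 12))                       # (2025, 1)
print(list(iter_periods((2024, 11), (2025, 2))))   # four periods across the year boundary
```

With helpers like these, the starting point becomes `next_period(last_year, last_month)` and the `while` condition reduces to a tuple comparison.
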
### 4.3. XML parsing

```python
import xml.etree.ElementTree as ET


def _parse_xml_record(element: ET.Element) -> Procurement:
    # Namespace-aware child lookup
    def find_child(tag): ...

    # Text extraction: the first candidate found as an attribute or child wins
    def get_text(tags):
        for tag in tags:
            if tag in element.attrib:
                return element.attrib[tag]
            child = find_child(tag)
            if child is not None and child.text:
                return child.text.strip()
        return ""

    # Nested structures
    def get_nested_text(parent_tags, child_tags): ...

    # Field mapping
    purchase_number = get_text(["purchaseNumber", "regNum"])
    purchase_name = get_text(["purchaseObjectInfo", "name"])

    customer_inn = get_nested_text(
        ["customer", "organizationInfo"],
        ["INN", "inn"]
    )

    max_price = get_nested_text(
        ["lot", "lotData"],
        ["maxPrice", "initialSum"]
    )

    publish_date = get_text(["publishDate", "createDate"])
    end_date = get_text(["endDate", "submissionCloseDate"])
    status = get_text(["state", "status"])

    # Determine the law from the root tag
    law_type = ""
    if "44" in element.tag or "fcs" in element.tag.lower():
        law_type = "44-FZ"
    elif "223" in element.tag:
        law_type = "223-FZ"

    return Procurement(...)
```

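The bodies of `find_child` and `get_nested_text` are elided above. One way they could look is sketched below as standalone functions. This is an assumption, not the project's implementation: it matches direct children only, compares local names so that namespaces are ignored, and uses a made-up `urn:example:eis` namespace in the sample.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_child(element: ET.Element, tag: str):
    """Return the first direct child whose local name equals tag, else None."""
    for child in element:
        if local_name(child.tag) == tag:
            return child
    return None


def get_text(element: ET.Element, tags: list[str]) -> str:
    """Return the first attribute or non-empty child text among the candidates."""
    for tag in tags:
        if tag in element.attrib:
            return element.attrib[tag]
        child = find_child(element, tag)
        if child is not None and child.text and child.text.strip():
            return child.text.strip()
    return ""


def get_nested_text(element: ET.Element, parent_tags: list[str], child_tags: list[str]) -> str:
    """Try each candidate parent in order and read the first matching child text."""
    for parent_tag in parent_tags:
        parent = find_child(element, parent_tag)
        if parent is not None:
            value = get_text(parent, child_tags)
            if value:
                return value
    return ""


# Hypothetical record: real EIS documents are larger and nest more deeply.
record = ET.fromstring(
    """<epNotificationEF2020 xmlns="urn:example:eis">
         <purchaseNumber>0888200000224000038</purchaseNumber>
         <customer><INN>7707083893</INN></customer>
       </epNotificationEF2020>"""
)
print(get_text(record, ["purchaseNumber", "regNum"]))
print(get_nested_text(record, ["customer", "organizationInfo"], ["INN", "inn"]))
```
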
### 4.4. Normalization

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation


def normalize_to_date(value: str | None) -> date | None:
    """String → date (YYYY-MM-DD, DD.MM.YYYY, ISO 8601)"""
    if not value:
        return None

    candidate = str(value).strip().replace("T", " ").replace("Z", "")

    for fmt in ["%Y-%m-%d", "%d.%m.%Y", "%Y-%m-%d %H:%M:%S"]:
        try:
            return datetime.strptime(candidate, fmt).date()
        except ValueError:
            continue

    # Fallback: take the first YYYY-MM-DD fragment found by regex
    match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", candidate)
    if match:
        return datetime.strptime(match.group(0), "%Y-%m-%d").date()

    return None


def normalize_to_decimal(value: str | None) -> Decimal | None:
    """String → Decimal (strips ₽ and spaces, converts the decimal comma)"""
    if not value:
        return None

    normalized = (
        str(value)
        .replace("\u00a0", "")
        .replace(" ", "")
        .replace("₽", "")
        .replace("руб.", "")
        .replace("руб", "")
    )

    normalized = re.sub(r"[^0-9,.\-]", "", normalized)

    # Separator handling: the rightmost of "," and "." is the decimal separator
    if "," in normalized and "." in normalized:
        if normalized.rfind(",") > normalized.rfind("."):
            normalized = normalized.replace(".", "").replace(",", ".")
        else:
            normalized = normalized.replace(",", "")
    elif "," in normalized:
        normalized = normalized.replace(",", ".")

    try:
        return Decimal(normalized)
    except (InvalidOperation, ValueError):
        return None
```

---

## 5. Data structures

### DTO (Procurement dataclass)

**File:** `src/apps/parsers/clients/zakupki/schemas.py`

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procurement:
    purchase_number: str            # Registry number
    purchase_name: str              # Name
    customer_inn: str               # Customer INN
    customer_kpp: str               # Customer KPP
    customer_ogrn: str              # Customer OGRN
    customer_name: str              # Customer name
    max_price: str                  # Initial maximum price (string)
    currency_code: str              # Currency code (RUB)
    placement_method: str           # Supplier selection method
    publish_date: str               # Publication date
    end_date: str                   # End date
    status: str                     # Status
    law_type: str                   # 44-FZ / 223-FZ
    purchase_object_info: str = ""  # Procurement object
    href: str = ""                  # Link
```

### Sample data

```json
{
  "purchase_number": "0888200000224000038",
  "purchase_name": "Поставка офисной бумаги",
  "customer_inn": "7707083893",
  "customer_kpp": "770701001",
  "customer_ogrn": "1027700034460",
  "customer_name": "ПАО СБЕРБАНК",
  "max_price": "1500000.00",
  "currency_code": "RUB",
  "placement_method": "Электронный аукцион",
  "publish_date": "2025-03-15",
  "end_date": "2025-03-25T18:00:00",
  "status": "Подача заявок",
  "law_type": "44-FZ"
}
```

---

## 6. Database storage

### 6.1. Table: parsers_procurement

**Model:** `ProcurementRecord`
**File:** `src/apps/parsers/models.py`

#### Fields

| Field | DB type | Django | Null | Index | Description |
|-------|---------|--------|------|-------|-------------|
| id | BIGSERIAL | BigAutoField | NO | PK | Primary key |
| created_at | TIMESTAMP | DateTimeField | NO | ✓ | Creation date |
| updated_at | TIMESTAMP | DateTimeField | NO | - | Update date |
| load_batch | INTEGER | PositiveIntegerField | NO | ✓ | Batch ID |
| purchase_number | VARCHAR(100) | CharField | NO | ✓ | Registry number |
| purchase_name | TEXT | TextField | NO | - | Name |
| customer_inn | VARCHAR(20) | CharField | NO | ✓ | INN |
| customer_kpp | VARCHAR(20) | CharField | YES | - | KPP |
| customer_ogrn | VARCHAR(20) | CharField | YES | ✓ | OGRN |
| customer_name | TEXT | TextField | NO | - | Customer name |
| max_price | VARCHAR(50) | CharField | YES | - | Initial maximum price (string) |
| max_price_amount | DECIMAL(20,2) | DecimalField | YES | ✓ | Initial maximum price (number) |
| currency_code | VARCHAR(10) | CharField | NO | - | Currency code |
| placement_method | VARCHAR(255) | CharField | YES | - | Supplier selection method |
| publish_date | VARCHAR(30) | CharField | YES | - | Publication date (string) |
| publish_date_normalized | DATE | DateField | YES | ✓ | Publication date (date) |
| end_date | VARCHAR(30) | CharField | YES | - | End date (string) |
| end_date_normalized | DATE | DateField | YES | ✓ | End date (date) |
| status | VARCHAR(100) | CharField | YES | - | Status |
| law_type | VARCHAR(20) | CharField | YES | ✓ | Law type |
| purchase_object_info | TEXT | TextField | YES | - | Procurement object |
| href | VARCHAR(500) | URLField | YES | - | Link |
| region_code | VARCHAR(10) | CharField | YES | ✓ | Region code |
| data_year | SMALLINT | PositiveSmallIntegerField | YES | ✓ | Data year |
| data_month | SMALLINT | PositiveSmallIntegerField | YES | ✓ | Data month |
| registry_organization_id | BIGINT | ForeignKey | YES | - | FK to organizations |

#### Indexes

```sql
-- Single-column (db_index=True)
CREATE INDEX ON parsers_procurement(created_at);
CREATE INDEX ON parsers_procurement(load_batch);
CREATE INDEX ON parsers_procurement(purchase_number);
CREATE INDEX ON parsers_procurement(customer_inn);
CREATE INDEX ON parsers_procurement(customer_ogrn);
CREATE INDEX ON parsers_procurement(max_price_amount);
CREATE INDEX ON parsers_procurement(publish_date_normalized);
CREATE INDEX ON parsers_procurement(end_date_normalized);
CREATE INDEX ON parsers_procurement(law_type);
CREATE INDEX ON parsers_procurement(region_code);
CREATE INDEX ON parsers_procurement(data_year);
CREATE INDEX ON parsers_procurement(data_month);

-- Composite (Meta.indexes)
CREATE INDEX ON parsers_procurement(customer_inn, purchase_number);
CREATE INDEX ON parsers_procurement(load_batch, customer_inn);
CREATE INDEX ON parsers_procurement(law_type, data_year, data_month);
```

#### Constraints

```sql
-- Uniqueness of the procurement number
ALTER TABLE parsers_procurement
    ADD CONSTRAINT unique_procurement_purchase_number
    UNIQUE (purchase_number);

-- FK to organizations
ALTER TABLE parsers_procurement
    ADD CONSTRAINT fk_registry_organization
    FOREIGN KEY (registry_organization_id)
    REFERENCES registers_organization(id)
    ON DELETE SET NULL;
```

#### DDL (CREATE TABLE)

```sql
CREATE TABLE "parsers_procurement" (
    "id" bigserial NOT NULL PRIMARY KEY,
    "created_at" timestamp with time zone NOT NULL,
    "updated_at" timestamp with time zone NOT NULL,
    "load_batch" integer NOT NULL,
    "purchase_number" varchar(100) NOT NULL,
    "purchase_name" text NOT NULL,
    "customer_inn" varchar(20) NOT NULL,
    "customer_kpp" varchar(20),
    "customer_ogrn" varchar(20),
    "customer_name" text NOT NULL,
    "max_price" varchar(50),
    "max_price_amount" decimal(20, 2),
    "currency_code" varchar(10) NOT NULL DEFAULT 'RUB',
    "placement_method" varchar(255),
    "publish_date" varchar(30),
    "publish_date_normalized" date,
    "end_date" varchar(30),
    "end_date_normalized" date,
    "status" varchar(100),
    "law_type" varchar(20),
    "purchase_object_info" text,
    "href" varchar(500),
    "region_code" varchar(10),
    "data_year" smallint,
    "data_month" smallint,
    "registry_organization_id" bigint,

    CONSTRAINT "unique_procurement_purchase_number" UNIQUE ("purchase_number"),
    CONSTRAINT "fk_registry_organization"
        FOREIGN KEY ("registry_organization_id")
        REFERENCES "registers_organization" ("id")
        ON DELETE SET NULL
);
```

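The unique constraint on `purchase_number` is what a bulk upsert would key on. In PostgreSQL a single `INSERT ... ON CONFLICT DO UPDATE` statement cannot update the same row twice, so rows that share a key usually have to be collapsed before each chunk is written. The sketch below shows that preparation step in plain Python. `dedupe_by_key` and `chunked` are hypothetical names, and whether `ProcurementService.save_procurements()` works this way is an assumption.

```python
from collections.abc import Iterable, Iterator


def dedupe_by_key(rows: Iterable[dict], key: str) -> list[dict]:
    """Keep the last row seen for each key value, in first-seen key order."""
    latest: dict[str, dict] = {}
    for row in rows:
        latest[row[key]] = row  # a later row overwrites an earlier one
    return list(latest.values())


def chunked(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Split rows into consecutive chunks of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [
    {"purchase_number": "A", "status": "old"},
    {"purchase_number": "B", "status": "only"},
    {"purchase_number": "A", "status": "new"},
]
unique_rows = dedupe_by_key(rows, "purchase_number")
chunks = list(chunked(unique_rows, 1))
print(unique_rows)
print(len(chunks))
```

In a Django project each chunk could then go to `bulk_create()` with `update_conflicts=True`, `unique_fields=["purchase_number"]`, and an explicit `update_fields` list (available since Django 4.1).
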
### 6.2. Table: parsers_load_log

**Model:** `ParserLoadLog`

| Field | Type | Null | Index | Description |
|-------|------|------|-------|-------------|
| id | BIGSERIAL | NO | PK | Primary key |
| created_at | TIMESTAMP | NO | - | Creation date |
| updated_at | TIMESTAMP | NO | - | Update date |
| batch_id | INTEGER | NO | ✓ | Batch ID |
| source | VARCHAR(50) | NO | ✓ | Source |
| records_count | INTEGER | NO | - | Record count |
| status | VARCHAR(20) | NO | - | Status |
| error_message | TEXT | YES | - | Error message |

**Constraint:** UNIQUE (source, batch_id)

```sql
CREATE TABLE "parsers_load_log" (
    "id" bigserial NOT NULL PRIMARY KEY,
    "created_at" timestamp with time zone NOT NULL,
    "updated_at" timestamp with time zone NOT NULL,
    "batch_id" integer NOT NULL,
    "source" varchar(50) NOT NULL,
    "records_count" integer NOT NULL DEFAULT 0,
    "status" varchar(20) NOT NULL DEFAULT 'success',
    "error_message" text,

    CONSTRAINT "unique_load_batch_per_source" UNIQUE ("source", "batch_id")
);
```

**Values of source:**
- `procurements`: public procurement (EIS)
- `industrial`: industrial production
- `manufactures`: registry of manufacturers
- `inspections`: unified register of inspections
- `fns_reports`: accounting statements from the Federal Tax Service (FNS)

### 6.3. Linking to organizations

```python
registry_organization = models.ForeignKey(
    "registers.Organization",
    on_delete=models.SET_NULL,
    null=True,
    blank=True,
    related_name="procurement_records"
)
```

**Linking algorithm:**

```python
# normalize() and Lookup are project helpers that are not shown here.
class RegistryOrganizationResolver:
    @classmethod
    def build_lookup(cls, identifiers):
        """
        identifiers: [(inn, ogrn), ...]

        Returns:
            by_pair: {(inn, ogrn): org_id}
            by_inn: {inn: org_id}
            by_ogrn: {ogrn: org_id}
        """
        # 1. Unique values
        inn_values = {int(inn) for inn, _ in identifiers if inn}
        ogrn_values = {int(ogrn) for _, ogrn in identifiers if ogrn}

        # 2. Query the organizations
        organizations = Organization.objects.filter(
            Q(inn__in=inn_values) | Q(ogrn__in=ogrn_values)
        ).values("id", "inn", "ogrn")

        # 3. Build the indexes
        by_pair, by_inn, by_ogrn = {}, {}, {}
        for org in organizations:
            inn = normalize(org["inn"])
            ogrn = normalize(org["ogrn"])
            org_id = org["id"]

            if inn and ogrn:
                by_pair[(inn, ogrn)] = org_id
            if inn:
                by_inn[inn] = org_id
            if ogrn:
                by_ogrn[ogrn] = org_id

        return Lookup(by_pair, by_inn, by_ogrn)

    @classmethod
    def resolve_organization_id(cls, lookup, inn, ogrn):
        """Priority: (INN, OGRN) pair → INN → OGRN"""
        inn_norm = normalize(inn)
        ogrn_norm = normalize(ogrn)

        if inn_norm and ogrn_norm:
            if (inn_norm, ogrn_norm) in lookup.by_pair:
                return lookup.by_pair[(inn_norm, ogrn_norm)]

        if inn_norm and inn_norm in lookup.by_inn:
            return lookup.by_inn[inn_norm]

        if ogrn_norm and ogrn_norm in lookup.by_ogrn:
            return lookup.by_ogrn[ogrn_norm]

        return None
```

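The resolver above relies on `normalize()` and `Lookup`, which the document does not show. The sketch below fills them in with plausible stand-ins so that the pair → INN → OGRN priority can be exercised without a database. Both definitions are assumptions: the project's `normalize()` may clean values differently, and the second sample organization is invented for the example.

```python
from dataclasses import dataclass, field


def normalize(value) -> str:
    """Hypothetical normalizer: stringify, trim, and map None to an empty string."""
    return "" if value is None else str(value).strip()


@dataclass
class Lookup:
    """Hypothetical container for the three indexes."""
    by_pair: dict = field(default_factory=dict)
    by_inn: dict = field(default_factory=dict)
    by_ogrn: dict = field(default_factory=dict)


def build_lookup(organizations) -> Lookup:
    """Index organization rows (dicts with id, inn, ogrn) three ways."""
    lookup = Lookup()
    for org in organizations:
        inn, ogrn = normalize(org["inn"]), normalize(org["ogrn"])
        if inn and ogrn:
            lookup.by_pair[(inn, ogrn)] = org["id"]
        if inn:
            lookup.by_inn[inn] = org["id"]
        if ogrn:
            lookup.by_ogrn[ogrn] = org["id"]
    return lookup


def resolve_organization_id(lookup: Lookup, inn, ogrn):
    """Apply the priority pair → INN → OGRN and return an ID or None."""
    inn_norm, ogrn_norm = normalize(inn), normalize(ogrn)
    if inn_norm and ogrn_norm and (inn_norm, ogrn_norm) in lookup.by_pair:
        return lookup.by_pair[(inn_norm, ogrn_norm)]
    if inn_norm and inn_norm in lookup.by_inn:
        return lookup.by_inn[inn_norm]
    if ogrn_norm and ogrn_norm in lookup.by_ogrn:
        return lookup.by_ogrn[ogrn_norm]
    return None


lookup = build_lookup([
    {"id": 1, "inn": 7707083893, "ogrn": 1027700034460},
    {"id": 2, "inn": "7700000000", "ogrn": None},  # invented row
])
print(resolve_organization_id(lookup, "7707083893", "1027700034460"))  # exact pair
print(resolve_organization_id(lookup, " 7700000000 ", ""))             # INN only
print(resolve_organization_id(lookup, "", "1027700034460"))            # OGRN only
```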
---

## 7. API

### Endpoints

**Base:** `/api/v1/zakupki/`

#### GET /api/v1/zakupki/

List of procurements.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| customer_inn | string | Filter by INN |
| customer_ogrn | string | Filter by OGRN |
| purchase_number | string | Filter by procurement number |
| law_type | string | 44-FZ / 223-FZ |
| status | string | Status |
| region_code | string | Region code |
| data_year | integer | Year |
| data_month | integer | Month |
| load_batch | integer | Batch |
| search | string | Search by name, number, or customer |
| ordering | string | Sort order |
| page, page_size | integer | Pagination |

**Example:**
```http
GET /api/v1/zakupki/?customer_inn=7707083893&law_type=44-FZ&data_year=2025
Authorization: Bearer <token>
```

**Response:**
```json
{
  "count": 156,
  "next": ".../api/v1/zakupki/?page=2",
  "results": [
    {
      "id": 12345,
      "purchase_number": "0888200000224000038",
      "purchase_name": "Поставка офисной бумаги",
      "customer_inn": "7707083893",
      "max_price_amount": "1500000.00",
      "publish_date_normalized": "2025-03-15",
      "law_type": "44-FZ",
      "status": "Подача заявок"
    }
  ]
}
```

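Because the list response is paginated through `next`, a consumer has to follow that link until it runs out. A minimal sketch of such a loop follows. The page-fetching function is injected so the example runs without a network; `iter_results`, `fake_fetch`, and the sample URLs are hypothetical. In real use `fetch_page` would wrap an HTTP GET that sends the `Authorization: Bearer <token>` header and returns the decoded JSON.

```python
from collections.abc import Callable, Iterator


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item from a paginated listing by following 'next' links."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None or absent on the last page


# In-memory stand-in for the API: two pages of invented results.
FAKE_PAGES = {
    "/api/v1/zakupki/?page=1": {
        "count": 3,
        "next": "/api/v1/zakupki/?page=2",
        "results": [{"id": 1}, {"id": 2}],
    },
    "/api/v1/zakupki/?page=2": {"count": 3, "next": None, "results": [{"id": 3}]},
}


def fake_fetch(url: str) -> dict:
    return FAKE_PAGES[url]


ids = [item["id"] for item in iter_results("/api/v1/zakupki/?page=1", fake_fetch)]
print(ids)
```
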
#### GET /api/v1/zakupki/{id}/

Procurement details.

### Serializer

**File:** `src/apps/parsers/serializers.py`

```python
from rest_framework import serializers


class ProcurementSerializer(serializers.ModelSerializer):
    class Meta:
        model = ProcurementRecord
        fields = [
            "id", "load_batch", "purchase_number", "purchase_name",
            "customer_inn", "customer_kpp", "customer_ogrn", "customer_name",
            "max_price", "max_price_amount", "currency_code",
            "placement_method", "publish_date", "publish_date_normalized",
            "end_date", "end_date_normalized", "status", "law_type",
            "purchase_object_info", "href", "region_code",
            "data_year", "data_month", "registry_organization",
            "created_at", "updated_at"
        ]
        read_only_fields = fields
```

---

## 8. Background tasks

### Celery Tasks

**File:** `src/apps/parsers/tasks.py`

#### parse_procurements

One-off load.

```python
from celery import shared_task


@shared_task(bind=True)
def parse_procurements(
    self,
    region_code: str | None = None,
    year: int | None = None,
    month: int | None = None,
    law_type: str = "44",
    proxies: list[str] | None = None,
    requested_by_id: int | None = None,
) -> dict:
    """
    Returns:
        {"batch_id": int, "saved": int, "status": "success"}
    """
```

**Invocation:**
```python
parse_procurements.delay(
    region_code="77",
    year=2025,
    month=3,
    law_type="44"
)
```

#### sync_procurements

Incremental synchronization.

```python
@shared_task(bind=True)
def sync_procurements(
    self,
    region_code: str,
    law_type: str = "44",
    proxies: list[str] | None = None,
) -> dict:
    """
    Logic:
        1. Check the last loaded period in the DB
        2. If there is no data, start from 2025-01-01
        3. Load month by month
        4. Stop after 2 months without data

    Returns:
        {
            "batch_id": int,
            "total_saved": int,
            "results": [{"year": 2025, "month": 3, "fetched": 150, "saved": 145}],
            "status": "success"
        }
    """
```

**Invocation:**
```python
sync_procurements.delay(
    region_code="77",
    law_type="44"
)
```

### Periodic Tasks (Celery Beat)

**File:** `src/core/celery.py`

```python
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "sync-procurements-daily": {
        "task": "apps.parsers.tasks.sync_procurements",
        "schedule": crontab(hour=2, minute=0),  # Daily at 02:00
        "kwargs": {"region_code": "77", "law_type": "44"},
    },
}
```

### Progress Tracking

```python
# Creation
job = BackgroundJob.objects.create(
    task_id=task_id,
    task_name="apps.parsers.tasks.parse_procurements",
    status="in_progress"
)

# Progress
job.update_progress(50, "Загрузка за 03/2025...")

# Completion
job.complete(result={"batch_id": 123, "saved": 150})

# Failure
job.fail(error="SOAP API timeout")
```

---

## 9. Configuration

### Environment variables

**File:** `.env.prod.example` / `.env.dev`

```bash
# EIS token (Gosuslugi)
ZAKUPKI_TOKEN=<token>

# Proxies (optional)
PARSER_PROXIES=http://user:pass@proxy1:8080,http://user:pass@proxy2:8080

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=mostovik
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<password>

# Redis (Celery)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# Django
DJANGO_SETTINGS_MODULE=config.settings.production
SECRET_KEY=<secret>
ALLOWED_HOSTS=example.com
```

### Django Settings

**File:** `src/settings/base.py`

```python
import os

INSTALLED_APPS = [
    # ...
    "apps.parsers",
    "apps.registers",
]

ZAKUPKI_TOKEN = os.getenv("ZAKUPKI_TOKEN", "")

# Comma-separated proxy URLs; an unset variable leaves the list empty
PARSER_PROXIES = []
if parser_proxies := os.getenv("PARSER_PROXIES"):
    PARSER_PROXIES = [p.strip() for p in parser_proxies.split(",")]
```

### Docker Compose

**File:** `docker-compose.prod.yml`

```yaml
services:
  celery:
    image: registry.example.com/mostovik/celery:latest
    command: celery -A config worker --loglevel=info --concurrency=4
    env_file: .env.prod
    depends_on: [postgres, redis]

  celery-beat:
    image: registry.example.com/mostovik/celery:latest
    command: celery -A config beat --loglevel=info
    env_file: .env.prod
    depends_on: [postgres, redis]
```

---

## 10. Examples

### 10.1. Calling the client directly

```python
from apps.parsers.clients.zakupki import ZakupkiClient
from django.conf import settings

client = ZakupkiClient(
    token=settings.ZAKUPKI_TOKEN,
    proxies=["http://proxy.example.com:8080"]
)

# By region
procurements = client.fetch_procurements(
    region_code="77",
    year=2025,
    month=3,
    law_type="44"
)

# By number
procurements = client.fetch_by_reestr_number(
    reestr_number="0888200000224000038",
    law_type="44"
)

# Context manager
with ZakupkiClient(token=settings.ZAKUPKI_TOKEN) as client:
    procurements = client.fetch_procurements(region_code="77", year=2025)
```

### 10.2. Service

```python
from apps.parsers.services import ProcurementService

# Search by INN
procurements = ProcurementService.find_by_inn("7707083893")

# Search by number
procurement = ProcurementService.find_by_purchase_number(
    "0888200000224000038"
).first()

# Last loaded period
last_year, last_month = ProcurementService.get_last_loaded_period(
    region_code="77",
    law_type="44-FZ"
)
print(f"Last loaded: {last_year}/{last_month}")
```

### 10.3. API

```bash
# All procurements of a customer
curl -X GET "http://localhost:8000/api/v1/zakupki/?customer_inn=7707083893" \
  -H "Authorization: Bearer <token>"

# Filtering
curl -X GET "http://localhost:8000/api/v1/zakupki/?data_year=2025&law_type=44-FZ" \
  -H "Authorization: Bearer <token>"

# Search
curl -X GET "http://localhost:8000/api/v1/zakupki/?search=бумага" \
  -H "Authorization: Bearer <token>"

# Details
curl -X GET "http://localhost:8000/api/v1/zakupki/12345/" \
  -H "Authorization: Bearer <token>"
```

### 10.4. SQL

```sql
-- Procurements of a customer for 2025
SELECT
    purchase_number,
    purchase_name,
    max_price_amount,
    publish_date_normalized,
    status
FROM parsers_procurement
WHERE customer_inn = '7707083893'
  AND data_year = 2025
ORDER BY publish_date_normalized DESC;

-- Totals by region
SELECT
    region_code,
    COUNT(*) as count,
    SUM(max_price_amount) as total
FROM parsers_procurement
WHERE data_year = 2025
GROUP BY region_code
ORDER BY total DESC;

-- Latest loads
SELECT
    batch_id,
    created_at,
    records_count,
    status,
    error_message
FROM parsers_load_log
WHERE source = 'procurements'
ORDER BY created_at DESC
LIMIT 10;

-- Statistics by law
SELECT
    law_type,
    COUNT(*) as count,
    SUM(max_price_amount) as total,
    AVG(max_price_amount) as avg
FROM parsers_procurement
GROUP BY law_type;
```

### 10.5. Monitoring

```python
from apps.parsers.models import ParserLoadLog, ProcurementRecord
from django.db.models import Count, Sum

# Logs
logs = ParserLoadLog.objects.filter(
    source=ParserLoadLog.Source.PROCUREMENTS
).order_by("-created_at")[:10]

# Statistics by region
stats = ProcurementRecord.objects.values("region_code").annotate(
    count=Count("*"),
    total=Sum("max_price_amount")
).order_by("-count")

# By law
by_law = ProcurementRecord.objects.values("law_type").annotate(
    count=Count("*")
)
```

---

## Appendices

### A. Region codes

| Code | Region |
|------|--------|
| 01 | Adygea |
| 77 | Moscow |
| 78 | Saint Petersburg |
| 99 | All regions |

### B. 44-FZ document types

| document_type | Meaning |
|---------------|---------|
| notification | Electronic auction |
| notification_ok | Open tender |
| notification_zk | Request for quotations |

### C. Statuses

- `Планирование` (planning)
- `Публикация извещения` (notice publication)
- `Подача заявок` (bid submission)
- `Рассмотрение заявок` (bid review)
- `Заключение контракта` (contract signing)
- `Исполнение` (execution)
- `Завершено` (completed)
- `Отменено` (cancelled)

---

**File:** `docs/Техническая справка ЕИС Закупки.md`
**Code:** `src/apps/parsers/`
**Tests:** `tests/apps/parsers/`