Saft PO Extraction

7-stage VLM pipeline for Saft America Valdosta Purchase Order ingestion. Upload any PO — Ariba, Satair, or direct customer layout — get structured JSON, Snake matches against the 9,425-article master data, and a trust score in seconds.

POST /extract

Unified extraction — upload PDF, PNG, JPG. Auto-converts to PDF, runs the full pipeline. Returns task_id for async polling.

POST /stage_0

Fast sync identification — customer ID (Verizon, Satair, Airbus, Bombardier, military primes…), PO number, layout family. Regex tier (~5ms) with Haiku VLM fallback (~1s).
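The regex-first, VLM-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `PO_NUMBER_RE` pattern, the `KNOWN_CUSTOMERS` tuple, the `identify` function, and the shape of the returned dict are all hypothetical. Layout-family detection is omitted, and the Haiku call is stood in for by a caller-supplied `vlm_fallback` function.

```python
import re
from typing import Callable, Optional

# ASSUMPTION: illustrative pattern only; the real regex tier is not documented here.
PO_NUMBER_RE = re.compile(
    r"\bPO\s*(?:No\.?|Number|#)?\s*[:#]?\s*(\d{6,})", re.IGNORECASE
)

# Customer names taken from the endpoint description; the real list is longer.
KNOWN_CUSTOMERS = ("Verizon", "Satair", "Airbus", "Bombardier")


def identify_by_regex(text: str) -> Optional[dict]:
    """Cheap tier: return a result only if both customer and PO number are found."""
    po = PO_NUMBER_RE.search(text)
    lowered = text.lower()
    customer = next((c for c in KNOWN_CUSTOMERS if c.lower() in lowered), None)
    if po is None or customer is None:
        return None
    return {"customer": customer, "po_number": po.group(1), "tier": "regex"}


def identify(text: str, vlm_fallback: Callable[[str], dict]) -> dict:
    """Try the regex tier first; call the slower fallback only when it fails."""
    result = identify_by_regex(text)
    if result is not None:
        return result
    fallback = dict(vlm_fallback(text))  # copy so the caller's dict is not mutated
    fallback["tier"] = "vlm"
    return fallback
```

Passing the fallback in as a function keeps the fast path free of any model client, which is presumably what lets the regex tier stay in the millisecond range.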

GET /extract/{id}

Poll extraction result. Returns full structured PO data with matching, validation flags, and routing decision.
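A client would typically submit a document to POST /extract, then poll GET /extract/{id} until the result is ready. The sketch below covers only the polling half and rests on assumptions: the `BASE_URL`, the `status` field, and its `"done"` / `"failed"` values are hypothetical, since the response schema is not described here.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # ASSUMPTION: hypothetical host and port


def fetch_result(task_id: str) -> dict:
    """GET /extract/{id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/extract/{task_id}") as resp:
        return json.load(resp)


def poll_extraction(
    fetch: Callable[[str], dict],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(task_id) until the payload reports a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch(task_id)
        # ASSUMPTION: a "status" field with these terminal values.
        if payload.get("status") in ("done", "failed"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction {task_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

With a running service, `poll_extraction(fetch_result, task_id)` would block until the extraction finishes or the timeout elapses. Injecting `fetch` and `sleep` makes the loop easy to exercise without a network.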

Pipeline

0. Client ID: regex + Haiku; customer, PO number, and layout family in ~5ms on the regex tier (~1s on Haiku fallback)
1. Document Analyzer: classify Ariba / Satair / direct PO layout, skip terms pages
2. Unified Extractor: Sonnet VLM, USD decimals, EA/LB/FT units, dates like "10 Jul 2026"
3. Rules Engine: ISO dates, subtotal arithmetic, schedule-line merging
4. Snake Matcher: batteries / racks / straps / connectors against Saft Valdosta master data
5. Validation: Haiku cross-checks extraction vs original PDF
6. Router: auto-approve / human review based on trust score
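To make stages 3 and 6 concrete, here is a small sketch of the kinds of checks they describe: date normalization, a subtotal arithmetic check, and threshold-based routing. Every name here is hypothetical, and the 0.9 trust threshold and one-cent tolerance are placeholder values, not the pipeline's real settings.

```python
from datetime import datetime
from decimal import Decimal


def to_iso_date(text: str) -> str:
    """Convert a date like '10 Jul 2026' to ISO 8601.

    Note: %b depends on the process locale; this assumes English month names.
    """
    return datetime.strptime(text.strip(), "%d %b %Y").date().isoformat()


def subtotal_matches(lines, stated_subtotal, tolerance=Decimal("0.01")) -> bool:
    """Check that sum(quantity * unit_price) agrees with the stated subtotal.

    Decimal avoids binary floating-point drift on USD amounts.
    """
    computed = sum(
        (Decimal(line["quantity"]) * Decimal(line["unit_price"]) for line in lines),
        Decimal("0"),
    )
    return abs(computed - Decimal(stated_subtotal)) <= tolerance


def route(trust_score: float, threshold: float = 0.9) -> str:
    """ASSUMPTION: a single cutoff; the real router may use more signals."""
    return "auto_approve" if trust_score >= threshold else "human_review"
```

For example, two lines of 4 x 125.50 and 10 x 3.25 sum to 534.50, so a stated subtotal of "534.50" passes while "500.00" would be flagged for review.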