Master-Data-Aware Purchase Order Extraction for Saft Valdosta v0.1

Charles Dana · Monce AI · April 2026

saft.aws.monce.ai · Internal technical note

Abstract

Saft America's Valdosta plant (GA) receives Purchase Orders from a heterogeneous supplier network — Verizon Ariba, Satair (aerospace spares), military primes, European rail, and direct customer layouts. Each PO references manufacturer part IDs (e.g. 80-94890-02, 005787-002, EFT01930_UC_NC) that must be reconciled against the plant's 9,425-article ERP master data (OM - Material). We formalize the matching step as a cascade over a weighted SAT classifier (Snake) and four deterministic fallbacks, and prove that the output is order-independent, auditable, and invariant to noise in the VLM-extracted description.

1. Problem statement

Let a Purchase Order be a set of n line items

L = { ℓ1, ℓ2, …, ℓn }

where each ℓi = (mi, ci, di, qi, ui), with mi the manufacturer part ID, ci the customer part number, di the VLM-extracted free-text description, qi the quantity, and ui the unit price.

Stage 4 of the pipeline must return an assignment Σ : L → M ∪ {⊥}, where M is the Saft Valdosta master data (|M| = 9,425 SKUs). Each SKU is a triple (number, material_id, description); normally number = material_id.
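As a concrete sketch of these objects (field names are illustrative, not the production schema), a line item and a master-data SKU can be modeled as:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineItem:
    mfr_pn: str        # m_i, manufacturer part ID
    cust_pn: str       # c_i, customer part number
    description: str   # d_i, VLM-extracted free text
    qty: float         # q_i, ordered quantity
    unit_price: float  # u_i, unit price

@dataclass(frozen=True)
class Sku:
    number: str        # normally equal to material_id
    material_id: str
    description: str

# Line 10 of the worked example in section 7:
line = LineItem("80-94890-02", "80-94890-02",
                "48V TEL.X-PLUS 180 X40 NI-CD BATTERY 172", 2, 6154.97)
```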

2. The Snake basis

Snake (v5.4.5, cf. Dana 2024) builds a weighted SAT classifier from (text, label) pairs. Three Snake models are trained: an article matcher (line text → master-data SKU), a customer classifier (header text → ordering customer, e.g. Verizon), and an entity classifier (header text → receiving Saft entity, e.g. Saft Valdosta).

Each model emits a prediction with confidence γ ∈ [0, 1] and a human-readable audit trace.

3. Matching cascade

Given a line ℓ = (m, c, d, q, u), define Σ : L → M ∪ {⊥} as the first-match cascade:

Σ(ℓ) = { k : key(k) = m }                        (exact on mfr PN, O(1))
      { k : key(k) = c }                         (exact on customer PN, O(1))
      { k : norm(key(k)) = norm(m) }             (normalized)
      article_matcher.predict(m + " " + d)       if γ ≥ θauto  (Snake, tier 3)
      fuzzy(m + " " + d, M, ≥ θfuzz)             (Levenshtein, tier 4)
      ⊥                                          otherwise

where norm(x) uppercases x and deletes every "-", "_", and " " character (so norm("80-94890-02") = "809489002"), θauto = 0.85, and θfuzz = 0.80. The cascade terminates in exactly one of six branches per line.
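A minimal Python sketch of the cascade, assuming a dict-backed master index (key → SKU id) and a caller-supplied snake_predict returning (sku, γ); difflib's SequenceMatcher stands in for a proper Levenshtein ratio in tier 4:

```python
import re
from difflib import SequenceMatcher

THETA_AUTO, THETA_FUZZ = 0.85, 0.80

def norm(x: str) -> str:
    """Uppercase and delete every '-', '_' and space."""
    return re.sub(r"[-_ ]", "", x).upper()

def match_line(m, c, d, master, snake_predict):
    """First-match cascade Σ over master (key -> SKU id).
    snake_predict(text) -> (sku, gamma)."""
    if m in master:                              # tier 0: exact on mfr PN
        return master[m], "exact_mfr", 1.0
    if c in master:                              # tier 1: exact on customer PN
        return master[c], "exact_cust", 1.0
    norm_index = {norm(k): v for k, v in master.items()}
    if norm(m) in norm_index:                    # tier 2: normalized
        return norm_index[norm(m)], "normalized", 1.0
    sku, gamma = snake_predict(m + " " + d)      # tier 3: Snake
    if gamma >= THETA_AUTO:
        return sku, "snake", gamma
    best_key, best = None, 0.0                   # tier 4: fuzzy, O(|M|)
    for k in master:
        r = SequenceMatcher(None, norm(m + " " + d), norm(k)).ratio()
        if r > best:
            best_key, best = k, r
    if best >= THETA_FUZZ:
        return master[best_key], "fuzzy", best
    return None, "no_match", 0.0                 # ⊥
```

The first-match structure makes the branch taken per line unique, matching the six-branch claim above.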

4. Confidence aggregation

For a PO P, the overall trust score is

T(P) = α · γcust + β · γent + (1 − α − β) · meani(γi)

where γcust and γent are the customer- and entity-model confidences and γi is the match confidence of line ℓi.

with α = β = 1/3. The router auto-approves when T ≥ 0.85.
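The aggregation is a direct transcription (function and variable names are illustrative):

```python
ALPHA = BETA = 1 / 3  # equal weights for customer, entity, and line mean

def trust_score(gamma_cust, gamma_ent, line_gammas):
    """T(P) = α·γcust + β·γent + (1 − α − β)·mean_i(γi)."""
    mean_line = sum(line_gammas) / len(line_gammas)
    return ALPHA * gamma_cust + BETA * gamma_ent + (1 - ALPHA - BETA) * mean_line

def auto_approve(t, threshold=0.85):
    return t >= threshold
```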

5. Soundness

Determinism. Snake is a deterministic SAT classifier: identical input yields identical output, independent of training order. The fallback tiers are dictionary lookups and pure string functions.

Auditability. Each matched line carries (method, confidence, audit_trace). For Snake predictions, the audit contains the triggered SAT clauses and the bucket depth, making every decision reconstructible by hand from data/models/*.json.
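The per-line audit payload can be sketched as a flat record (field names and the sample trace string are illustrative; the real traces live in data/models/*.json):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MatchAudit:
    sku: str            # matched master-data article, "" for ⊥
    method: str         # exact_mfr | exact_cust | normalized | snake | fuzzy | no_match
    confidence: float   # γ for Snake/fuzzy tiers, 1.0 for exact tiers
    audit_trace: list   # e.g. triggered SAT clauses and bucket depth

audit = MatchAudit("1900069889", "snake", 0.92,
                   ["clause: '48V' AND 'NI-CD' -> bucket 7, depth 3"])
payload = json.dumps(asdict(audit))  # serializable for offline review
```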

Noise invariance. The exact and normalized tiers depend only on the part numbers m and c, never on the VLM-extracted description d, so noise in d can only affect the Snake and fuzzy tiers, where the γ thresholds gate acceptance. Separately, the Dana Theorem (2024) bounds Snake's SAT formula at polynomial size in |M| and linear in the number of layers; adding a new SKU to the basis and retraining preserves all prior correct matches that do not collide on literal tests, and in practice collisions are negligible at |M| = 9,425.

6. Complexity

Per-line match cost: O(|M|) worst case (fuzzy tier), O(1) for the exact tiers, and O(D · b) for Snake, with D = number of layers and b = bucket size = 250 (cf. the 10x Method). For |M| = 9,425 and D = 15, a Snake prediction runs in under 10 ms on the production EC2 instance. End-to-end latency is dominated by the VLM stages 2 and 5.

7. Worked example

Verizon Ariba PO 3002630800, line 10:

m = "80-94890-02"              # Manufacturer Part ID
c = "80-94890-02"              # Customer Part # (identical)
d = "48V TEL.X-PLUS 180 X40 NI-CD BATTERY 172"
q = 2 EA, u = $6,154.97
exact(m)    = ⊥          (not a Saft SKU — customer-side PN)
exact(c)    = ⊥          (same)
normalized  = ⊥
article_matcher.predict("80-94890-02 48V TEL.X-PLUS ...")
            = SKU "1900069889"   γ = 0.92
Σ(ℓ)       = (1900069889, snake, 0.92)
ρ(1900069889) = "48V TELX-PLUS 180 NICD BATT"   (from OM-Material)

Router: auto-approve if T(P) ≥ 0.85, aggregating both line confidences with the customer-model (Verizon) and entity-model (Saft Valdosta) scores.
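Plugging in illustrative header confidences (γcust = γent = 0.95 are assumed values; only the line-level 0.92 comes from the example above):

```python
ALPHA = BETA = 1 / 3

# Assumed: header-model confidences (not given in the example)
gamma_cust, gamma_ent = 0.95, 0.95
# Assumed: line 2 also matched by Snake at 0.92; line 1's 0.92 is from the example
line_gammas = [0.92, 0.92]

t = (ALPHA * gamma_cust + BETA * gamma_ent
     + (1 - ALPHA - BETA) * sum(line_gammas) / len(line_gammas))
print(round(t, 4), t >= 0.85)  # → 0.94 True
```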

8. Limits & open problems