Saft America's Valdosta plant (GA) receives Purchase Orders in a heterogeneous
mix of customer formats — Verizon Ariba, Satair (aerospace spares), military primes,
European rail, and direct customer layouts. Each PO references manufacturer part IDs
(e.g. 80-94890-02, 005787-002, EFT01930_UC_NC)
that must be reconciled against the plant's 9,425-article ERP master data
(OM - Material). We formalize the matching step as a cascade over a
weighted SAT classifier (Snake) and four deterministic fallbacks, and prove
that the output is order-independent, auditable, and invariant to noise in the
VLM-extracted description.
Let a Purchase Order be a set L of n line items, where each ℓi = (mi, ci, di, qi, ui)
with mi the manufacturer part ID, ci the customer part number, di the VLM-extracted
description, qi the quantity, and ui the unit price. Stage 4 must return an assignment
Σ : L → M ∪ {⊥}, where M is the Saft Valdosta master data (|M| = 9,425 SKUs). Each SKU
is a triple (number, material_id, description); normally number = material_id.
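The O(1) tiers of the matching cascade presuppose precomputed indexes over M. A minimal sketch, assuming key(k) is the SKU number and that the indexes are plain Python dicts (the function name `build_indexes` is illustrative):

```python
def build_indexes(master):
    """master: iterable of (number, material_id, description) SKU triples."""
    exact_index, norm_index, descriptions = {}, {}, {}
    for number, material_id, description in master:
        exact_index[number] = material_id                      # exact PN lookup, O(1)
        norm_index[number.strip("-_ ").upper()] = material_id  # normalized lookup, O(1)
        descriptions[material_id] = description                # used by the fuzzy tier
    return exact_index, norm_index, descriptions
```

Building all three dicts once makes every subsequent per-line lookup constant-time, which is what the cascade's complexity claims rely on.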
Snake (v5.4.5, cf. Dana 2024) builds a weighted SAT classifier from (text, label) pairs. Three Snake models are trained:

- article_matcher.json — consumes m + " " + d and predicts a SKU in M (tier 3 of the cascade).
- clients.json — trained on (name + address + PO-number-prefix, customer_id); identifies the ordering customer.
- an entity model identifying the receiving Saft entity (e.g. Saft Valdosta), the third component of the trust score.

Each model emits a prediction with confidence γ ∈ [0, 1] and a human-readable audit trace.
Given a line ℓ = (m, c, d, q, u), define Σ : L → M ∪ {⊥} as the first-match cascade:

Σ(ℓ) =
  tier 0:  { k : key(k) = m }                     (exact on mfr PN, O(1))
  tier 1:  { k : key(k) = c }                     (exact on customer PN, O(1))
  tier 2:  { k : norm(key(k)) = norm(m) }         (normalized, strip "-_ ")
  tier 3:  article_matcher.predict(m + " " + d)   if γ ≥ θauto (Snake)
  tier 4:  fuzzy(m + " " + d, M, ≥ θfuzz)         (Levenshtein)
  tier 5:  ⊥                                      otherwise

where norm(x) = x.strip("-_ ").upper(), θauto = 0.85, θfuzz = 0.80. The first non-⊥ tier wins, so the cascade terminates in exactly one of six branches per line.
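The cascade can be sketched as a single function. This is a minimal version assuming the master data is loaded into dicts keyed by exact and normalized part number; `snake_predict` is a stand-in for the real Snake model, and difflib's `ratio` is used here only as an approximation of normalized Levenshtein similarity in tier 4:

```python
import difflib

THETA_AUTO = 0.85   # Snake auto-accept threshold
THETA_FUZZ = 0.80   # fuzzy-match threshold

def norm(x: str) -> str:
    # normalized form: strip separators at the ends, uppercase
    return x.strip("-_ ").upper()

def match_line(line, exact_index, norm_index, descriptions, snake_predict):
    """First-match cascade: returns (sku, method, confidence)."""
    m, c, d = line["mfr_pn"], line["cust_pn"], line["description"]
    if m in exact_index:                        # tier 0: exact on manufacturer PN
        return exact_index[m], "exact_mfr", 1.0
    if c in exact_index:                        # tier 1: exact on customer PN
        return exact_index[c], "exact_cust", 1.0
    if norm(m) in norm_index:                   # tier 2: normalized lookup
        return norm_index[norm(m)], "normalized", 1.0
    sku, gamma = snake_predict(m + " " + d)     # tier 3: Snake classifier
    if gamma >= THETA_AUTO:
        return sku, "snake", gamma
    # tier 4: fuzzy scan over all SKU descriptions (O(|M|) worst case)
    query = m + " " + d
    best_sku, best_score = None, 0.0
    for sku_id, desc in descriptions.items():
        score = difflib.SequenceMatcher(None, query, desc).ratio()
        if score > best_score:
            best_sku, best_score = sku_id, score
    if best_score >= THETA_FUZZ:
        return best_sku, "fuzzy", best_score
    return None, "no_match", 0.0                # tier 5: ⊥
```

Because each tier either returns or falls through, exactly one branch fires per line, which is what makes the per-line audit record unambiguous.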
For a PO P with lines ℓ1, …, ℓn, the overall trust score combines the averaged line
confidences with the customer- and entity-model confidences:

T(P) = α · (γ1 + … + γn)/n + β · γcustomer + (1 − α − β) · γentity

with α = β = 1/3, so all three components weigh equally. The router auto-approves when T(P) ≥ 0.85.
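Under one reading of the aggregation — averaging the per-line confidences is an assumption here; the equal α = β = 1/3 weighting and the 0.85 threshold are from the text — the score and router check look like:

```python
def trust_score(line_confidences, gamma_customer, gamma_entity,
                alpha=1/3, beta=1/3):
    """PO-level trust: weighted mix of line average, customer, entity."""
    line_term = sum(line_confidences) / len(line_confidences)
    return (alpha * line_term
            + beta * gamma_customer
            + (1 - alpha - beta) * gamma_entity)

def auto_approve(t, threshold=0.85):
    # router decision: above threshold goes straight through, else human review
    return t >= threshold
```

With equal weights, a single weak component (e.g. an uncertain entity match) pulls the whole PO below the threshold, which is the intended fail-safe behavior.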
Determinism. Snake is a deterministic SAT classifier: identical input yields identical output, independent of training order. The fallback tiers are dictionary lookups and pure string functions.
Auditability. Each matched line carries
(method, confidence, audit_trace). For Snake
predictions, the audit contains the triggered SAT clauses and the bucket depth,
making every decision reconstructible by hand from data/models/*.json.
Noise invariance. The Dana Theorem (2024) guarantees Snake's SAT formula is polynomial in |M| and linear in the number of layers. Adding a new SKU to the basis and retraining preserves all prior correct matches that don't collide on literal tests — in practice collisions are negligible at |M| = 9,425.
Per-line match: O(|M|) worst case (fuzzy tier), O(1) for the exact tiers, and O(d · b) for Snake, with d = number of layers and b = bucket size = 250 (cf. the 10x Method). For |M| = 9,425 and d = 15, a Snake prediction runs in < 10 ms on the production EC2 instance. Pipeline overhead is dominated by the VLM stages 2 and 5.
Verizon Ariba PO 3002630800, line 10:
m = "80-94890-02" # Manufacturer Part ID
c = "80-94890-02" # Customer Part # (identical)
d = "48V TEL.X-PLUS 180 X40 NI-CD BATTERY 172"
q = 2 EA, u = $6,154.97
exact(m) = ⊥ (not a Saft SKU — customer-side PN)
exact(c) = ⊥ (same)
normalized = ⊥
article_matcher.predict("80-94890-02 48V TEL.X-PLUS ...")
= SKU "1900069889" γ = 0.92
Σ(ℓ) = (1900069889, snake, 0.92)
ρ(1900069889) = "48V TELX-PLUS 180 NICD BATT" (from OM-Material)
Router: auto-approve if T(P) ≥ 0.85, combining both line-match confidences with the customer model (Verizon) and the entity model (Saft Valdosta).
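Replaying this trace with a stubbed cascade confirms the tier order: the Snake predictor is hard-coded to the SKU and γ reported above, and the exact/normalized indexes are empty because the PN is customer-side:

```python
def norm(x: str) -> str:
    return x.strip("-_ ").upper()

# Stub: the real article_matcher returned SKU 1900069889 with γ = 0.92 for this line.
def snake_predict_stub(text: str):
    return "1900069889", 0.92

m = "80-94890-02"                                 # manufacturer part ID
c = "80-94890-02"                                 # customer part number (identical)
d = "48V TEL.X-PLUS 180 X40 NI-CD BATTERY 172"

saft_exact = {}                                   # customer-side PN: no exact hit
saft_norm = {}                                    # ... and no normalized hit either

if m in saft_exact or c in saft_exact or norm(m) in saft_norm:
    result = ("<exact>", "exact", 1.0)            # tiers 0-2 (not taken here)
else:
    sku, gamma = snake_predict_stub(m + " " + d)  # tier 3
    result = (sku, "snake", gamma) if gamma >= 0.85 else (None, "no_match", 0.0)
```

The result reproduces Σ(ℓ) = (1900069889, snake, 0.92) from the trace, with tiers 0-2 falling through exactly as logged.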