At current Bedrock on-demand pricing (eu-west-3, Apr 2026):
| Component | Model × calls | Tokens (typ.) | $/PO |
|---|---|---|---|
| Stage 0 (Client ID) | Haiku 4.5 × 1 | 1k in / 0.1k out | $0.0013 |
| Stage 1 (Doc Analyzer) | Haiku 4.5 × 1 | 2k in / 0.3k out | $0.0031 |
| Stage 2 (Extractor) | Sonnet 4.6 × 1–3 | 4k in / 2k out | $0.042 |
| Stage 3 (Rules) | local | — | $0 |
| Stage 4 (Snake, 9.4k-label basis) | local | — | $0 |
| Stage 5 (Validation) | Haiku 4.5 × 1 | 3k in / 0.5k out | $0.0044 |
| Stage 6 (Router) | local | — | $0 |
| Total LLM | | | $0.051 |
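The per-stage figures can be reproduced from per-token rates. The rates below (USD per million tokens: $0.80/$4.00 for Haiku, $3.00/$15.00 for Sonnet) are assumptions chosen to approximate the table, not quoted Bedrock list prices; the total lands within a rounding error of $0.051:

```python
# Back-of-envelope per-PO LLM cost model. Rates are ASSUMED
# (USD per million tokens), not official Bedrock list prices.
RATES = {
    "haiku": {"in": 0.80, "out": 4.00},
    "sonnet": {"in": 3.00, "out": 15.00},
}

def stage_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for one call with the given token counts."""
    r = RATES[model]
    return (tokens_in * r["in"] + tokens_out * r["out"]) / 1_000_000

# Stages 3, 4, and 6 run locally and cost nothing.
total = (
    stage_cost("haiku", 1_000, 100)       # Stage 0 — Client ID
    + stage_cost("haiku", 2_000, 300)     # Stage 1 — Doc Analyzer
    + stage_cost("sonnet", 4_000, 2_000)  # Stage 2 — Extractor (×1 call)
    + stage_cost("haiku", 3_000, 500)     # Stage 5 — Validation
)
print(f"${total:.3f}/PO")
```

Stage 2 dominates: Sonnet's output tokens alone account for $0.030 of the total, which is why Stage 2 is the only stage worth escalating or downgrading.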
A typical Ariba PO (Verizon, Satair) with 2–6 line items costs $0.05 in LLM fees. Multi-page POs with 20+ line items over 4–5 pages stay < $0.15.
Stage 4 (Snake) is free (< 10 ms/line, in-process), yet it solves the core problem: mapping heterogeneous manufacturer part IDs to Saft's 9,425-article master data. Without it, every PO requires a manual lookup.
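Snake's internals are out of scope here, but the fuzzy cross-check used to audit its top-1 prediction can be illustrated with stdlib string matching. The article IDs below are toy values, not real Saft master data, and `fuzzy_cross_check` is an illustrative helper, not Snake itself:

```python
import difflib

# Toy master data — the real basis has 9,425 articles.
MASTER = ["SAFT-LS14500", "SAFT-LS26500", "SAFT-LSH20", "SAFT-MP176065"]

def fuzzy_cross_check(part_id: str, candidates=MASTER, cutoff=0.6):
    """Return the closest master-data article, or None below cutoff.

    NOT Snake — only a sketch of the kind of fuzzy cross-check
    used to audit Snake's top-1 prediction before auto-approval.
    """
    hits = difflib.get_close_matches(part_id.upper(), candidates,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(fuzzy_cross_check("ls 14500"))   # noisy customer part ID resolves
print(fuzzy_cross_check("ACME-9999"))  # no plausible match -> None
```

When the cross-check disagrees with Snake's top-1, the PO is routed to review instead of auto-approve, which is what keeps the wrong-SKU rate below 0.2%.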
| Metric | Manual (today) | Monce pipeline |
|---|---|---|
| Avg time per PO | 8–15 min (data entry + SKU lookup) | 15 s automated + 30 s spot check |
| SKU match rate | ≈ 100% (operator does it) | 85–95% on seen families, audited |
| Error rate (wrong SKU entered) | ~2% (typos, wrong family) | < 0.2% (Snake + fuzzy cross-check) |
| Share routed to auto-approve | 0% | ~70% target once stable |
Assume a fully-loaded cost of $45/h for a Valdosta planning clerk. Manual PO ingestion averages ~12 min → $9.00 per PO. The pipeline runs at $0.05.
At 30 POs/day for Valdosta (Verizon + Satair + direct customers), that's $268/day in recovered operator time, or ~$67k/year gross, before fixed infrastructure cost.
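The savings arithmetic is straightforward to reproduce. The 250-workday year is an assumption; the source only states the ~$67k/year figure:

```python
# Recovered-operator-time arithmetic from the figures above.
HOURLY_RATE = 45.0       # fully-loaded clerk cost, $/h
MANUAL_MIN = 12          # average manual handling per PO, minutes
LLM_COST = 0.05          # pipeline LLM fee per PO
POS_PER_DAY = 30
WORKDAYS_PER_YEAR = 250  # ASSUMED to back out the ~$67k/yr figure

manual_cost = HOURLY_RATE * MANUAL_MIN / 60           # $9.00 per PO
daily_saving = POS_PER_DAY * (manual_cost - LLM_COST)
yearly_saving = daily_saving * WORKDAYS_PER_YEAR
print(f"${daily_saving:.2f}/day, ~${yearly_saving / 1000:.0f}k/year")
```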
| Resource | Monthly |
|---|---|
| EC2 t3.medium (eu-west-3) | $30 |
| EBS 20 GB gp3 | $2 |
| Route53 + CloudWatch + data out | $4 |
| Total | $36/month |
- 10 POs/day → $15/mo LLM + $36 infra = $51/mo
- 30 POs/day → $46/mo LLM + $36 infra = $82/mo
- 100 POs/day → $153/mo LLM + $36 infra = $189/mo
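These totals follow from a single linear cost model (a 30-day billing month is assumed):

```python
# Monthly cost = per-PO LLM fee x volume + fixed infrastructure.
LLM_PER_PO = 0.051  # balanced mode, from the pricing table above
INFRA = 36          # EC2 + EBS + Route53/CloudWatch, $/month
DAYS = 30           # ASSUMED billing-month length

for pos_per_day in (10, 30, 100):
    llm = pos_per_day * DAYS * LLM_PER_PO
    print(f"{pos_per_day:>3} POs/day -> "
          f"${llm:.0f} LLM + ${INFRA} infra = ${llm + INFRA:.0f}/mo")
```

The fixed $36 dominates at low volume; past ~25 POs/day the LLM fee becomes the larger line item, which is where the model_mode lever below starts to matter.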
t3.medium handles 3 concurrent workers — ~20 POs/minute sustained on a typical Ariba/Satair layout. The bottleneck is Bedrock throughput, not CPU. Snake scales linearly in |M|: the 9,425-label model trains in ~4 min on bucket=250, L=15, and predicts in < 10 ms/query.
The `model_mode` parameter on `/extract` trades cost for accuracy:
| Mode | Stages 0/1/5 | Stage 2 | $/PO |
|---|---|---|---|
| cheap | Haiku | Haiku | $0.01 |
| balanced (default) | Haiku | Sonnet | $0.05 |
| accurate | Sonnet | Sonnet | $0.12 |
Default to the cheapest mode that holds accuracy; escalate only when Snake's top-1 confidence falls below θ_auto.
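The escalation policy can be sketched as a small selector. The threshold value and the function shape are illustrative assumptions; the source only names θ_auto and the three modes:

```python
# Cheapest-first mode selection. THETA_AUTO's value is ASSUMED;
# the doc only names the threshold, not its setting.
THETA_AUTO = 0.90

def pick_mode(snake_top1_confidence: float, retried: bool = False) -> str:
    """Choose model_mode for an extraction pass (illustrative sketch)."""
    if snake_top1_confidence >= THETA_AUTO:
        return "cheap"                            # Haiku everywhere
    return "accurate" if retried else "balanced"  # escalate on retry

print(pick_mode(0.95))                # high confidence: cheap
print(pick_mode(0.40))                # first escalation: balanced
print(pick_mode(0.40, retried=True))  # second pass: accurate
```

Under this policy a PO only ever pays the $0.12 accurate rate after a low-confidence balanced pass, so the blended per-PO cost stays close to the $0.05 default.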