Choosing an AI model tier — the honest cost math for businesses

Every model launch produces the same question from clients: "should we be using the new one?" It's a fair question with an unfashionable answer: use the cheapest model that clears your quality bar, and prove where that bar is with your own data. Here's the reasoning, with real numbers.

The price spread is now 10×

As of mid-2026, the main Claude tiers price out (per million tokens, USD, input / output) at roughly: budget tier $1 / $5, workhorse tier $3 / $15, hard-problem tier $5 / $25, frontier tier $10 / $50. Other providers have a similar shape. Same API, same integration work — a 10× spread on the metered cost depending on which tier you point at.

Worked example: a document-extraction pipeline

Take a workload we build often: extracting structured fields from inbound documents — invoices, KYC forms, contracts. Assume a mid-market volume of 5,000 documents a month, each around 2,500 tokens in and 400 tokens out. (Your numbers will differ; the shape of the conclusion won't.)

Monthly token bill, by tier:

Budget tier: 5,000 × (2,500 × $1 + 400 × $5) / 1M ≈ $22.50/month
Workhorse tier: 5,000 × (2,500 × $3 + 400 × $15) / 1M ≈ $67.50/month
Frontier tier: 5,000 × (2,500 × $10 + 400 × $50) / 1M ≈ $225/month

Two things jump out, and they pull in opposite directions.

First: at this volume, even the "expensive" option is cheap. $225 a month is not a business problem. If the frontier tier genuinely extracted your documents more accurately, the premium would be trivially worth it. The people telling you to obsess over per-token price at SMB volumes are optimizing the wrong line.

Second: the premium usually buys you nothing here. Field extraction from an invoice is not a frontier-reasoning problem. In our experience, a well-prompted budget-tier model with a validation layer clears the quality bar on this class of task, and the frontier model's extra capability has nothing to push against. You'd pay 10× for the same JSON.

Both things are true at once. Which is why the tier decision should never be made on price or on capability headlines — it should be made on a threshold test.

The capability-threshold rule

For any given workload there's a quality bar: the accuracy, consistency, or judgment below which the output isn't usable. The rule is simply:

Define the bar concretely. "95% of invoices extracted with zero field errors" is a bar. "Good results" is not.
Test the cheapest tier against it using a few dozen real examples from your own documents — not the vendor's demo set.
Move up a tier only when the current one measurably fails. And when a cheaper tier passes, move down — model launches also make last year's capability cheaper, which is the quieter half of every announcement.

Where do the expensive tiers earn their keep? Genuinely hard judgment calls, long multi-step tasks, work where a subtle error is costly, and agentic jobs that run unattended. If your workload is one of those, pay up gladly. Most business workloads aren't.

The uncomfortable truth: tokens were never your real cost

One more piece of honesty. In the projects we've shipped, the metered model bill is almost never the number that matters. What actually costs money:

Engineering — integration, error handling, the validation layer, monitoring. This dwarfs the token bill for most SMB deployments, whoever you hire (including us).
Errors — one mis-extracted amount that reaches your accounting system costs more than a year of the tier premium.
The workflow around the model — if humans still re-check every output because nobody measured accuracy, you've bought a very fast typist, not automation.

So when someone pitches you savings by switching model tiers, or urgency because a new frontier model shipped, ask for the threshold test: which of our workloads fails on the current tier, and what's the evidence? If there's no answer, there's no decision to make.

This mapping — your actual workloads against the cheapest tier that clears each bar — is one of the concrete things you walk away with from our free AI Review, whether or not we ever build anything together. And if the reason you're stuck on this question is that your documents can't go to any cloud tier, that's a different conversation — the Private & Secure AI one.

The price spread is now 10×

Worked example: a document-extraction pipeline

The capability-threshold rule

The uncomfortable truth: tokens were never your real cost

Theteardowns,thebuildlogs,thehonestmath.