Cloud AI APIs vs. self-hosting: the 12-month cost math, done properly

This is the question we get asked more than any other, usually phrased as a statement: "self-hosting is way cheaper, right?" Sometimes it's the reverse — "surely the cloud is cheaper than buying hardware." The truthful answer is that both can be correct, and which one is correct for you comes down to a handful of variables most comparisons quietly ignore. Let's not ignore them.

The naive comparison (and why it misleads)

The version you see online: "A cloud API costs $X per million tokens. A GPU costs $Y to buy. Divide and self-hosting wins after Z months." This is wrong in both directions at once, because it omits the three things that actually decide it.

Variable 1: Utilization

This is the big one. A cloud API charges you only when you use it. Owned hardware charges you whether you use it or not — the moment you buy it, the meter runs on depreciation and power regardless of whether a single query goes through.

So the real question isn't "cost per token," it's "how busy will the hardware be?" A GPU running flat out 12 hours a day has a wonderful cost per token. The same GPU answering 200 queries a day, sitting idle the rest of the time, has a terrible one — you've bought a sports car to drive to the end of the street.

The break-even is a utilization line, not a date. Below some level of steady usage, the cloud wins on pure cost because you're not paying for idle time. Above it, owned hardware wins, and the gap widens the busier you get.
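To make the "utilization line" concrete, here is a minimal sketch of the break-even arithmetic. Every figure in it (hardware price, lifetime, power bill, API price) is a made-up placeholder rather than a quote, and the function names are ours for illustration. It assumes straight-line depreciation and ignores the hardware's throughput ceiling, so treat it as a way to see the shape of the calculation, not as a pricing model.

```python
def owned_fixed_monthly_cost(hardware_price: float,
                             lifetime_months: int,
                             power_per_month: float) -> float:
    """Monthly cost of owned hardware, paid whether or not it serves a query."""
    # Straight-line depreciation is an assumption; use your own schedule.
    depreciation = hardware_price / lifetime_months
    return depreciation + power_per_month


def break_even_tokens_per_month(price_per_million_tokens: float,
                                fixed_monthly_cost: float) -> float:
    """Monthly token volume at which the cloud bill equals the fixed cost."""
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


# Placeholder inputs for illustration only; substitute your own numbers.
fixed = owned_fixed_monthly_cost(hardware_price=24_000,
                                 lifetime_months=24,
                                 power_per_month=200)
threshold = break_even_tokens_per_month(price_per_million_tokens=10.0,
                                        fixed_monthly_cost=fixed)

print(f"Owned hardware costs {fixed:,.0f} per month, busy or idle")
print(f"Break-even volume: {threshold:,.0f} tokens per month")
```

Below the printed threshold the pay-per-use bill is smaller than the fixed cost of the box; above it, the box is cheaper. Note that the output is a volume, not a month on the calendar.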

Variable 2: The cost of running the box

A bought GPU is not a finished cost. Someone has to host it, patch it, monitor it, and be there when it falls over at the worst possible moment. That's real money — either a person's time or a managed service — and it's the line item that turns a "cheaper" self-hosting spreadsheet into a more expensive reality.

This doesn't kill the self-hosting case. It just has to be in the case. When we model this for clients, the operating cost goes in the spreadsheet next to the hardware, because pretending it's zero is how people end up surprised. (It's also, candidly, what our managed retainers cover — running the box so you don't have to think about it.)
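As a rough illustration of how that one line item can flip the result, the sketch below compares 12-month totals with and without an operating cost. All numbers are hypothetical placeholders, and the model is deliberately crude: it charges the full hardware price inside the 12 months and assumes no resale value.

```python
MONTHS = 12


def cloud_total(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """12-month pay-per-use bill at a constant monthly volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens * MONTHS


def owned_total(hardware_price: float,
                power_per_month: float,
                ops_per_month: float) -> float:
    """12-month cost of owned hardware: purchase, power, and running it."""
    # Assumption: full hardware price lands in year one, no residual value.
    return hardware_price + (power_per_month + ops_per_month) * MONTHS


# Placeholder inputs for illustration only.
cloud = cloud_total(tokens_per_month=300_000_000, price_per_million_tokens=10.0)
owned_no_ops = owned_total(hardware_price=24_000, power_per_month=200, ops_per_month=0)
owned_with_ops = owned_total(hardware_price=24_000, power_per_month=200, ops_per_month=1_500)

print(f"Cloud, 12 months:                 {cloud:,.0f}")
print(f"Owned, operating cost ignored:    {owned_no_ops:,.0f}")
print(f"Owned, operating cost included:   {owned_with_ops:,.0f}")
```

With these placeholder inputs, self-hosting looks cheaper than the cloud until the operating cost is included, at which point it is the more expensive option. Different inputs will give a different verdict, which is the point of putting the line in the spreadsheet.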

Variable 3: Whether your data is even allowed to leave

Here's where the whole cost framing can collapse, and it's the variable the online comparisons never mention: for some businesses, the cloud option doesn't exist at all.

If you're a law firm with privileged documents, a clinic with patient records, or a business with IP you can't expose, the question isn't "is the cloud cheaper?" The cloud is off the table. Self-hosting isn't the cost-optimal choice in that situation — it's the only choice, and the cost analysis is purely between different ways of doing it privately. We wrote more about that side in our Private & Secure AI work.

For everyone else, this variable is a tilt rather than a veto: even where the cloud is allowed, a strong preference to keep data in-house can tip the decision toward self-hosting at a lower level of usage than the pure-cost math alone would justify.

Putting it together: a worked logic

Forget exact figures for a second — here's the decision logic we actually apply:

1. Is your data allowed to leave? If no, you're self-hosting; skip to "how." If yes, continue.
2. Is your usage steady and high, or low and bursty? Bursty and low → the cloud's pay-per-use almost always wins. Steady and high → owned hardware likely wins, provided it still wins once operating cost is included.
3. Do you need frontier-model quality? If the task genuinely requires the best hosted models, self-hosting a smaller open model to save money solves the wrong problem.
4. Do you have someone to run it? If not, either budget for that (a person or a managed service) or stay on the cloud.

Run those four and the answer is usually obvious — and it's frequently a mix: cloud APIs for the bursty, quality-critical, non-sensitive work, and a self-hosted setup for the steady or sensitive workloads. The right answer is often "both, for different jobs."
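The four questions can be sketched as a small function applied per workload, which also shows how a mixed answer falls out naturally. The `Workload` fields and the `recommend` function are hypothetical names for illustration, and the yes/no inputs are a simplification of what are really judgment calls.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    data_may_leave: bool          # question 1
    steady_high_usage: bool       # question 2 (False means low and bursty)
    needs_frontier_quality: bool  # question 3
    has_operator: bool            # question 4 (a person or a managed service)


def recommend(w: Workload) -> str:
    """Apply the four questions in order and return 'self-host' or 'cloud'."""
    if not w.data_may_leave:
        return "self-host"  # the cloud is off the table; only "how" remains
    if not w.steady_high_usage:
        return "cloud"      # pay-per-use avoids paying for idle hardware
    if w.needs_frontier_quality:
        return "cloud"      # a smaller open model would solve the wrong problem
    if not w.has_operator:
        return "cloud"      # or budget for an operator, which this sketch does not model
    return "self-host"


# Hypothetical workloads for illustration.
workloads = [
    Workload("contract review", False, True, False, True),
    Workload("ad-hoc research", True, False, True, False),
    Workload("nightly batch tagging", True, True, False, True),
]
for w in workloads:
    print(f"{w.name}: {recommend(w)}")
```

Running it over several workloads from the same business will often return different answers for different jobs, which is the "both" outcome described above.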

The honest conclusion

Anyone who answers "cloud vs. self-hosting" without asking about your utilization, your data sensitivity, and who's going to run the thing is selling you their default, not your answer. The math is real, but it's your math, and it turns on variables specific to you.

If you'd like us to run it on your actual numbers, that's a core part of an AI Roadmap. Or book a free AI Review and we'll talk you through which way your specific situation leans, no spreadsheet required.