Cloud AI APIs vs. self-hosting: the 12-month cost math, done properly
"Self-hosting is cheaper" and "the cloud is cheaper" are both wrong as stated. Here's the actual break-even analysis, with the variables that decide it for your situation.
Writing
No thought-leadership. No predictions about the future of work. Just what we're building, what it costs, what works, and what broke — written by the people doing the building.
"Self-hosting is cheaper" and "the cloud is cheaper" are both wrong as stated. Here's the actual break-even analysis, with the variables that decide it for your situation.
The honest version. What we scoped, what we got wrong, the retrieval problem that ate two days, and what we'd do differently next time.
We put Llama and Qwen on three hardware setups — a £600 mini-PC, a single used GPU, and a rented cloud instance — and measured throughput, latency, and cost per million tokens. Here's the real math, including where self-hosting stops making sense.
We've had the same conversation with enough law and accounting firms to know where the money is — and where it's quietly wasted. A prioritized list, with the reasoning.
Shadow AI isn't a future risk; it's happening in your office right now. A calm, practical playbook — the five-step version you can start today, before you spend a rupee or a pound on tooling.
Prefer to talk it through?
No retainer required to start. No obligation after the review.