Choosing an LLM for cost in 2026: a practical buyer's guide
By Itorzo Editorial · May 4, 2026 · 8 min read

The headline price-per-million-tokens chart is almost always wrong about your actual bill. Here's how we model LLM cost at Itorzo Digital when picking models for tools like LLMCalculator.net.
Indicative 2026 pricing (per 1M tokens)
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Best for |
|---|---|---|---|
| OpenAI GPT-5 | $2.50 | $10.00 | Frontier reasoning |
| OpenAI GPT-4o mini | $0.15 | $0.60 | High-volume chat |
| Claude Sonnet | $3.00 | $15.00 | Long-form writing, code |
| Claude Haiku | $0.25 | $1.25 | Fast cheap reasoning |
| Gemini 3.1 Pro | $1.25 | $5.00 | Multimodal, long context |
| Gemini 3.5 Flash | $0.10 | $0.40 | Massive throughput |
| Groq Llama 3.1 70B | $0.59 | $0.79 | Latency-critical apps |
Indicative published rates as of May 2026, in USD per million tokens. Volume tiers, batch discounts and prompt caching change these numbers materially; see below.
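To turn the table into a monthly figure, multiply your token volumes by the per-million rates. The sketch below is a minimal illustration using the indicative prices above; the function name and the workload (200M input and 50M output tokens per month) are made up for the example, and no discounts are applied.

```python
# Indicative prices from the table above, USD per 1M tokens: (input, output).
PRICES = {
    "OpenAI GPT-5": (2.50, 10.00),
    "OpenAI GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Haiku": (0.25, 1.25),
    "Gemini 3.1 Pro": (1.25, 5.00),
    "Gemini 3.5 Flash": (0.10, 0.40),
    "Groq Llama 3.1 70B": (0.59, 0.79),
}


def monthly_cost(model, input_tokens, output_tokens):
    """Return (total_usd, output_share) at list price, with no discounts."""
    in_price, out_price = PRICES[model]
    input_cost = input_tokens / 1_000_000 * in_price
    output_cost = output_tokens / 1_000_000 * out_price
    total = input_cost + output_cost
    share = output_cost / total if total else 0.0
    return total, share


if __name__ == "__main__":
    # Hypothetical workload: 200M input and 50M output tokens per month.
    for model in PRICES:
        total, share = monthly_cost(model, 200_000_000, 50_000_000)
        print(f"{model:<20} ${total:>9,.2f}  output share {share:.0%}")
```

On that workload GPT-5 comes to $1,000, and output is half the bill even though it is only a fifth of the tokens.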
The four levers that actually move your bill
- Output-to-input ratio. If your average response is twice the prompt, output pricing dominates: at the 4–5x output premium most models in the table charge, output is roughly 90% of the bill. Re-architect prompts to keep responses short before chasing a cheaper model.
- Prompt caching. Anthropic and OpenAI both discount cached input tokens by 75–90%. Worth a one-day refactor for any system prompt over 2k tokens.
- Batch API. 50% off if you can tolerate a turnaround of up to 24 hours. Perfect for backfills, evaluations and embeddings.
- Model selection per task. Route the 80% of easy requests to a Haiku/Flash-class model, keep the frontier model for the 20% that need it. Saves more money than any single price negotiation.
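A few lines of arithmetic make the levers above concrete. This is a rough sketch, not any provider's billing formula: the 90% cache discount, the 50% batch discount, the cached fraction and the 80/20 split are assumed inputs taken from the ranges quoted above, the function names are hypothetical, and provider-specific details such as how cache writes are billed are ignored.

```python
def effective_cost(input_tokens, output_tokens, in_price, out_price,
                   cached_fraction=0.0, cache_discount=0.9, batch_discount=0.0):
    """Estimate USD cost for a workload; prices are USD per 1M tokens.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount:  discount on those cached tokens (assumed 90% here).
    batch_discount:  discount on the whole request (0.5 for a batch API).
    Assumes the two discounts stack, which depends on the provider.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - cache_discount)) / 1e6 * in_price
    output_cost = output_tokens / 1e6 * out_price
    return (input_cost + output_cost) * (1 - batch_discount)


def routed_cost(input_tokens, output_tokens, cheap, frontier, easy_share=0.8):
    """Blend two (in_price, out_price) tiers, sending easy_share to the cheap one."""
    cheap_part = effective_cost(input_tokens * easy_share,
                                output_tokens * easy_share, *cheap)
    hard_part = effective_cost(input_tokens * (1 - easy_share),
                               output_tokens * (1 - easy_share), *frontier)
    return cheap_part + hard_part


if __name__ == "__main__":
    # Hypothetical workload: 200M input / 50M output tokens per month,
    # priced with the Claude Sonnet and Claude Haiku rows of the table.
    sonnet, haiku = (3.00, 15.00), (0.25, 1.25)
    base = effective_cost(200e6, 50e6, *sonnet)
    cached = effective_cost(200e6, 50e6, *sonnet, cached_fraction=0.5)
    batched = effective_cost(200e6, 50e6, *sonnet, batch_discount=0.5)
    routed = routed_cost(200e6, 50e6, cheap=haiku, frontier=sonnet)
    for label, cost in [("list price", base), ("50% cached", cached),
                        ("batch", batched), ("80/20 routing", routed)]:
        print(f"{label:<14} ${cost:,.2f}")
```

On this made-up workload, caching half the input takes the bill from $1,350 to about $1,080, batching halves it to $675, and 80/20 routing brings it to about $360.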
A simple decision rule
Start with the cheapest credible model in the table above. Run your eval set. Only move up the price ladder when a specific task fails the eval. Most teams overpay by 5–10x because they default to the flagship model and never come back down.
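The rule amounts to walking the price ladder from the bottom and stopping at the first model that passes. Here is a minimal sketch of that loop; `pick_model` and `passes_eval` are hypothetical names, `passes_eval` stands in for running your own eval set, and the 4:1 input-to-output mix used for ranking is an assumption you should replace with your real traffic.

```python
def pick_model(candidates, passes_eval):
    """Return the cheapest candidate that passes the eval, else None.

    candidates:  list of (name, in_price, out_price) tuples, USD per 1M tokens.
    passes_eval: callable taking a model name and returning True/False;
                 a stand-in for running your own eval set on one task.
    """
    # Rank by a blended price; the 4:1 input:output token mix is an assumption.
    def blended(candidate):
        _, in_price, out_price = candidate
        return 0.8 * in_price + 0.2 * out_price

    # Try models from cheapest to most expensive and stop at the first pass.
    for name, _, _ in sorted(candidates, key=blended):
        if passes_eval(name):
            return name
    return None


if __name__ == "__main__":
    ladder = [
        ("Claude Sonnet", 3.00, 15.00),
        ("Gemini 3.5 Flash", 0.10, 0.40),
        ("Claude Haiku", 0.25, 1.25),
    ]
    # Hypothetical eval outcome: pretend only these models pass the task.
    passing = {"Claude Haiku", "Claude Sonnet"}
    print(pick_model(ladder, lambda name: name in passing))
```

Run it once per task rather than once per product, since different tasks can land on different rungs of the ladder.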
Tools we use
For day-to-day pricing checks we built and use LLMCalculator.net — free, no signup, prices refreshed weekly. Plug in expected monthly tokens and it'll give you the side-by-side bill across every major provider.
