
Choosing an LLM for cost in 2026: a practical buyer's guide

By Itorzo Editorial · May 4, 2026 · 8 min read


The headline price-per-million-tokens chart is almost always wrong about your actual bill. Here's how we model LLM cost at Itorzo Digital when picking models for tools like LLMCalculator.net.

Indicative 2026 pricing (per 1M tokens)

Model | Input | Output | Best for
OpenAI GPT-5 | $2.50 | $10.00 | Frontier reasoning
OpenAI GPT-4o mini | $0.15 | $0.60 | High-volume chat
Claude Sonnet | $3.00 | $15.00 | Long-form writing, code
Claude Haiku | $0.25 | $1.25 | Fast cheap reasoning
Gemini 3.1 Pro | $1.25 | $5.00 | Multimodal, long context
Gemini 3.5 Flash | $0.10 | $0.40 | Massive throughput
Groq Llama 3.1 70B | $0.59 | $0.79 | Latency-critical apps

Indicative published rates as of May 2026 in USD per million tokens. Volume tiers, batch discounts and prompt caching change these numbers materially — see below.
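
To turn the table into a monthly figure, multiply each token count by its rate and divide by one million. Here is a minimal sketch of that arithmetic in Python. The function name `monthly_cost` and the workload (100M input and 200M output tokens per month) are made up for illustration, and the result is list price only, before volume tiers, batch discounts or caching.

```python
# Indicative May 2026 list prices from the table above,
# in USD per 1M tokens: (input, output).
PRICES = {
    "OpenAI GPT-5": (2.50, 10.00),
    "OpenAI GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Haiku": (0.25, 1.25),
    "Gemini 3.1 Pro": (1.25, 5.00),
    "Gemini 3.5 Flash": (0.10, 0.40),
    "Groq Llama 3.1 70B": (0.59, 0.79),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price bill in USD, before any discounts."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100M input and 200M output tokens per month.
    workload = (100_000_000, 200_000_000)
    # Print models from cheapest to most expensive for this workload.
    for model in sorted(PRICES, key=lambda m: monthly_cost(m, *workload)):
        print(f"{model:<20} ${monthly_cost(model, *workload):>9,.2f}")
```

On this hypothetical workload GPT-5 comes to $2,250 a month at list price and Gemini 3.5 Flash to $90.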

The four levers that actually move your bill

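  1. Output-to-input ratio. If your average response is twice as long as the prompt, output pricing dominates: most models in the table above charge 4–5x more for output than for input, so output comes to roughly 90% of the spend. Re-architect prompts to keep responses short before chasing a cheaper model.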
  2. Prompt caching. Anthropic and OpenAI both discount cached input tokens by 75–90%. Worth a one-day refactor for any system prompt over 2k tokens.
  3. Batch API. 50% off if you can tolerate 24-hour turnaround. Perfect for backfills, evaluations, embeddings.
  4. Model selection per task. Route the 80% of easy requests to a Haiku/Flash-class model, keep the frontier model for the 20% that need it. Saves more money than any single price negotiation.
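
The four levers can be folded into one rough estimator. The sketch below is ours, not any provider's billing logic: `Rates`, `bill`, `output_share` and `routed_bill` are hypothetical names, the workload is invented, and it assumes the cache discount applies only to the cached share of input tokens while the batch discount applies to the batched share of the whole bill. Check how your provider actually stacks these discounts before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def bill(rates, input_tokens, output_tokens,
         cached_fraction=0.0, cache_discount=0.9,
         batch_fraction=0.0, batch_discount=0.5):
    """Estimated bill in USD under the stacking assumptions stated above."""
    # Lever 2: only the cached share of input tokens gets the cache discount.
    input_cost = (input_tokens / 1e6 * rates.input_per_m
                  * (1 - cached_fraction * cache_discount))
    output_cost = output_tokens / 1e6 * rates.output_per_m
    # Lever 3: the batched share of traffic gets the batch discount.
    return (input_cost + output_cost) * (1 - batch_fraction * batch_discount)


def output_share(rates, input_tokens, output_tokens):
    """Lever 1: fraction of the list-price bill that comes from output tokens."""
    output_cost = output_tokens / 1e6 * rates.output_per_m
    return output_cost / bill(rates, input_tokens, output_tokens)


def routed_bill(cheap, frontier, input_tokens, output_tokens,
                easy_share=0.8, **levers):
    """Lever 4: send `easy_share` of traffic to the cheap model, the rest to the frontier one."""
    hard_share = 1 - easy_share
    return (bill(cheap, input_tokens * easy_share,
                 output_tokens * easy_share, **levers)
            + bill(frontier, input_tokens * hard_share,
                   output_tokens * hard_share, **levers))


# Rates taken from the table above.
SONNET = Rates(3.00, 15.00)
HAIKU = Rates(0.25, 1.25)

# Hypothetical month in which responses are twice as long as prompts.
IN_TOK, OUT_TOK = 100_000_000, 200_000_000

if __name__ == "__main__":
    print(f"All Sonnet, list price:  ${bill(SONNET, IN_TOK, OUT_TOK):,.2f}")
    print(f"Output share of bill:    {output_share(SONNET, IN_TOK, OUT_TOK):.0%}")
    print(f"Half the input cached:   ${bill(SONNET, IN_TOK, OUT_TOK, cached_fraction=0.5):,.2f}")
    print(f"Everything batched:      ${bill(SONNET, IN_TOK, OUT_TOK, batch_fraction=1.0):,.2f}")
    print(f"80/20 Haiku/Sonnet:      ${routed_bill(HAIKU, SONNET, IN_TOK, OUT_TOK):,.2f}")
```

With these inputs the all-Sonnet bill is $3,300, about 91% of it output. Caching half the input takes it to $3,165, batching everything to $1,650, and 80/20 routing to $880. In this output-heavy example, caching moves the bill far less than routing does.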

A simple decision rule

Start with the cheapest credible model in the table above. Run your eval set. Only move up the price ladder when a specific task fails the eval. Most teams overpay by 5–10x because they default to the flagship model and never come back down.
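
The rule can be written as a walk up a price ladder, one task at a time. In the sketch below, `pick_models` and `passes_eval` are hypothetical names. The pass/fail data is a placeholder for your own eval harness, not a measured result. The ladder is ordered by output price from the table, which is itself an assumption: order by blended cost instead if your workload is input-heavy.

```python
from typing import Callable, Dict, Optional, Sequence


def pick_models(
    ladder: Sequence[str],
    tasks: Sequence[str],
    passes_eval: Callable[[str, str], bool],
) -> Dict[str, Optional[str]]:
    """For each task, return the first (cheapest) model on the ladder that passes, or None."""
    choice: Dict[str, Optional[str]] = {}
    for task in tasks:
        choice[task] = next((m for m in ladder if passes_eval(m, task)), None)
    return choice


# Cheapest first, by output price in the table above (an assumption, see lead-in).
LADDER = [
    "Gemini 3.5 Flash",
    "OpenAI GPT-4o mini",
    "Groq Llama 3.1 70B",
    "Claude Haiku",
    "Gemini 3.1 Pro",
    "OpenAI GPT-5",
    "Claude Sonnet",
]

# PLACEHOLDER data standing in for a real eval run. These are not real results.
PLACEHOLDER_PASSES = {
    "ticket_triage": set(LADDER),
    "contract_review": {"OpenAI GPT-5", "Claude Sonnet"},
}


def placeholder_eval(model: str, task: str) -> bool:
    """Stub: replace with a call into your own eval set."""
    return model in PLACEHOLDER_PASSES[task]


if __name__ == "__main__":
    for task, model in pick_models(LADDER, list(PLACEHOLDER_PASSES), placeholder_eval).items():
        print(f"{task}: {model}")
```

Keeping the choice per task rather than per application is what lets easy work stay on the cheap model while only the failing task moves up.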

Tools we use

For day-to-day pricing checks we built and use LLMCalculator.net — free, no signup, prices refreshed weekly. Plug in expected monthly tokens and it'll give you the side-by-side bill across every major provider.