How Much Do LLM APIs Really Cost? I Ran the Numbers for 5 Common Workloads in 2026
Quick summary
Real monthly cost estimates for 5 common LLM workloads: chat app, code assistant, support bot, document Q&A, and batch summarisation. OpenAI, Anthropic, Google, xAI — with a free comparison tool.
"How much will we actually pay?" is the question every team asks before committing to an LLM API. Published price-per-token tables are accurate but hard to translate into real monthly bills. This piece runs the numbers for five common workloads — chat app, code assistant, support bot, document Q&A, and batch summarisation — and gives you rough monthly cost bands for OpenAI, Anthropic, Google, and xAI in 2026. No fluff; the goal is to help developers and founders in the US, UK, Europe, India, and Australia budget realistically.
The Five Workloads (and Assumptions)
1. Consumer chat app (light use)
~50,000 input + 20,000 output tokens per user per month. Assumes a small B2C product with a few thousand active users. Mix of short turns and occasional long threads.
2. Code assistant / dev tool
~200,000 input + 80,000 output tokens per developer per month. Assumes daily use for completions, explanations, and refactors. Heavy on code context.
3. Customer support bot
~500,000 input + 150,000 output tokens per month per agent. Assumes the bot handles a meaningful share of tier-1 support; multi-turn conversations and knowledge-base retrieval.
4. Document Q&A / RAG
~1M input + 200,000 output tokens per month. Assumes internal docs or help-center RAG; repeated retrieval and medium-length answers.
5. Batch summarisation
~2M input + 400,000 output tokens per month. Assumes nightly or weekly jobs over reports, emails, or logs. Input-heavy by token count (5:1), like document Q&A.
Exact numbers depend on model choice (e.g. GPT-4o vs GPT-4o mini, Claude 3.5 Sonnet vs Haiku). The ranges below use typical "mid-tier" models where most teams land.
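To turn the token volumes above into a monthly figure for any model, the arithmetic is tokens divided by one million, multiplied by the per-million price, summed over input and output. The sketch below uses the five workload volumes from this piece; the two prices are hypothetical placeholders, not any provider's actual rates, so the printed figures will not match the bands below until you substitute current prices.

```python
# Token volumes (input, output) per month, taken from the workload assumptions above.
WORKLOADS = {
    "chat app (per user)": (50_000, 20_000),
    "code assistant (per dev)": (200_000, 80_000),
    "support bot (per agent)": (500_000, 150_000),
    "document Q&A": (1_000_000, 200_000),
    "batch summarisation": (2_000_000, 400_000),
}

# HYPOTHETICAL prices in USD per 1M tokens. Replace with the provider's current rates.
PRICE_IN_PER_M = 2.50
PRICE_OUT_PER_M = 10.00


def monthly_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Return the USD cost of one month of tokens at per-million-token prices."""
    cost_in = input_tokens / 1_000_000 * price_in_per_m
    cost_out = output_tokens / 1_000_000 * price_out_per_m
    return cost_in + cost_out


for name, (tokens_in, tokens_out) in WORKLOADS.items():
    cost = monthly_cost(tokens_in, tokens_out, PRICE_IN_PER_M, PRICE_OUT_PER_M)
    print(f"{name}: ${cost:.2f}/month")
```

Keeping input and output prices as separate parameters matters because providers usually charge more for output tokens than for input tokens.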
Workload 1: Consumer Chat App (Light)
Rough scale: 50K in / 20K out per user/month.
OpenAI (GPT-4o): ~$0.25–0.40 per user/month.
Anthropic (Claude 3.5 Sonnet): ~$0.20–0.35.
Google (Gemini 1.5 Pro): ~$0.15–0.30.
xAI (Grok): ~$0.01–0.02 (roughly 15–25 times cheaper than the other three in these estimates).
At 5,000 users, that is about $1,250–2,000/month on OpenAI, $750–1,750 on Anthropic or Google, and roughly $50–100 on xAI at similar usage. Switching to "mini" or "Haiku" tiers can cut these by 50–70%.
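Scaling a per-user band to a monthly total is plain multiplication, but it is worth doing per provider rather than once for the group. A minimal sketch, using the Workload 1 per-user bands listed above and the 5,000-user example (the function and variable names are illustrative):

```python
# Per-user monthly bands in USD (low, high), copied from the Workload 1 estimates.
PER_USER_BANDS = {
    "OpenAI": (0.25, 0.40),
    "Anthropic": (0.20, 0.35),
    "Google": (0.15, 0.30),
    "xAI": (0.01, 0.02),
}


def scale_band(band, units):
    """Multiply a (low, high) per-unit band by a unit count, rounded to cents."""
    low, high = band
    return round(low * units, 2), round(high * units, 2)


USERS = 5_000
totals = {provider: scale_band(band, USERS) for provider, band in PER_USER_BANDS.items()}

for provider, (low, high) in totals.items():
    print(f"{provider}: ${low:,.0f} to ${high:,.0f} per month")
```

The same helper works for the per-developer and per-agent bands in the later workloads by swapping the band table and the unit count.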
Workload 2: Code Assistant
Rough scale: 200K in / 80K out per dev/month.
OpenAI: ~$1.50–2.50 per dev/month.
Anthropic: ~$1.20–2.00.
Google: ~$1.00–1.80.
xAI: ~$0.05–0.10.
For a team of 20 developers, that is roughly $20–50/month across OpenAI, Anthropic, and Google, or $1–2 for xAI. Code-assistant workloads are often among the most predictable; many teams lock in a single provider and optimise later with caching and model tiers.
Workload 3: Customer Support Bot
Rough scale: 500K in / 150K out per "agent"/month.
OpenAI: ~$4–7 per agent/month.
Anthropic: ~$3.50–6.
Google: ~$3–5.
xAI: ~$0.15–0.25.
At 10 equivalent agents, you are looking at $30–70/month across the big three, or about $1.50–2.50 for xAI. Support bots often need strong instruction-following and safety; Claude and GPT-4o are common choices even when cost is higher.
Workload 4: Document Q&A / RAG
Rough scale: 1M in / 200K out/month.
OpenAI: ~$8–14/month.
Anthropic: ~$6–12.
Google: ~$5–10.
xAI: ~$0.30–0.50.
RAG workloads are input-heavy (retrieval context). Long-context models (Claude, Gemini) can reduce round-trips; xAI and smaller tiers keep cost low if quality is acceptable.
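"Input-heavy" by token count does not always mean input-heavy by cost, because output tokens are typically priced higher. The sketch below compares the two shares for the 1M in / 200K out volume above. The prices are hypothetical placeholders chosen only to show the effect of an output price several times the input price.

```python
def input_cost_share(tokens_in, tokens_out, price_in_per_m, price_out_per_m):
    """Return the fraction of total cost that comes from input tokens."""
    cost_in = tokens_in / 1_000_000 * price_in_per_m
    cost_out = tokens_out / 1_000_000 * price_out_per_m
    return cost_in / (cost_in + cost_out)


TOKENS_IN, TOKENS_OUT = 1_000_000, 200_000  # document Q&A volume from this piece

# HYPOTHETICAL prices in USD per 1M tokens (output priced at 4x input).
PRICE_IN_PER_M, PRICE_OUT_PER_M = 2.50, 10.00

token_share = TOKENS_IN / (TOKENS_IN + TOKENS_OUT)
cost_share = input_cost_share(TOKENS_IN, TOKENS_OUT, PRICE_IN_PER_M, PRICE_OUT_PER_M)

print(f"input share of tokens: {token_share:.0%}")
print(f"input share of cost:   {cost_share:.0%}")
```

Under these assumed prices, input is most of the tokens but only a little over half of the cost, so trimming answer length can matter nearly as much as trimming retrieved context.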
Workload 5: Batch Summarisation
Rough scale: 2M in / 400K out/month.
OpenAI: ~$16–28/month.
Anthropic: ~$12–22.
Google: ~$10–18.
xAI: ~$0.60–1.00.
Batch jobs are where per-token price matters most. Many teams use the cheapest capable model (e.g. Haiku, Gemini Flash, or Grok) for summarisation and reserve premium models for user-facing features.
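The effect of moving a batch job to a cheaper tier can be sketched with the same arithmetic. Both price tiers below are hypothetical placeholders, not quotes for any named model; the token volume is the 2M in / 400K out batch workload above.

```python
# HYPOTHETICAL price tiers in USD per 1M tokens: (input, output).
PREMIUM_TIER = (3.00, 15.00)
BUDGET_TIER = (0.25, 1.25)

BATCH_TOKENS = (2_000_000, 400_000)  # batch summarisation volume from this piece


def cost(tokens, prices):
    """Return the USD cost for (input, output) tokens at (input, output) per-1M prices."""
    tokens_in, tokens_out = tokens
    price_in, price_out = prices
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out


premium_cost = cost(BATCH_TOKENS, PREMIUM_TIER)
budget_cost = cost(BATCH_TOKENS, BUDGET_TIER)
saving = 1 - budget_cost / premium_cost

print(f"premium tier: ${premium_cost:.2f}/month")
print(f"budget tier:  ${budget_cost:.2f}/month")
print(f"saving:       {saving:.0%}")
```

Whether the cheaper tier is acceptable still depends on summary quality, which this calculation does not measure.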
How to Use These Numbers
Treat these as order-of-magnitude estimates. Your mix of models, caching, and prompt length will shift the numbers. The point is to get a feel for which workload dominates your bill and which provider is in the right ballpark for your region and quality bar.
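One way to see which workload dominates a bill is to multiply each per-unit estimate by your own unit counts and compare shares. The sketch below uses the OpenAI bands and the example head counts from this piece (5,000 users, 20 developers, 10 agents). Taking the midpoint of each band and running all five workloads in one organisation are simplifying assumptions for illustration.

```python
# (low, high) USD per unit per month, from the OpenAI estimates in this piece,
# paired with an example unit count.
ESTIMATES = {
    "chat app": ((0.25, 0.40), 5_000),        # per user
    "code assistant": ((1.50, 2.50), 20),     # per developer
    "support bot": ((4.00, 7.00), 10),        # per agent
    "document Q&A": ((8.00, 14.00), 1),       # per deployment
    "batch summarisation": ((16.00, 28.00), 1),
}


def midpoint_total(band, units):
    """Return the band midpoint multiplied by the unit count (midpoint is an assumption)."""
    low, high = band
    return (low + high) / 2 * units


totals = {name: midpoint_total(band, units) for name, (band, units) in ESTIMATES.items()}
grand_total = sum(totals.values())
dominant = max(totals, key=totals.get)

for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ${total:,.2f} ({total / grand_total:.1%})")
print(f"dominant workload: {dominant}")
```

With these counts the per-user chat workload dwarfs the rest, which is the usual pattern: unit count, not per-unit price, tends to decide where the money goes.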
For a quick side-by-side of 2026 token pricing across providers, use the free LLM API Pricing Tracker on this site. For a deeper dive into when to choose which model, see OpenAI vs Anthropic vs Google vs xAI API Pricing 2026.
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 824+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 164 countries.
