How Much Do LLM APIs Really Cost? I Ran the Numbers for 5 Common Workloads in 2026
Quick summary
Real monthly cost estimates for 5 common LLM workloads: chat app, code assistant, support bot, document Q&A, and batch summarisation. OpenAI, Anthropic, Google, xAI — with a free comparison tool.
"How much will we actually pay?" is the question every team asks before committing to an LLM API. Published price-per-token tables are accurate but hard to translate into real monthly bills. This piece runs the numbers for five common workloads — chat app, code assistant, support bot, document Q&A, and batch summarisation — and gives you rough monthly cost bands for OpenAI, Anthropic, Google, and xAI in 2026. No fluff; the goal is to help developers and founders in the US, UK, Europe, India, and Australia budget realistically.
The Five Workloads (and Assumptions)
1. Consumer chat app (light use)
~50,000 input + 20,000 output tokens per user per month. Assumes a small B2C product with a few thousand active users. Mix of short turns and occasional long threads.
2. Code assistant / dev tool
~200,000 input + 80,000 output tokens per developer per month. Assumes daily use for completions, explanations, and refactors. Heavy on code context.
3. Customer support bot
~500,000 input + 150,000 output tokens per month per agent. Assumes the bot handles a meaningful share of tier-1 support; multi-turn conversations and knowledge-base retrieval.
4. Document Q&A / RAG
~1M input + 200,000 output tokens per month. Assumes internal docs or help-center RAG; repeated retrieval and medium-length answers.
5. Batch summarisation
~2M input + 400,000 output tokens per month. Assumes nightly or weekly jobs over reports, emails, or logs. Input-heavy by token count: long source texts in, short summaries out.
Exact numbers depend on model choice (e.g. GPT-4o vs GPT-4o mini, Claude 3.5 Sonnet vs Haiku). The ranges below use typical "mid-tier" models where most teams land.
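To turn token volumes like these into dollars, multiply each side by its per-million-token price and add the two. Here is a minimal sketch in Python. The helper name `monthly_cost` is hypothetical, and the prices passed in are placeholders for illustration, not quoted rates for any provider; substitute the figures from the provider's current pricing page.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the monthly cost in dollars for one unit (user, developer, agent).

    Prices are in dollars per one million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Workload 1 volumes (50K in / 20K out) with PLACEHOLDER prices:
# $2.50 per million input tokens, $10.00 per million output tokens.
per_user = monthly_cost(50_000, 20_000, 2.50, 10.00)
print(f"${per_user:.3f} per user/month")
```

Because input and output are priced separately, the same total token count can cost very different amounts depending on the input/output split.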
Workload 1: Consumer Chat App (Light)
Rough scale: 50K in / 20K out per user/month.
OpenAI (GPT-4o): ~$0.25–0.40 per user/month.
Anthropic (Claude 3.5 Sonnet): ~$0.20–0.35.
Google (Gemini 1.5 Pro): ~$0.15–0.30.
xAI (Grok): ~$0.01–0.02 (roughly 15–25x cheaper than the other three in these estimates).
At 5,000 users, that is roughly $750–2,000/month across OpenAI, Anthropic, and Google (about $1,250–2,000 for OpenAI alone), or $50–100 for xAI at similar usage. Switching to "mini" or "Haiku" tiers can cut these by 50–70%.
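The scaling from a per-user band to a monthly total is plain multiplication, and it is worth doing per provider rather than for the group as a whole. Here is a small sketch using the Workload 1 bands listed above; `scale_band` is a hypothetical helper name, and the user count is the article's 5,000-user example.

```python
def scale_band(low: float, high: float, units: int) -> tuple[float, float]:
    """Scale a per-unit monthly cost band to a whole user base or team."""
    return (low * units, high * units)


# Per-user monthly bands for Workload 1, copied from the estimates above.
per_user_bands = {
    "OpenAI": (0.25, 0.40),
    "Anthropic": (0.20, 0.35),
    "Google": (0.15, 0.30),
    "xAI": (0.01, 0.02),
}

users = 5_000
for provider, (low, high) in per_user_bands.items():
    total_low, total_high = scale_band(low, high, users)
    print(f"{provider}: ${total_low:,.0f} to ${total_high:,.0f} per month")
```

The same helper works for the other workloads by swapping in their bands and unit counts (developers, agents).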
Workload 2: Code Assistant
Rough scale: 200K in / 80K out per dev/month.
OpenAI: ~$1.50–2.50 per dev/month.
Anthropic: ~$1.20–2.00.
Google: ~$1.00–1.80.
xAI: ~$0.05–0.10.
For a team of 20 developers, that is roughly $20–50/month across OpenAI, Anthropic, and Google, or $1–2 for xAI. Code-assistant workloads are often among the most predictable; many teams lock in a single provider and optimise later with caching and model tiers.
Workload 3: Customer Support Bot
Rough scale: 500K in / 150K out per "agent"/month.
OpenAI: ~$4–7 per agent/month.
Anthropic: ~$3.50–6.
Google: ~$3–5.
xAI: ~$0.15–0.25.
At 10 equivalent agents, you are looking at $30–70/month for the big three, or about $1.50–2.50 for xAI. Support bots often need strong instruction-following and safety; Claude and GPT-4o are common choices even when cost is higher.
Workload 4: Document Q&A / RAG
Rough scale: 1M in / 200K out/month.
OpenAI: ~$8–14/month.
Anthropic: ~$6–12.
Google: ~$5–10.
xAI: ~$0.30–0.50.
RAG workloads are input-heavy (retrieval context). Long-context models (Claude, Gemini) can reduce round-trips; xAI and smaller tiers keep cost low if quality is acceptable.
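"Input-heavy" by token count does not translate one-to-one into cost, because providers typically price output tokens higher than input tokens. The sketch below shows how to check what fraction of the bill the retrieval context accounts for. `input_cost_share` is a hypothetical helper, and the prices are placeholders (output priced at 4x input purely for illustration), not quoted rates.

```python
def input_cost_share(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the fraction of total cost (0.0 to 1.0) that comes from input tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        return 0.0  # nothing was spent, so there is no share to report
    return input_cost / total


# Workload 4 volumes (1M in / 200K out) with PLACEHOLDER prices:
# $2.50 per million input tokens, $10.00 per million output tokens.
token_share = 1_000_000 / (1_000_000 + 200_000)
cost_share = input_cost_share(1_000_000, 200_000, 2.50, 10.00)
print(f"input share of tokens: {token_share:.0%}")
print(f"input share of cost:   {cost_share:.0%}")
```

If the input share of cost is high for your prices, trimming retrieved context or caching repeated prompts is where the savings are; if it is closer to half, answer length matters just as much.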
Workload 5: Batch Summarisation
Rough scale: 2M in / 400K out/month.
OpenAI: ~$16–28/month.
Anthropic: ~$12–22.
Google: ~$10–18.
xAI: ~$0.60–1.00.
Batch jobs are where per-token price matters most. Many teams use the cheapest capable model (e.g. Haiku, Gemini Flash, or Grok) for summarisation and reserve premium models for user-facing features.
How to Use These Numbers
Treat these as order-of-magnitude estimates. Your mix of models, caching, and prompt length will shift the numbers. The point is to get a feel for which workload dominates your bill and which provider is in the right ballpark for your region and quality bar.
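One way to see which workload dominates is to take the midpoint of each band, multiply by the number of units, and rank the results. The sketch below uses the OpenAI bands from this article and the unit counts from its examples (5,000 users, 20 developers, 10 agents). The unit count of 1 for document Q&A and batch summarisation is an assumption, since those bands are quoted per month rather than per user.

```python
# (low, high) dollars per unit per month for OpenAI, from the sections above,
# paired with a unit count. Counts of 1 for the last two are an assumption.
workloads = {
    "chat app": ((0.25, 0.40), 5_000),
    "code assistant": ((1.50, 2.50), 20),
    "support bot": ((4.00, 7.00), 10),
    "document Q&A": ((8.00, 14.00), 1),
    "batch summarisation": ((16.00, 28.00), 1),
}


def midpoint_totals(bands: dict) -> dict:
    """Return the midpoint monthly cost per workload, scaled by unit count."""
    return {
        name: (low + high) / 2 * units
        for name, ((low, high), units) in bands.items()
    }


totals = midpoint_totals(workloads)
bill = sum(totals.values())
dominant = max(totals, key=totals.get)

for name, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} ${cost:>9,.2f}  {cost / bill:6.1%}")
print(f"dominant workload: {dominant}")
```

With these inputs the consumer chat app accounts for most of the bill simply because of its user count, which is the usual pattern: per-unit cost matters less than how many units you multiply it by.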
For a quick side-by-side of 2026 token pricing across providers, use the free LLM API Pricing Tracker on this site. For a deeper dive into when to choose which model, see OpenAI vs Anthropic vs Google vs xAI API Pricing 2026.
Written by
Abhishek Gautam
Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.