How Much Do LLM APIs Really Cost? I Ran the Numbers for 5 Common Workloads in 2026

Abhishek Gautam · 9 min read

Quick summary

Rough monthly cost estimates for 5 common LLM workloads: chat app, code assistant, support bot, document Q&A, and batch summarisation. Covers OpenAI, Anthropic, Google, and xAI, with a free comparison tool.

"How much will we actually pay?" is the question every team asks before committing to an LLM API. Published price-per-token tables are accurate but hard to translate into real monthly bills. This piece runs the numbers for five common workloads — chat app, code assistant, support bot, document Q&A, and batch summarisation — and gives you rough monthly cost bands for OpenAI, Anthropic, Google, and xAI in 2026. No fluff; the goal is to help developers and founders in the US, UK, Europe, India, and Australia budget realistically.

The Five Workloads (and Assumptions)

1. Consumer chat app (light use)

~50,000 input + 20,000 output tokens per user per month. Assumes a small B2C product with a few thousand active users. Mix of short turns and occasional long threads.

2. Code assistant / dev tool

~200,000 input + 80,000 output tokens per developer per month. Assumes daily use for completions, explanations, and refactors. Heavy on code context.

3. Customer support bot

~500,000 input + 150,000 output tokens per agent-equivalent per month, i.e. roughly the volume one human tier-1 agent would otherwise handle. Assumes the bot handles a meaningful share of tier-1 support, with multi-turn conversations and knowledge-base retrieval.

4. Document Q&A / RAG

~1M input + 200,000 output tokens per month. Assumes internal docs or help-center RAG; repeated retrieval and medium-length answers.

5. Batch summarisation

~2M input + 400,000 output tokens per month. Assumes nightly or weekly jobs over reports, emails, or logs. Input-heavy: long source material goes in and much shorter summaries come out.

Exact numbers depend on model choice (e.g. GPT-4o vs GPT-4o mini, Claude 3.5 Sonnet vs Haiku). The ranges below use typical "mid-tier" models where most teams land.
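Per-token billing reduces to one formula: monthly cost = input tokens × input price + output tokens × output price, with prices quoted per million tokens. Here is a minimal Python sketch of that calculation over the five token profiles above. The prices passed in are illustrative placeholders, not any provider's current rates; swap in the figures from your provider's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    input_tokens: int   # per unit (user, developer, agent, or whole workload) per month
    output_tokens: int


# Token profiles taken from the five workload assumptions above.
WORKLOADS = [
    Workload("Consumer chat app (per user)", 50_000, 20_000),
    Workload("Code assistant (per developer)", 200_000, 80_000),
    Workload("Support bot (per agent)", 500_000, 150_000),
    Workload("Document Q&A / RAG", 1_000_000, 200_000),
    Workload("Batch summarisation", 2_000_000, 400_000),
]


def monthly_cost(w: Workload, input_per_m: float, output_per_m: float) -> float:
    """USD for one month, given prices in USD per 1M tokens."""
    return (w.input_tokens / 1_000_000) * input_per_m + (w.output_tokens / 1_000_000) * output_per_m


if __name__ == "__main__":
    # ILLUSTRATIVE placeholder prices (USD per 1M tokens), not a real price list.
    IN_PRICE, OUT_PRICE = 1.00, 4.00
    for w in WORKLOADS:
        ratio = w.input_tokens / w.output_tokens
        print(f"{w.name:32} in:out = {ratio:.1f}:1  ~${monthly_cost(w, IN_PRICE, OUT_PRICE):.2f}/month")
```

Every profile sends more tokens in than it gets back, from 2.5:1 for chat and code up to 5:1 for RAG and batch summarisation. Whether input or output dominates the dollar cost then depends on each provider's output-to-input price ratio.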

Workload 1: Consumer Chat App (Light)

Rough scale: 50K in / 20K out per user/month.

OpenAI (GPT-4o): ~$0.25–0.40 per user/month.

Anthropic (Claude 3.5 Sonnet): ~$0.20–0.35.

Google (Gemini 1.5 Pro): ~$0.15–0.30.

xAI (Grok): ~$0.01–0.02 (on these estimates, Grok comes out about an order of magnitude cheaper or more).

At 5,000 users, that works out to roughly $750–2,000/month across OpenAI, Anthropic, and Google, or about $50–100 for xAI at similar usage. Switching to "mini" or "Haiku" tiers can cut these by 50–70%.
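Scaling a per-user band to a user base is plain multiplication. The sketch below applies the per-user figures above to 5,000 users, then the 50–70% saving quoted for smaller tiers. Applying one flat percentage to the whole band is a simplifying assumption; real savings depend on which models you move between.

```python
def fleet_band(per_unit_low: float, per_unit_high: float, units: int) -> tuple[float, float]:
    """Scale a per-user monthly cost band to a whole user base."""
    return per_unit_low * units, per_unit_high * units


def apply_saving(band: tuple[float, float], saving_low: float, saving_high: float) -> tuple[float, float]:
    """Apply a fractional saving range (e.g. 0.50-0.70) to a cost band.
    The largest saving sets the new floor; the smallest saving sets the new ceiling."""
    low, high = band
    return low * (1 - saving_high), high * (1 - saving_low)


USERS = 5_000
big_three = fleet_band(0.15, 0.40, USERS)            # lowest/highest per-user figures for OpenAI/Anthropic/Google
xai = fleet_band(0.01, 0.02, USERS)                  # xAI per-user figures
smaller_tiers = apply_saving(big_three, 0.50, 0.70)  # assumes one flat 50-70% cut

for label, (low, high) in [("OpenAI/Anthropic/Google", big_three), ("xAI", xai), ("Mini/Haiku tiers", smaller_tiers)]:
    print(f"{label:24} ${low:,.0f}-${high:,.0f}/month")
```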

Workload 2: Code Assistant

Rough scale: 200K in / 80K out per dev/month.

OpenAI: ~$1.50–2.50 per dev/month.

Anthropic: ~$1.20–2.00.

Google: ~$1.00–1.80.

xAI: ~$0.05–0.10.

For a team of 20 developers, that is roughly $20–50/month across OpenAI, Anthropic, and Google, or $1–2 for xAI. Code-assistant workloads are often among the most predictable; many teams lock in a single provider and optimise later with caching and model tiers.
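One simple form of caching is an application-side, exact-match response cache: if the same model receives the same prompt twice, the second answer is served locally instead of being billed again. This is a minimal sketch, not any provider's prompt-caching feature. `call_model` is a hypothetical stand-in for whatever SDK call you actually use, and the model label is a placeholder.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Exact-match cache: identical (model, prompt) pairs are billed only once."""

    def __init__(self, call_model: Callable[[str, str], str]):
        self._call_model = call_model  # hypothetical wrapper around your provider SDK
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(model, prompt)  # only cache misses reach the paid API
        self._store[key] = result
        return result


def fake_call_model(model: str, prompt: str) -> str:
    """Offline stand-in so the sketch runs without an API key."""
    return f"[{model}] answer to: {prompt[:30]}"


cache = ResponseCache(fake_call_model)
for prompt in ["Explain this regex", "Explain this regex", "Refactor utils.py"]:
    cache.complete("mid-tier-model", prompt)
print(f"hits={cache.hits} misses={cache.misses}")
```

Exact-match caching only pays off when identical prompts recur. Code-assistant prompts that embed changing file context may rarely repeat, so measure the hit rate before counting on savings.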

Workload 3: Customer Support Bot

Rough scale: 500K in / 150K out per "agent"/month.

OpenAI: ~$4–7 per agent/month.

Anthropic: ~$3.50–6.

Google: ~$3–5.

xAI: ~$0.15–0.25.

At 10 equivalent agents, you are looking at $30–70/month for the big three, or about $1.50–2.50 for xAI. Support bots often need strong instruction-following and safety; Claude and GPT-4o are common choices even when cost is higher.
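Multi-turn conversations are one reason the support-bot profile is input-heavy. Chat-style APIs are typically stateless, so each request resends the system prompt and the conversation so far. The sketch below counts tokens for one conversation under that assumption; the message sizes are hypothetical round numbers, not measurements.

```python
def conversation_tokens(turns: int, system_prompt: int, user_msg: int, bot_reply: int) -> tuple[int, int]:
    """Total (input, output) tokens billed for one conversation, assuming every
    request resends the system prompt plus all earlier user and bot messages."""
    total_in = 0
    total_out = 0
    history = 0  # tokens from earlier turns that get resent
    for _ in range(turns):
        total_in += system_prompt + history + user_msg
        total_out += bot_reply
        history += user_msg + bot_reply
    return total_in, total_out


# Hypothetical sizes: 500-token system prompt, 50-token user messages, 150-token replies.
for turns in (1, 5, 10):
    tokens_in, tokens_out = conversation_tokens(turns, system_prompt=500, user_msg=50, bot_reply=150)
    print(f"{turns:2} turns: {tokens_in:>6,} in / {tokens_out:>5,} out")
```

With these numbers, a five-turn conversation bills 4,750 input tokens against 750 output tokens, and the input side grows faster than linearly as conversations lengthen. Capping how much history the bot resends is one way to keep that growth in check.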

Workload 4: Document Q&A / RAG

Rough scale: 1M in / 200K out/month.

OpenAI: ~$8–14/month.

Anthropic: ~$6–12.

Google: ~$5–10.

xAI: ~$0.30–0.50.

RAG workloads are input-heavy (retrieval context). Long-context models (Claude, Gemini) can reduce round-trips; xAI and smaller tiers keep cost low if quality is acceptable.
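The input-heaviness of RAG comes from the retrieved context packed into each prompt. Below is a rough per-query token budget, assuming four retrieved chunks of 400 tokens each plus a short system prompt and question. All sizes and the monthly query count are hypothetical, chosen so that 500 queries a month reproduce the 1M in / 200K out profile above.

```python
def rag_query_tokens(k_chunks: int, chunk_tokens: int, question: int,
                     system_prompt: int, answer: int) -> tuple[int, int]:
    """(input, output) tokens for one RAG query: instructions, the user's
    question, and k retrieved chunks go in; one answer comes out."""
    tokens_in = system_prompt + question + k_chunks * chunk_tokens
    return tokens_in, answer


K, CHUNK = 4, 400  # hypothetical retrieval settings
tokens_in, tokens_out = rag_query_tokens(K, CHUNK, question=100, system_prompt=300, answer=400)
retrieved_share = K * CHUNK / tokens_in
QUERIES_PER_MONTH = 500  # hypothetical volume

print(f"Per query: {tokens_in} in / {tokens_out} out; retrieved context = {retrieved_share:.0%} of input")
print(f"Per month: {QUERIES_PER_MONTH * tokens_in:,} in / {QUERIES_PER_MONTH * tokens_out:,} out")
```

In this sketch retrieved chunks make up 80% of the input, so chunk count and chunk size move the bill far more than question length does.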

Workload 5: Batch Summarisation

Rough scale: 2M in / 400K out/month.

OpenAI: ~$16–28/month.

Anthropic: ~$12–22.

Google: ~$10–18.

xAI: ~$0.60–1.00.

Batch jobs are where per-token price matters most. Many teams use the cheapest capable model (e.g. Haiku, Gemini Flash, or Grok) for summarisation and reserve premium models for user-facing features.
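The "cheapest capable model" rule can be written as a filter plus a minimum: drop every model that misses your quality bar on your own evaluation, then pick the lowest monthly cost for the job's token volume. The model names, prices, and scores below are placeholders, not real model IDs or published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str            # hypothetical label, not a real model ID
    input_per_m: float   # placeholder USD per 1M input tokens
    output_per_m: float  # placeholder USD per 1M output tokens
    eval_score: float    # quality score from your own eval set, 0-1


def job_cost(model: ModelOption, input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1_000_000 * model.input_per_m + output_tokens / 1_000_000 * model.output_per_m


def cheapest_capable(options: list[ModelOption], min_score: float,
                     input_tokens: int, output_tokens: int) -> ModelOption:
    """Drop models below the quality bar, then return the cheapest for this volume."""
    capable = [m for m in options if m.eval_score >= min_score]
    if not capable:
        raise ValueError("no model meets the quality bar")
    return min(capable, key=lambda m: job_cost(m, input_tokens, output_tokens))


OPTIONS = [
    ModelOption("premium", 5.00, 20.00, 0.92),
    ModelOption("mid", 1.00, 4.00, 0.88),
    ModelOption("small", 0.10, 0.40, 0.81),
]

# Batch summarisation volume from this article: 2M input / 400K output tokens per month.
pick = cheapest_capable(OPTIONS, 0.80, 2_000_000, 400_000)
print(pick.name, f"${job_cost(pick, 2_000_000, 400_000):.2f}/month")
```

With a bar of 0.80 the small tier wins at $0.36/month for this volume; raising the bar to 0.85 moves the job to the mid tier at $3.60/month.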

How to Use These Numbers

Treat these as order-of-magnitude estimates. Your mix of models, caching, and prompt length will shift the numbers. The point is to get a feel for which workload dominates your bill and which provider is in the right ballpark for your region and quality bar.
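To see which workload dominates, multiply each per-unit band by how many units you run and compare. This sketch uses the OpenAI bands above and the example fleet sizes used in this piece (5,000 chat users, 20 developers, 10 support agents, one RAG system, one batch pipeline), taking the midpoint of each band; a real estimate would use your own counts.

```python
# OpenAI per-unit bands (USD/month) and example unit counts, both taken from this article.
BANDS: dict[str, tuple[tuple[float, float], int]] = {
    "Chat app": ((0.25, 0.40), 5_000),
    "Code assistant": ((1.50, 2.50), 20),
    "Support bot": ((4.00, 7.00), 10),
    "Document Q&A / RAG": ((8.00, 14.00), 1),
    "Batch summarisation": ((16.00, 28.00), 1),
}


def midpoint_costs(bands: dict[str, tuple[tuple[float, float], int]]) -> dict[str, float]:
    """Monthly cost per workload: midpoint of each band times its unit count."""
    return {name: (low + high) / 2 * units for name, ((low, high), units) in bands.items()}


costs = midpoint_costs(BANDS)
total = sum(costs.values())
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20} ${cost:>9,.2f}  {cost / total:6.1%}")
```

On these assumptions the chat app is about 93% of a roughly $1,750/month total, so for this mix the per-user chat feature is where model-tier and caching choices would move the bill most.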

For a quick side-by-side of 2026 token pricing across providers, use the free LLM API Pricing Tracker on this site. For a deeper dive into when to choose which model, see OpenAI vs Anthropic vs Google vs xAI API Pricing 2026.


Written by

Abhishek Gautam

Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.
