Claude.ai Outage Apr 28: API Errors, Login Failures, Dev Mitigation

Abhishek GautamAbhishek Gautam6 min read
Claude.ai Outage Apr 28: API Errors, Login Failures, Dev Mitigation

Quick summary

Anthropic confirmed elevated API errors and Claude.ai login failures on Apr 28, 2026. Timeline, blast radius, and failover steps for teams shipping tonight.

Anthropic confirmed an active incident at 17:41 UTC on April 28, 2026: users could not reach Claude.ai, API error rates spiked, and Claude Code login paths were affected. Ten minutes later, at 17:51 UTC, status moved to Identified with explicit wording: elevated errors on the Anthropic API plus web access issues.

If you are shipping software tonight, this is not a "wait and refresh" event. It is a dependency incident with real blast radius across chat UX, coding agents, and API-driven production workflows.

What Is Confirmed Right Now

The official status page reported two concrete symptoms:

  1. Claude.ai unavailable for some users
  2. Elevated API errors on api.anthropic.com

The same notice explicitly called out Claude Code login path issues. This matters because teams often assume CLI workflows are isolated from web incidents. They are not isolated when auth surfaces are shared.

Independent reporting also showed a sharp user-report spike. That does not replace vendor telemetry, but it confirms this was broad enough to hit public monitoring signals.

Timeline You Should Put in Your Incident Notes

Use this timeline for internal postmortems and customer communication:

  • 17:41 UTC: status page marks incident as Investigating
  • 17:51 UTC: status page updates to Identified with API + Claude.ai details
  • After 17:51 UTC: customers report mixed behavior across surfaces, including login failures and elevated request errors

If your team logs in IST, this starts around 11:11 PM IST. If your deploy window was 11:00 PM-1:00 AM, you are exactly in the highest-risk period.

Real Blast Radius for Developers and Infra Teams

Most teams underestimate second-order effects. The outage is not only "chat is down."

Agent workflows: tool-calling chains fail mid-run when one Claude step errors and your orchestrator treats it as terminal.

CI/CD helpers: any build system using Claude for commit summaries, test triage, or codegen can hard-fail on API retries if backoff is misconfigured.

Customer support bots: queued tickets grow rapidly because fallback copy and routing are often tied to the same LLM provider.

On-call load: pager noise spikes from downstream timeouts, not from "Anthropic down" alerts, because many monitors only watch your API edge.

Immediate Mitigation Playbook (Tonight, Not Tomorrow)

First, cap retries. Infinite exponential backoff looks safe in code review and becomes a traffic amplifier during provider incidents.

Second, switch to graceful degradation mode:

  • Return cached answers where safe
  • Disable non-essential agent steps
  • Show explicit "AI assist delayed" UX instead of generic 500s

Third, route to alternate models if your compliance policy allows it. If you already have a multi-provider abstraction, this is the exact moment it should pay for itself.

Fourth, protect budgets: set temporary token and request ceilings to prevent retry storms from multiplying cost once service recovers.

For pricing-aware fallback decisions, use /tools/llm-api-pricing.

Architecture Lesson: Multi-Region Is Not Multi-Provider

Many teams designed cross-region redundancy and still failed here because the dependency was single-provider. Region failover does nothing when the provider control plane or shared auth path is degraded.

The reliability hierarchy is:

  1. Single model, single provider (highest risk)
  2. Multiple models, single provider
  3. Multiple providers with policy-based routing (best practical resilience for most teams)

If your stack is still level 1, incidents like this will keep becoming midnight release blockers.

What to Communicate to Customers

Use concrete language:

  • "Our AI provider is experiencing elevated errors"
  • "Core product remains available; AI assist is degraded"
  • "We have activated fallback mode and are monitoring recovery"

Avoid vague language like "temporary instability." Customers trust precision under pressure.

Why This Outage Matters Beyond One Night

This is the second quarter in 2026 where teams were reminded that model quality is only half the production story. Reliability and operability are the other half.

Read this alongside Project Glasswing and Mythos operational implications and our cloud risk checklist for force majeure and contract realities. Different incidents, same engineering truth: dependency concentration is the hidden outage multiplier.

Key Takeaways

  • Confirmed incident: Anthropic posted Investigating at 17:41 UTC and Identified at 17:51 UTC for Claude.ai access issues and elevated API errors.
  • Blast radius includes API-driven apps, Claude Code login flows, support automation, and agent chains, not just chat users.
  • Mitigation tonight: cap retries, enable graceful degradation, activate alternate routing, and put temporary cost guardrails in place.
  • Architecture gap: multi-region without multi-provider does not protect against shared provider incidents.
  • Customer trust improves when status updates are factual, timestamped, and explicit about degraded surfaces.

FAQ

Frequently Asked Questions

Is Claude down right now on April 28, 2026?

Yes, Anthropic reported an active incident with elevated API errors and Claude.ai access issues on April 28, 2026. The status page moved from Investigating at 17:41 UTC to Identified at 17:51 UTC, and the incident note also mentioned login path issues affecting Claude Code.

Does this outage affect only claude.ai or also the API?

It affects both surfaces. Anthropic explicitly reported elevated errors on the API and web access problems on Claude.ai in the same incident window. Teams using Claude through direct API integrations should treat this as a production dependency incident, not only a web UI issue.

What should developers do immediately when Claude API errors spike?

Reduce retry amplification, enable graceful degradation, and route to fallback providers if your policy allows it. Also set temporary request and token ceilings so automatic retries do not create cost spikes during partial recovery.

How do we prevent this from breaking releases again?

Implement multi-provider routing with health-based failover and pre-defined degraded modes per feature. Cross-region redundancy alone is insufficient when a shared provider control plane, auth path, or model endpoint degrades globally.

Free Weekly Briefing

The AI & Dev Briefing

One honest email a week — what actually matters in AI and software engineering. No noise, no sponsored content. Read by developers across 30+ countries.

No spam. Unsubscribe anytime.

Written by

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 885+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.