OpenAI Spud Day 5: Window Extends to April 30 — What Changed
Quick summary
OpenAI Spud has not launched through April 18 — day 5 of the delay. Polymarket window extended to April 30. Three new theories on why, and what the extended delay means for developers.
OpenAI Spud has not launched. Five days after the first credible leaks on April 13, the model remains unreleased. Polymarket has now moved its primary contract from day-by-day "launches today" bets to a consolidated "launches by April 30" window, currently sitting at approximately 78% probability.
The window extension itself is information. Here is what day 5 tells us that days 1-4 did not.
The 5-Day Threshold Changes the Analysis
On days 1-3 of a pre-release delay, the standard explanations hold: last-mile safety evaluation, news cycle management, API capacity provisioning. These are 24-72 hour holds.
Day 5 is outside that window for all three standard explanations. Safety red-teaming that takes more than 72 hours on a model this close to launch is not standard last-mile review — it is a substantive finding that required significant additional work. API capacity provisioning decisions are made faster than 5 days. News cycle management does not extend a hold this long without a clear alternative timing target.
Day 5 means one of three things that were not apparent earlier.
Theory 1 — Benchmark negotiation: OpenAI's internal evaluations showed Spud does not cleanly beat Claude 3.7 Sonnet on SWE-bench — the metric that has the most developer credibility right now. Rather than launch with benchmarks where Claude still leads, OpenAI is running additional fine-tuning passes targeted specifically at the SWE-bench task distribution. This can take 3-7 days depending on the compute allocation. Day 5 is consistent with this timeline.
Probability: 40%. This is the scenario where Spud launches with meaningfully better coding benchmarks than whatever internal evaluation showed on April 13. It would explain why OpenAI is absorbing the delay cost — a clean SWE-bench win over Claude is worth more than launching 5 days sooner.
Theory 2 — Naming and positioning decision: The question of whether Spud launches as GPT-5, GPT-5.5, GPT-4.5 Pro, or something else entirely is not resolved. The naming decision has downstream implications for pricing tiers, API versioning, enterprise contract terms, and the narrative around GPT-5 being held back. A model that ships as "GPT-5.5" versus "GPT-5" has completely different market positioning and customer expectation management requirements. Executive-level naming disputes can hold launches for days.
Probability: 30%. Less likely to be the primary cause but plausible as a contributing factor.
Theory 3 — Competitive intelligence on Anthropic: Anthropic is expected to release a Claude update in Q2 2026. If OpenAI's competitive intelligence suggests that Anthropic is closer to a release than previously estimated — potentially within 2-3 weeks — launching Spud now means a Claude update lands right into Spud's launch coverage cycle, splitting developer attention. Timing Spud to land after the Anthropic release would give OpenAI the "response" narrative advantage.
Probability: 30%. The competitive timing theory explains the extended hold duration better than any operational reason.
What the April 30 Window Means for Developers
The Polymarket market moved from day-by-day contracts to an April 30 consolidated window. That shift reflects market participants updating away from "imminent launch" expectations to "launch sometime in the next 12 days" expectations.
For developers: the correct response to a 12-day uncertainty window is the same as the correct response to a 1-day window. Do not block production decisions on Spud availability. Build against GPT-4o or Claude 3.7 Sonnet today.
The specific thing developers get wrong in this situation: they wait to evaluate a new model, feel urgency when it lands, and then make integration decisions too fast. The better approach is to have your evaluation framework ready before the launch, so you can run your own benchmarks within 48 hours of availability instead of waiting for OpenAI's numbers or for community consensus.
What to prepare now:
- Define the 3-5 tasks where you would most plausibly switch from your current provider to Spud
- Write test cases for those tasks that you can run immediately on API access
- Set a cost threshold — at what price per million tokens does Spud become a default choice vs Claude or GPT-4o?
- Decide in advance whether you care about multimodal features or primarily text/code
Having this framework ready means you can make a provider decision within 48 hours of Spud's launch rather than 2 weeks later.
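To make the checklist above concrete, here is a minimal Python sketch of what such a framework could look like. Everything in it is an illustrative assumption rather than a real integration: the `Provider` type, the `EvalCase` structure, the stub provider, and the prices are placeholders to be swapped for your own client wrapper, your own tasks, and whatever pricing is actually published.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a "provider" is any function that maps a prompt to a completion.
# Wrap your real GPT-4o, Claude, or (later) Spud client in one of these.
Provider = Callable[[str], str]


@dataclass
class EvalCase:
    task: str                     # one of your 3-5 switch-candidate tasks
    prompt: str
    check: Callable[[str], bool]  # True if the output is acceptable


def pass_rates(provider: Provider, cases: List[EvalCase]) -> Dict[str, float]:
    """Run each case once and return the fraction passed per task."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for case in cases:
        total[case.task] = total.get(case.task, 0) + 1
        if case.check(provider(case.prompt)):
            passed[case.task] = passed.get(case.task, 0) + 1
    return {task: passed.get(task, 0) / n for task, n in total.items()}


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given prices in USD per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def within_budget(cost_usd: float, threshold_usd: float) -> bool:
    """Apply the per-request cost threshold you decided on in advance."""
    return cost_usd <= threshold_usd


def stub_provider(prompt: str) -> str:
    """Stand-in for a dry run; replace with a real API call."""
    return prompt.upper()


CASES = [
    EvalCase("uppercase", "hello", lambda out: out == "HELLO"),
    EvalCase("uppercase", "spud", lambda out: out == "SPUD"),
    EvalCase("has-digit", "no digits", lambda out: any(c.isdigit() for c in out)),
]

if __name__ == "__main__":
    print(pass_rates(stub_provider, CASES))
    # Placeholder prices, not real quotes for any model.
    cost = request_cost_usd(2_000, 500, input_price_per_m=3.0, output_price_per_m=15.0)
    print(cost, within_budget(cost, threshold_usd=0.02))
```

Because a provider is just a function, adding Spud on launch day should only require writing one more wrapper and rerunning the same cases, which is the point of preparing the test set in advance.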
The SWE-bench Question
The coding benchmark race between OpenAI, Anthropic, and Google is now the primary competitive battleground for developer adoption. Claude 3.7 Sonnet currently leads on SWE-bench Verified, the most credible measure of real-world software engineering capability.
If Theory 1 is correct and OpenAI is running additional SWE-bench-targeted fine-tuning, the launch announcement will lead with coding capability numbers. Watch specifically for:
- SWE-bench Verified score: anything above 55% would beat Claude 3.7 Sonnet's current position
- HumanEval pass@1: less important as a standalone metric but expected to be 95%+
- Aider leaderboard position: the practical coding assistant benchmark that correlates with Cursor/Copilot integration performance
- Latency at standard context lengths: a technically superior model that is 2x slower than GPT-4o has limited production appeal
Do not evaluate Spud on OpenAI's published benchmarks alone. Run your own evals against your specific use case within 48 hours of API access.
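Of the items on that watch list, latency is the one you can measure yourself on day one. The sketch below is one possible way to do it, under stated assumptions: providers are plain prompt-to-text functions that you supply, timings are end-to-end wall-clock seconds, and the 2x cutoff simply mirrors the rule of thumb above rather than any official threshold.

```python
import statistics
import time
from typing import Callable, List


def time_calls(provider: Callable[[str], str], prompts: List[str]) -> List[float]:
    """Wall-clock seconds per prompt, measured end to end."""
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        provider(prompt)
        timings.append(time.perf_counter() - start)
    return timings


def latency_ratio(candidate: List[float], baseline: List[float]) -> float:
    """Median candidate latency divided by median baseline latency."""
    return statistics.median(candidate) / statistics.median(baseline)


def too_slow(candidate: List[float], baseline: List[float],
             max_ratio: float = 2.0) -> bool:
    """True if the candidate is at least max_ratio times slower than baseline."""
    return latency_ratio(candidate, baseline) >= max_ratio


if __name__ == "__main__":
    # Made-up timings for illustration; real ones come from time_calls().
    baseline_s = [1.0, 2.0, 3.0]
    candidate_s = [2.0, 4.0, 6.0]
    print(latency_ratio(candidate_s, baseline_s), too_slow(candidate_s, baseline_s))
```

Medians are used instead of means so that a single slow request does not dominate the comparison. Use the same prompts, at the context lengths you actually run in production, for both providers.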
The Competitive Landscape While Spud Waits
Every day Spud does not launch, the competitive landscape it is entering changes.
Google's Gemini updates in the April window have sharpened Gemini's position in long-context tasks. Anthropic's Claude 3.7 Sonnet has been accumulating developer trust since its February release, with integrations deepening in Cursor, GitHub Copilot, and enterprise deployments. The developer audience Spud is launching into has formed habits with current models.
This is the real cost of the 5-day delay that the oil-price-drop news cycle obscured. Every additional day is not just a lost first-mover day — it is a day of deeper entrenchment for competitors.
Key Takeaways
- OpenAI Spud still not launched on day 5, April 18 — Polymarket window extended from day-by-day to "launches by April 30" at ~78% probability
- Day 5 eliminates standard 72-hour hold explanations: the delay is now most consistent with benchmark fine-tuning for SWE-bench (40%), naming/positioning dispute (30%), or competitive timing against Anthropic (30%)
- Do not wait for Spud to make production decisions: build against GPT-4o or Claude 3.7 Sonnet today, and keep provider calls behind a thin wrapper so a later switch to Spud stays cheap (API compatibility with Spud is plausible but not confirmed)
- Prepare your evaluation framework now: define the 3-5 tasks you would switch providers for, write test cases, set cost thresholds — run evals within 48 hours of API access, not 2 weeks
- Watch the SWE-bench Verified score at launch: anything above 55% beats Claude 3.7 Sonnet's current position; that is the number that will move developer adoption
Compare current model capabilities at Claude vs ChatGPT. For the Day 4 delay analysis, read OpenAI Spud Day 4: Still Not Live — Polymarket at 78%. See live API pricing at LLM API Pricing.
Frequently Asked Questions
Why has OpenAI Spud still not launched after 5 days?
Day 5 eliminates standard explanations (safety review, news cycle timing, API capacity) which resolve within 72 hours. The three remaining theories: benchmark fine-tuning targeting SWE-bench where Claude 3.7 Sonnet currently leads (40% probability, 3-7 day hold); naming and positioning dispute over whether to release as GPT-5, GPT-5.5, or another designation (30%); competitive timing to avoid launching into an upcoming Anthropic release cycle (30%). Polymarket has shifted from day-by-day contracts to an April 30 consolidated window.
When will OpenAI Spud launch?
Polymarket's consolidated "launches by April 30" contract sits at approximately 78% probability as of April 18. The extended window suggests traders no longer expect an imminent same-day launch but still expect one within the next 12 days. If the delay is benchmark fine-tuning (the most likely single cause at 40%), expect launch in the April 19-22 window. If it is a naming/positioning dispute or competitive timing, the launch could slip to April 25-30.
Should I wait for OpenAI Spud before choosing an AI provider?
No. Build against GPT-4o or Claude 3.7 Sonnet today. Both are stable and production-proven, and a later move to Spud should be manageable if your provider calls sit behind a thin wrapper (API compatibility with Spud is not confirmed). What you should do now is prepare your evaluation framework: define the 3-5 tasks where you would switch providers, write test cases, and set cost-per-token thresholds. When Spud lands, run your own evals within 48 hours rather than waiting 2 weeks for community consensus. Decision speed matters more than waiting for perfect information.
What benchmarks should I watch when OpenAI Spud launches?
SWE-bench Verified is the most credible developer benchmark. Claude 3.7 Sonnet currently leads, and anything above 55% for Spud would represent a meaningful improvement. Also watch: HumanEval pass@1 (expect 95%+), Aider leaderboard position (correlates with Cursor/Copilot integration performance), and latency at standard context lengths. Do not rely on OpenAI's published benchmarks alone; run your own evals against your specific tasks within 48 hours of API access.
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 941+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.
