Meta Muse Spark: First MSL Model, Closed Source, Benchmark Results

Abhishek Gautam | April 8, 2026 | 8 min read

Quick summary

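Meta launched Muse Spark on April 8, 2026. It is the first model from Meta Superintelligence Labs under Alexandr Wang. It is free to use and closed source, beats GPT-5.4 on the cited health and science benchmarks, and trails on coding, abstract reasoning, and agentic tasks.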

What Meta Superintelligence Labs Actually Is

Meta Superintelligence Labs (MSL) is the research division that came out of Meta's $14.3 billion deal to bring in Alexandr Wang as Chief AI Officer. Wang co-founded Scale AI and built one of the most important AI data labelling and evaluation companies in the industry. His bet when joining Meta was that the company's data advantages — billions of daily active users across Instagram, WhatsApp, Facebook, and Messenger — could be turned into a training signal moat that frontier labs without a consumer base cannot replicate.

MSL spent nine months on a ground-up rebuild. They did not fine-tune Llama. They did not iterate on an existing architecture. The Muse series is a new training paradigm internally described as "deliberate and scientific model scaling where each generation validates and builds on the last before going bigger." Muse Spark is the first step — small and fast by design, built to validate the scaling approach before MSL goes to larger parameter counts.

The compute efficiency claim is notable: Meta says Muse Spark reaches the same capability level as Llama 4 Maverick with less than a tenth of the compute. If that holds under independent evaluation, it suggests the architecture and training data quality improvements are doing meaningful work rather than the model just benefiting from increased scale.

Benchmark Results: Where It Wins and Where It Loses

On the Artificial Analysis Intelligence Index v4.0, Muse Spark scores 52, placing it fifth overall. Scores cited for comparison:

GPT-5.4: 57
Gemini 3.1 Pro: 57
Claude Opus 4.6: 53
Muse Spark: 52

Where Muse Spark leads the field:

Health benchmarks (HealthBench Hard): 42.8, versus GPT-5.4's 40.1. This is the benchmark most relevant to medical and clinical AI use cases — complex clinical reasoning, drug interaction analysis, diagnostic question answering. Meta's health data advantage across its platforms appears to be producing measurable results here.

Scientific reasoning (Humanity's Last Exam): 50.2% in Contemplating mode, versus GPT-5.4 Pro's 43.9%. HLE tests graduate-level expert questions across mathematics, physics, chemistry, biology, and other sciences. A 6.3 percentage point lead on this benchmark is significant, since it is one of the hardest AI evaluations currently available.

Chart understanding (CharXiv): 86.4 — strong multimodal performance on data visualisation comprehension.

Where Muse Spark trails significantly:

Coding (Terminal-Bench): 59.0 versus GPT-5.4's 75.1. A 16-point gap on coding benchmarks is large. For developers evaluating whether to use Muse Spark for code generation, this number matters. The model is not competitive with GPT-5.4 or Claude on pure coding tasks at launch.

Abstract reasoning (ARC-AGI-2): 42.5 versus GPT-5.4's 76.1. That is a 33.6-point gap on the benchmark designed to test novel reasoning rather than pattern matching from training data. This is the most striking underperformance number in Muse Spark's launch results.

Agentic tasks (GDPval-AA): 1,444 Elo versus GPT-5.4's 1,672. A 228-point deficit is a meaningful gap for developers building multi-step agentic workflows.
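To put the GDPval-AA rating gap in more familiar terms, here is a minimal sketch that converts it into an expected head-to-head score. It assumes the ratings sit on the standard logistic Elo scale with a 400-point divisor, which is an assumption on our part and not something stated in the benchmark figures above. The function name elo_expected_score is illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    The 400-point scale is the conventional chess default; whether GDPval-AA
    uses the same scale is an assumption made for this illustration.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


MUSE_SPARK_ELO = 1444  # GDPval-AA rating cited in this article
GPT_5_4_ELO = 1672     # GDPval-AA rating cited in this article

p_muse = elo_expected_score(MUSE_SPARK_ELO, GPT_5_4_ELO)

print(f"Elo gap: {GPT_5_4_ELO - MUSE_SPARK_ELO} points")        # Elo gap: 228 points
print(f"Expected Muse Spark score vs GPT-5.4: {p_muse:.2f}")    # 0.21
```

Under that assumption, Muse Spark would be expected to come out ahead in roughly one pairwise comparison in five against GPT-5.4.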

The honest read: Muse Spark is a specialist that wins on health, science, and charts. It is not a generalist that challenges GPT-5.4 or Claude Opus 4.6 across the board at launch. The "top 5" framing is accurate but hides the uneven benchmark profile.
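The uneven profile is easy to check with a few lines of arithmetic. The sketch below simply subtracts the comparison score from Muse Spark's score for each benchmark cited above. The scores are copied from this article; the SCORES table and the gap helper are illustrative names, not part of any published tooling.

```python
# (benchmark, Muse Spark score, comparison score), as cited in this article.
SCORES = [
    ("HealthBench Hard", 42.8, 40.1),      # vs GPT-5.4
    ("Humanity's Last Exam", 50.2, 43.9),  # vs GPT-5.4 Pro
    ("Terminal-Bench", 59.0, 75.1),        # vs GPT-5.4
    ("ARC-AGI-2", 42.5, 76.1),             # vs GPT-5.4
]


def gap(muse: float, other: float) -> float:
    """Signed gap in points, rounded to one decimal; positive means Muse Spark leads."""
    return round(muse - other, 1)


gaps = {name: gap(muse, other) for name, muse, other in SCORES}

for name, delta in gaps.items():
    verdict = "leads" if delta > 0 else "trails"
    print(f"{name:<22} Muse Spark {verdict} by {abs(delta):.1f}")
```

The two leads come to 2.7 and 6.3 points, while the two deficits come to 16.1 and 33.6 points, which is why the overall index ranking alone understates how lopsided the results are.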

The Closed-Source Decision: What It Means

Every previous major Meta AI model (Llama 1, 2, 3, and 4) was released as open weights. Muse Spark is not. The weights are not available. The architecture is not documented beyond what Meta chose to share. There is no Hugging Face download.

Meta's official position is that future versions of the Muse series may be open-sourced. The current closed approach is framed as necessary for responsible deployment given capability levels — the same framing OpenAI and Anthropic use. The practical effect is that the developer community that built an ecosystem around Llama now has a Meta flagship model it cannot run, fine-tune, or audit.

This is a significant strategic shift. The Llama ecosystem created enormous goodwill, research adoption, and indirect commercial leverage for Meta. MSL is betting that the competitive advantage of keeping Muse weights proprietary outweighs the ecosystem network effects of open release. That is a bet Anthropic and OpenAI have always made — it is a new bet for Meta.

For enterprises that chose Meta AI infrastructure specifically because of open weights: Muse Spark is not a drop-in replacement for Llama 4 in your deployment. The API is a different integration model with different control, cost, and compliance implications.

Technical Architecture: What We Know

Muse Spark is natively multimodal — voice, text, and image inputs at launch, with text-only output. Visual chain of thought, tool-use, and multi-agent orchestration are built into the base model rather than bolted on. This matters because models that add multimodal capability as an afterthought tend to perform worse at cross-modal reasoning than models where modalities are integrated at pretraining.

The model is currently powering the Meta AI app and website. It will roll out to WhatsApp, Instagram, Facebook, Messenger, and Meta's AI glasses over the coming weeks. That deployment scale — billions of users — is something no other frontier model has. If Meta uses that distribution to collect preference and feedback data at scale, the advantage compounds into the next Muse generation.

A private API preview is open to select partners now. A timeline for public API access has not been announced. For developers wanting to evaluate the model, meta.ai is the current access point, and it is completely free with no subscription required.

How to Use Muse Spark Right Now

For health and science tasks: Muse Spark leads GPT-5.4 on the health and science benchmarks cited above. For clinical reasoning, drug research, scientific paper analysis, and medical question answering, try Muse Spark first before defaulting to GPT-5.4 or Claude.

For coding: Use Claude Opus 4.6 or GPT-5.4 instead. The 16-point Terminal-Bench gap is too large to ignore for production code generation.

For agentic workflows: Too early to deploy at scale. The GDPval-AA gap and the ARC-AGI-2 underperformance suggest the model has not yet been optimised for multi-step autonomous tasks.

For chart and data analysis: Competitive benchmark scores — worth evaluating alongside Gemini 3.1 Pro for visualisation-heavy workflows.

For cost-sensitive deployments: It is free on meta.ai, and no usage limits have been announced. For high-volume read-only use cases where health or science context is relevant, this is a meaningful cost advantage. Compare LLM API pricing once the API launches publicly for the full cost picture.
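The guidance above can be condensed into a small lookup, sketched here as a starting point for teams that route requests between models. The task labels, the RECOMMENDATIONS table, and the recommend function are all hypothetical names chosen for illustration, and the mapping encodes only this article's recommendations at launch. Agentic tasks are left out on purpose, because the article advises against Muse Spark there without naming an alternative.

```python
# Models to evaluate first for each task type, per this article's launch-day
# guidance. Task labels and this table are illustrative, not an official API.
RECOMMENDATIONS = {
    "health": ["Muse Spark"],
    "science": ["Muse Spark"],
    "coding": ["Claude Opus 4.6", "GPT-5.4"],
    "charts": ["Muse Spark", "Gemini 3.1 Pro"],
}


def recommend(task: str) -> list[str]:
    """Return the models to try first for a task type.

    Raises ValueError for task types the article gives no positive
    recommendation for (for example, agentic workflows).
    """
    key = task.strip().lower()
    if key not in RECOMMENDATIONS:
        raise ValueError(f"No recommendation recorded for task type: {task!r}")
    return RECOMMENDATIONS[key]


print(recommend("coding"))  # ['Claude Opus 4.6', 'GPT-5.4']
print(recommend("health"))  # ['Muse Spark']
```

Keeping the mapping in one table makes it easy to revise as new benchmark results or a public Muse Spark API arrive.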

The Muse series will get larger. Muse Spark's role is to validate the training approach and the scaling laws before MSL commits to the compute required for the next generation. What comes after Muse Spark — once the architecture is validated and the health/science data advantages are combined with more scale — is the real competitive move Meta is building toward.

Key Takeaways

Muse Spark launched April 8 from Meta Superintelligence Labs (MSL) under Alexandr Wang — Meta's first model since the $14.3B Scale AI deal
Closed source: break from Llama heritage — weights not public; future versions may be open-sourced; no timeline given
Benchmark scores: 52 on the Artificial Analysis Intelligence Index v4.0 (5th overall); beats GPT-5.4 on HealthBench Hard (42.8 vs 40.1) and GPT-5.4 Pro on HLE science reasoning (50.2% vs 43.9%); trails badly on Terminal-Bench coding (59.0 vs 75.1) and ARC-AGI-2 abstract reasoning (42.5 vs 76.1)
Architecture: natively multimodal (voice, text, image input), visual chain-of-thought, tool-use, multi-agent orchestration built in at base model level
Access: completely free on meta.ai and Meta AI app; private API preview for select partners; public API timeline unannounced
Compute efficiency: Meta claims equivalent capability to Llama 4 Maverick with less than a tenth of the compute; significant if independently verified
Deployment: rolling to WhatsApp, Instagram, Facebook, Messenger, AI glasses — billions of users as feedback data flywheel

Frequently Asked Questions

What is Meta Muse Spark and when did it launch?

Meta Muse Spark is the first AI model from Meta Superintelligence Labs (MSL), launched on April 8, 2026. It was built over nine months as a ground-up rebuild of Meta's AI stack under Chief AI Officer Alexandr Wang. It is natively multimodal, supports tool use and multi-agent orchestration, and is free to use on meta.ai and the Meta AI app.

Is Meta Muse Spark open source like Llama?

No. Muse Spark is closed source, a major break from Meta's previous Llama models, which were all released as open weights. The weights and architecture are not public. Meta said it hopes to open-source future Muse versions but gave no timeline. Current access is via meta.ai (free) or a private API preview for select partners.

How does Meta Muse Spark compare to GPT-5.4 and Claude Opus 4.6?

Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, placing it fifth overall and behind GPT-5.4 (57), Gemini 3.1 Pro (57), and Claude Opus 4.6 (53). It beats GPT-5.4 on HealthBench Hard (42.8 vs 40.1) and GPT-5.4 Pro on Humanity's Last Exam science reasoning (50.2% vs 43.9%), but trails significantly on Terminal-Bench coding (59.0 vs 75.1) and ARC-AGI-2 abstract reasoning (42.5 vs 76.1).

Should developers use Meta Muse Spark for coding?

Not as a primary coding model. Muse Spark scores 59 on Terminal-Bench versus GPT-5.4's 75.1 — a 16-point gap that is too large to ignore for production code generation. Use Claude Opus 4.6 or GPT-5.4 for coding tasks. Muse Spark is better suited to health, science, and chart analysis use cases where it leads or matches frontier models.

How can I access Meta Muse Spark?

Muse Spark is free on meta.ai and the Meta AI app with no subscription required. It is rolling out to WhatsApp, Instagram, Facebook, Messenger, and Meta AI glasses over the coming weeks. A private API preview is available to select partners. A timeline for public API access has not been announced.
