NVIDIA Nemotron 3 Super: 60% SWE-bench, Best Open Model for Code

Abhishek Gautam · 7 min read

Quick summary

NVIDIA Nemotron 3 Super hits 60.47% on SWE-bench — highest open-weight score ever. 120B total, 12B active, 1M context, 5x throughput vs GPT-OSS. Already in CodeRabbit and Greptile.

NVIDIA released Nemotron 3 Super on March 11, 2026. The headline number is 60.47% on SWE-bench Verified — the highest score any open-weight model has achieved on the benchmark that tests AI's ability to resolve real GitHub issues.

For context: Claude Opus 4.6 scores 80.8% on the same benchmark and GPT-5.4 scores around 75%. Nemotron 3 Super is not beating frontier closed models. What it is doing is beating every other model you can download and run yourself — by a meaningful margin.

This matters because open-weight models have a fundamentally different value proposition than API-accessed models. No per-token costs. No data leaving your infrastructure. No rate limits on private codebases. If Nemotron 3 Super can handle 60% of real GitHub issues autonomously, that's a capable autonomous coding agent you can run on your own hardware.

The Architecture: Why Mamba-Transformer Hybrid Is Different

Most large language models use transformer attention, which has quadratic computational complexity in sequence length: doubling the context roughly quadruples the attention compute. This is why 1M-token context windows are expensive to run even when the model technically supports them.

Nemotron 3 Super uses a hybrid architecture: interleaved Mamba-2 layers, Mixture-of-Experts (MoE) layers, and select transformer attention layers. The Mamba-2 backbone uses linear-time sequence processing — compute cost scales linearly with context length, not quadratically.

The result: the model can process a 1M-token context window at a cost that doesn't explode the way pure transformer attention would. For developers trying to run codebase-wide analysis, this is the practical difference between "can do this with a single H100" and "needs a multi-GPU cluster."
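To see why that matters, here's a back-of-the-envelope comparison of how per-sequence compute grows under quadratic attention versus a linear-time layer. The unitless "costs" are my own illustration, not NVIDIA's published figures:

```python
def attention_cost(n_tokens):
    # Full self-attention compares every token with every other token: O(n^2).
    return n_tokens ** 2

def linear_cost(n_tokens):
    # A Mamba-style state-space layer makes one pass over the sequence: O(n).
    return n_tokens

# At 1M tokens, the gap between the two scaling regimes is a factor of 1M.
for n in (8_000, 128_000, 1_000_000):
    ratio = attention_cost(n) // linear_cost(n)
    print(f"{n:>9} tokens -> attention costs {ratio:,}x the linear layer")
```

The absolute constants differ wildly between real kernels, but the shape of the curves is the point: quadratic attention dominates total cost long before you reach 1M tokens, and a mostly-linear backbone sidesteps that.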

The Scale Numbers and What They Mean

Nemotron 3 Super has 120 billion total parameters and 12 billion active parameters. The gap between those two numbers is the MoE architecture at work. In each forward pass, only 12B of the 120B parameters activate — the router selects which expert sub-networks handle each token. You get near-120B model quality at roughly 12B inference compute cost.
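As a rough sketch of how MoE routing keeps active compute small, here's a toy top-k router in NumPy. The expert count, top-k value, and dimensions are invented for illustration; Nemotron's actual routing configuration isn't specified here:

```python
import numpy as np

# Toy Mixture-of-Experts layer: route each token to the top-k experts by
# gate score. 16 experts / top-2 / d_model=64 are illustrative numbers only.
rng = np.random.default_rng(0)
n_experts, k, d_model = 16, 2, 64

gate = rng.standard_normal((d_model, n_experts))            # router weights
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02

def moe_forward(x):
    scores = x @ gate                                # (n_experts,) gate logits
    top = np.argsort(scores)[-k:]                    # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                         # softmax over selected experts
    # Only k of n_experts weight matrices are touched per token, which is why
    # "active parameters" are a small fraction of total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(f"output shape: {out.shape}; active experts per token: {k}/{n_experts}")
```

In this toy setup only 2 of 16 expert matrices run per token; scale the same idea up and you get Nemotron's 12B-active-of-120B-total profile.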

For throughput comparison:

  • 5x higher throughput than GPT-OSS-120B (a comparable-scale open-weight model)
  • 7.5x higher throughput than Qwen3.5-122B
  • 2-3x wall-clock speedup on structured generation like code and tool calls, via built-in speculative decoding

These are not marginal differences. A 5x throughput advantage means you can run 5 parallel coding agents for the same hardware budget that would run one GPT-OSS-120B agent. For agentic coding workflows where you want multiple parallel code-review or debugging passes, this compounds.
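Translated into hardware-budget arithmetic, using the multipliers quoted above (the baseline assumption that one GPT-OSS-120B agent saturates one GPU is mine, for illustration):

```python
# Parallel agents per fixed GPU budget, from the quoted throughput figures.
gpus = 4
throughput_multiplier = {
    "GPT-OSS-120B": 1.0,        # baseline
    "Qwen3.5-122B": 5.0 / 7.5,  # implied: Nemotron is 5x one, 7.5x the other
    "Nemotron 3 Super": 5.0,
}
for model, mult in throughput_multiplier.items():
    print(f"{model:>16}: ~{gpus * mult:.1f} parallel agents on {gpus} GPUs")
```

Note the implied comparison: if Nemotron is 5x GPT-OSS-120B and 7.5x Qwen3.5-122B, then Qwen3.5-122B runs at roughly two-thirds of GPT-OSS-120B's throughput.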

SWE-bench 60.47%: What the Number Actually Means

SWE-bench Verified is one of the most rigorous benchmarks for AI coding capability. It presents real GitHub issues from popular Python repositories — the same issues real contributors resolved — and asks the model to produce a patch that passes the test suite.

There's no memorization shortcut available. The issues are from real production codebases. The test suite validates whether the patch actually fixes the problem, not just whether it looks correct.

60.47% means Nemotron 3 Super resolves more than 3 in every 5 real GitHub issues autonomously. Among open-weight models, the previous best was in the high 50s. Among all models including closed frontier systems, 60.47% sits meaningfully below Claude Opus 4.6 (80.8%) and GPT-5.4 (~75%), but it's no longer in a different category from them.

The practical implication: for code review automation, bug triage, and greenfield feature implementation in constrained contexts, Nemotron 3 Super is capable enough that the "good enough" bar for a self-hosted solution has been crossed.

The Inference Stack: How to Access It

NVIDIA has made Nemotron 3 Super available through several routes, from zero-setup cloud inference to full self-hosted deployment.

Managed inference (no setup required):

  • Perplexity Labs API — callable via standard OpenAI-compatible endpoint
  • OpenRouter — aggregated access alongside other models
  • build.nvidia.com — NVIDIA's own NIM (NVIDIA Inference Microservices) endpoint

Self-hosted:

  • HuggingFace model hub — full weights available for download
  • NVIDIA NIM container — Docker-compatible deployment with built-in speculative decoding already configured

For most developers evaluating the model, starting with OpenRouter or build.nvidia.com is the fastest path to a working prototype before committing to the infrastructure investment of self-hosting a 120B-parameter model.
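Here's a minimal sketch of calling one of these OpenAI-compatible endpoints using only the Python standard library. The base URL follows OpenRouter's documented pattern, but the model identifier below is a placeholder; check the provider's model list for the real name:

```python
import json
import urllib.request

# Build a chat-completions request against an OpenAI-compatible endpoint.
BASE_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "nvidia/nemotron-3-super",  # placeholder identifier
    "messages": [
        {"role": "user", "content": "Summarize what this diff changes: ..."},
    ],
    "max_tokens": 512,
}

req = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# With a real key, send it and read the standard OpenAI-shaped response:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
print("request prepared for", req.full_url)
```

Because all three managed routes speak the same OpenAI-compatible protocol, swapping providers is a matter of changing the base URL, key, and model string.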

Already Integrated: CodeRabbit, Factory, Greptile

Three coding tools have already shipped integrations with Nemotron 3 Super as of March 2026:

CodeRabbit — AI code review tool that comments on pull requests. Nemotron 3 Super is now available as a code review engine alongside Claude Opus 4.6 and GPT-5.4. The throughput advantage means faster PR turnaround at lower cost for high-volume repositories.

Factory — agentic coding platform that implements feature requests end-to-end. Nemotron 3 Super runs as an agent backbone for implementation tasks where users want to self-host the model rather than route through external APIs.

Greptile — codebase Q&A and search tool. Nemotron 3 Super's 1M-token context window is particularly relevant here: Greptile needs to load large code contexts to answer questions about complex codebases, and linear-time sequence processing makes that economical at scale.

The fact that these tools shipped integrations within days of the model release signals that the performance is real and reproducible outside of NVIDIA's own benchmarking environment.

The Caveats Developers Should Know

Training data cutoff. Pre-training data has a cutoff of June 2025. Post-training (instruction tuning) data has a cutoff of February 2026. For code in repositories that evolved significantly after mid-2025, the model may not be aware of new APIs, breaking changes, or community patterns introduced after that date.

English-primary. The model was trained primarily on English, plus 19 other natural languages and 43 programming languages. If your codebase has extensive non-English comments or documentation, expect performance below the benchmark numbers.

Mamba-2 and attention layer interaction. The hybrid architecture is newer than pure transformers and less battle-tested across diverse deployment configurations. Some inference frameworks don't yet fully optimize Mamba-2 layers. Validate the model on your specific workload before building a production pipeline around published numbers.

Open-weight is not fully open-source. The weights are downloadable, but the training code and full dataset composition are not released. You can run and fine-tune Nemotron 3 Super, but you cannot reproduce the training run.

Key Takeaways

  • Nemotron 3 Super scores 60.47% on SWE-bench Verified — the highest open-weight result on record
  • 120B total parameters, 12B active via MoE — near-120B quality at 12B inference compute cost
  • 1M-token context window with linear-time processing via Mamba-2 backbone
  • 5x throughput vs GPT-OSS-120B, 7.5x vs Qwen3.5-122B, 2-3x speedup on code generation
  • Available via Perplexity, OpenRouter, build.nvidia.com, and HuggingFace today
  • Already integrated into CodeRabbit, Factory, and Greptile
  • Best use cases: self-hosted code review automation, agentic coding in private codebases, codebase-wide Q&A
  • Caveats: training cutoff June 2025 (pre-training), no full open-source training code, hybrid Mamba architecture less tested in diverse deployment configs


