Kimi K2.7-Code: Moonshot AI Releases 1T Parameter Open-Source Model, Claims to Beat Claude Opus on Tool Use
Quick summary
Chinese AI lab Moonshot AI released Kimi K2.7-Code on June 12 under a Modified MIT licence. One trillion parameters, 32B active via MoE, 256K context window, and benchmark results that claim to exceed Claude Opus on tool-use tasks. Here is what developers need to know.
Read next
- Claude Opus 4.8 Ships With Dynamic Workflows — Same $5/$25 API PriceAnthropic released Claude Opus 4.8 on May 28, 2026: dynamic workflows for Claude Code, effort controls, faster fast mode. API ID claude-opus-4-8 at unchanged $5/$25 per million tokens.
- OpenAI Launches Rosalind Biodefense for GPT-Rosalind Life Sciences AIOpenAI announced Rosalind Biodefense on May 29, 2026 — sponsored GPT-Rosalind access for vetted developers building pandemic detection, screening, and countermeasure tools.
Chinese AI lab Moonshot AI released Kimi K2.7-Code on June 12, 2026 on Hugging Face under a Modified MIT licence. It is a 1-trillion-parameter Mixture-of-Experts coding model with 32 billion active parameters per forward pass, 384 experts total, and a 256K token context window. Benchmark results show it outperforms the previous Kimi K2.6 by 21.8% on Kimi Code Bench v2 and claims tool-use performance above Claude Opus 4.
This is the most capable open-weight coding model any Chinese lab has released to date. It is available for self-hosting via vLLM, SGLang, and Docker. Here is what it actually means for developers.
What Kimi K2.7-Code Is
Kimi K2.7-Code is a coding-specialist variant of the Kimi K2.7 base model, fine-tuned specifically for code generation, debugging, and agentic tool-use tasks. The "K2.7" versioning indicates incremental improvement on Kimi K2.6, which itself was released in April 2026 and became one of the most-downloaded open-weight models in Hugging Face's first quarter.
The key differentiators from K2.6:
- +21.8% on Kimi Code Bench v2 (Moonshot's internal benchmark for code generation quality)
- +31.5% on MLS Bench Lite (multi-language synthesis benchmark covering Python, TypeScript, Rust, Go, Java, C++)
- 30% fewer reasoning tokens per output — meaning lower inference cost for equivalent code quality
- Tool-use benchmark: claims to exceed Claude Opus 4 on a standardised function-calling benchmark (specific score: not yet independently replicated by third parties as of this writing)
The Modified MIT licence means you can use the weights commercially with attribution. The one restriction: you cannot use Kimi K2.7-Code outputs to train a competing model without explicit permission from Moonshot AI. In practice this is a non-constraint for the vast majority of deployment use cases.
Architecture: Why 1T Parameters With Only 32B Active
The Mixture-of-Experts architecture is the reason the numbers seem contradictory. Kimi K2.7-Code has 1 trillion total parameters distributed across 384 expert networks, but only 32 billion parameters are activated for any single forward pass. A routing mechanism selects which experts handle each token.
The practical implications:
Inference cost: Running Kimi K2.7-Code costs approximately the same as running a 32B dense model per request — not a 1T dense model. The compute savings are significant. A 1T dense model like GPT-4o (estimated 1.76T parameters) requires enormous GPU memory. K2.7-Code with 32B active parameters can run on 4x A100 80GB GPUs in 8-bit quantisation.
Memory footprint: Full precision (BF16) requires approximately 2TB of GPU VRAM across a cluster. At 4-bit quantisation (GPTQ or AWQ), the requirement drops to ~500GB — runnable on an 8x H100 node.
Routing quality: The 384-expert architecture is unusually granular. Most production MoE models (Mixtral, DeepSeek-V2) use 8-64 experts. 384 experts means finer-grained specialisation — specific experts handle Python imports, others handle TypeScript generics, others handle SQL optimisation. This is why coding specialist performance is disproportionately strong versus general-purpose benchmarks.
Benchmark Results: The Specific Numbers
| Benchmark | Kimi K2.6 | Kimi K2.7-Code | Claude Opus 4 | GPT-4o |
|---|---|---|---|---|
| Kimi Code Bench v2 | baseline | +21.8% | — | — |
| MLS Bench Lite | baseline | +31.5% | — | — |
| Tool-use (function calling) | — | 68.4 | 65.1 | 61.7 |
| HumanEval | 81.2 | 86.4 | 84.3 | 82.1 |
| MBPP | 78.9 | 83.7 | 81.2 | 79.8 |
| SWE-Bench Verified | 38.4 | 44.1 | 40.2 | 38.9 |
*Moonshot AI self-reported; independent third-party replication pending as of June 14, 2026.*
The tool-use score (68.4 vs Claude Opus 4's 65.1) is the headline claim. Tool use — structured function calling, JSON schema compliance, multi-step agentic tasks — is specifically the capability that matters for production AI agents. If you are building an agent that uses tools (web search, code execution, database queries), this benchmark directly reflects real-world performance.
The SWE-Bench Verified score (44.1) is particularly significant. SWE-Bench Verified tests on real GitHub issues — not synthetic coding problems — and requires understanding repository context, navigating codebases, and generating working patches. 44.1% is competitive with the top tier of closed models and significantly above most open-weight alternatives.
How to Run It: vLLM, SGLang, Docker
Kimi K2.7-Code is available at moonshotai/Kimi-K2.7-Code on Hugging Face and at ModelScope for Chinese infrastructure users.
vLLM (recommended for production):
Install vLLM then run: vllm serve moonshotai/Kimi-K2.7-Code --tensor-parallel-size 8 --max-model-len 32768 --enable-expert-parallel
The --enable-expert-parallel flag is critical for MoE models — without it, you lose the routing efficiency that makes the model economical to run.
SGLang (recommended for agentic workflows):
SGLang's structured generation support makes it better than vLLM for multi-step agentic tasks. Kimi K2.7-Code is natively supported in SGLang 0.4.2+. Install with: pip install sglang[all], then launch with: python -m sglang.launch_server --model-path moonshotai/Kimi-K2.7-Code --port 30000 --tp 8
Docker Model Runner (simplest local deployment):
Docker's Model Runner supports K2.7-Code via their model catalogue, allowing single-command local deployment without manual dependency management. This is the recommended path for developers who want to test the model without committing to infrastructure.
Moonshot API: Available at api.moonshot.cn for developers who prefer managed inference. Pricing is not publicly listed but Moonshot historically prices below OpenAI API rates for equivalent capability.
The Tool-Use Claim — What It Actually Means
The claim that K2.7-Code beats Claude Opus 4 on tool use deserves unpacking. "Tool use" in this context means:
- Function calling: Given a JSON schema of available tools, the model correctly calls the right tool with the right arguments in the right format
- Multi-step tool chains: Completing tasks that require calling multiple tools in sequence, where the output of one call informs the arguments of the next
- Error recovery: When a tool call returns an error, correctly diagnosing the issue and retrying with corrected arguments
These are the capabilities that determine whether an AI agent actually works in production. A model with high HumanEval scores but poor tool-use reliability cannot power a reliable agent. K2.7-Code's 68.4 vs Opus 4's 65.1 is a 5% advantage — meaningful in aggregate across thousands of agent invocations per day.
One important caveat: Moonshot AI conducted these benchmarks themselves. Independent replication from EleutherAI, HuggingFace, or third-party research labs has not yet been published as of this writing. The benchmark methodology for the tool-use comparison has not been independently verified. This is standard practice for model release announcements — but treat the Claude Opus comparison specifically as "claimed, pending replication."
China Open-Source AI — Why This Release Matters
Moonshot AI is the Chinese AI lab best known for Kimi, their conversational AI assistant which has 200+ million users in China and Southeast Asia. The K2.7-Code release continues a pattern of Chinese labs releasing open-weight models with aggressive performance claims shortly after — or slightly ahead of — Western frontier releases.
The DeepSeek-R1 moment in early 2025 was the first time global developers took Chinese open-source AI seriously. Since then: Qwen 3 (Alibaba), Kimi K2 series (Moonshot), GLM-4 (Zhipu), MiniMax Text-01 — each release has pushed the open-weight state of the art closer to closed-model frontier performance.
K2.7-Code specifically targets the coding agent use case, which is where developer spending on AI is growing fastest. The Modified MIT licence means any developer globally — not just in China — can deploy this commercially. Given that cn.bing.com drives significant traffic to technical content and Chinese AI labs are increasingly targeting global developer adoption, K2.7-Code is a more globally relevant release than most Western coverage will suggest.
Our Analysis: Where This Fits in Your Stack
If you are currently using Claude Opus 4 or GPT-4o for coding agent tasks and paying managed API costs, K2.7-Code is worth evaluating for three reasons:
Cost: At 32B active parameters, self-hosted inference costs approximately 70-80% less per token than Opus 4 API pricing. For high-volume code generation or agentic pipelines, the savings compound fast.
Latency: MoE models with 32B active parameters have lower latency per token than dense models of equivalent benchmark performance. For real-time coding assistants, this matters.
Open weights: You can fine-tune K2.7-Code on your proprietary codebase. This is not possible with any closed model. Domain-specific fine-tuning on your internal repositories can push coding accuracy significantly above the base benchmark scores.
The risk: the tool-use advantage over Opus 4 is self-reported and a 5% margin. In production, model reliability and consistency across edge cases matters more than benchmark averages. Evaluate on your specific use case before migrating production workloads.
For context on where Chinese AI models sit relative to Western frontier labs, read our China AI Models 2026 Reality Check. For API cost comparisons across models, the LLM API Pricing Tracker keeps live pricing data.
Key Takeaways
- Kimi K2.7-Code is a 1 trillion parameter MoE coding model from Moonshot AI (China), released June 12 under Modified MIT licence — commercially usable with attribution
- 32B active parameters per forward pass — inference cost equivalent to a 32B dense model, not a 1T dense model; runnable on 4x A100 80GB at 8-bit quantisation
- +21.8% on Kimi Code Bench v2, +31.5% on MLS Bench Lite, 44.1% SWE-Bench Verified (competitive with top closed models)
- Tool-use claim: 68.4 vs Claude Opus 4's 65.1 — Moonshot-reported, third-party replication pending; treat with appropriate caution until independently verified
- Run it: vLLM with --enable-expert-parallel flag, SGLang 0.4.2+ for agentic tasks, Docker Model Runner for local testing; Moonshot API for managed inference
- Cost case: 70-80% lower inference cost than Claude Opus 4 API if self-hosted; plus open weights means fine-tuning on proprietary codebases is possible
- Context: Chinese open-source AI has closed the gap with Western frontier models faster than expected since DeepSeek-R1; K2.7-Code is the latest and strongest coding-specific example
Sources
FAQ
Frequently Asked Questions
What is Kimi K2.7-Code and who made it?
Kimi K2.7-Code is a coding-specialist AI model released June 12, 2026 by Moonshot AI, a Chinese AI lab. It has 1 trillion total parameters in a Mixture-of-Experts architecture with 32 billion active parameters per forward pass and 384 expert networks. It has a 256K token context window and is released under a Modified MIT licence, meaning it is free for commercial use with attribution. It is available on Hugging Face at moonshotai/Kimi-K2.7-Code and can be self-hosted via vLLM, SGLang, or Docker.
Does Kimi K2.7-Code really beat Claude Opus on benchmarks?
Moonshot AI reports that Kimi K2.7-Code scores 68.4 on their tool-use benchmark versus Claude Opus 4 at 65.1 — a 5% claimed advantage. It also scores 44.1% on SWE-Bench Verified, competitive with top closed models. Important caveat: these benchmark results are Moonshot AI self-reported. As of June 14, 2026, independent third-party replication has not been published. The SWE-Bench Verified score is independently testable and will be reproduced by the community in the coming weeks. Treat the Claude Opus tool-use comparison specifically as "claimed, pending verification."
How do I run Kimi K2.7-Code locally?
Install vLLM and run: `vllm serve moonshotai/Kimi-K2.7-Code --tensor-parallel-size 8 --max-model-len 32768 --enable-expert-parallel`. The --enable-expert-parallel flag is critical for MoE performance. For full precision (BF16) you need approximately 2TB of GPU VRAM. At 4-bit quantisation (AWQ or GPTQ) the requirement drops to around 500GB — runnable on a 8x H100 node. For agentic workflows, SGLang 0.4.2+ is preferred over vLLM. Docker Model Runner supports single-command local deployment for testing.
How does Kimi K2.7-Code compare to other open-source coding models?
As of June 2026, Kimi K2.7-Code has the strongest claimed benchmark results of any open-weight coding model. It exceeds Qwen 3 (Alibaba), DeepSeek-V3 (DeepSeek), and Llama 4 Scout on SWE-Bench Verified per available scores. On HumanEval it scores 86.4 versus 82.1 for GPT-4o. The 44.1% SWE-Bench Verified result, if independently confirmed, would make it the strongest open-weight model on real-world coding tasks — the benchmark that best reflects production code agent performance.
Is Kimi K2.7-Code safe to use commercially?
Yes, under the Modified MIT licence. You can use Kimi K2.7-Code in commercial products and charge for services built on it. The single restriction: you cannot use K2.7-Code model outputs to train a competing model without explicit written permission from Moonshot AI. This is a standard "no model distillation without permission" clause that appears in most Chinese open-weight releases. It does not restrict production deployment, API wrapping, or fine-tuning for your own use.
Free Weekly Briefing
The AI & Dev Briefing
One honest email a week — what actually matters in AI and software engineering. No noise, no sponsored content. Read by developers across 30+ countries.
No spam. Unsubscribe anytime.
More on AI Models
All posts →Claude Opus 4.8 Ships With Dynamic Workflows — Same $5/$25 API Price
Anthropic released Claude Opus 4.8 on May 28, 2026: dynamic workflows for Claude Code, effort controls, faster fast mode. API ID claude-opus-4-8 at unchanged $5/$25 per million tokens.
OpenAI Launches Rosalind Biodefense for GPT-Rosalind Life Sciences AI
OpenAI announced Rosalind Biodefense on May 29, 2026 — sponsored GPT-Rosalind access for vetted developers building pandemic detection, screening, and countermeasure tools.
Nvidia Cosmos 3 + RTX Spark N1X: 20T Tokens for Physical AI at COMPUTEX
At COMPUTEX June 2026, Nvidia launched open Cosmos 3 world models (20T tokens, super/nano) for robots and AVs, plus RTX Spark N1X Windows chips with Microsoft, Dell, Lenovo.
Anthropic Files Confidential IPO at $965B Val, $47B Revenue Run Rate
Anthropic confidentially filed with the SEC on June 1, 2026 at $965B valuation and $47B annualized revenue — beating OpenAI to the IPO starting line as SpaceX roadshows this week.
Free Tool
Will AI replace your job?
4 questions. Get a personalised developer risk score based on your stack, role, and what you actually build day to day.
Check Your AI Risk Score →Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 917+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.
