DeepSeek V4 Is Coming: 1 Trillion Parameters, Multimodal, Huawei Ascend-Optimised
Quick summary
DeepSeek's next flagship model is imminent — 1 trillion parameter MoE architecture, multimodal support, 1M token context, trained on Huawei Ascend. Here's what it means for developers and the widening US-China AI stack split.
The AI world is watching China very carefully right now. DeepSeek V4, the successor to the model line that sent Nvidia's stock down 17% in a single day in January 2025, is expected to drop any day. Based on what has been reported so far, it is going to land harder than V3.
Here is everything developers need to know before it arrives.
What We Know About DeepSeek V4
DeepSeek V4 is reported to be a Mixture-of-Experts (MoE) model with approximately 1 trillion total parameters. Like V3 before it (671 billion total, 37 billion active), only a subset of those parameters activates for each token, reportedly around 37 billion, which keeps latency and compute costs low while enabling performance that rivals much larger dense models.
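To make the "only a subset activates" idea concrete, here is a minimal toy sketch of top-k expert routing in plain Python. The gate scores, the expert count, and the `top_k` value are made up for illustration, and `top_k_route` and `active_fraction` are hypothetical helper names. This is a generic top-k gate, not DeepSeek's actual router, which has not been published for V4.

```python
import math

def top_k_route(gate_logits, top_k):
    """Pick the top_k experts for one token and return {expert_index: weight}.

    Weights are a softmax over the selected experts' logits only, so they
    sum to 1. Generic illustration, not DeepSeek's router.
    """
    # Indices of the top_k largest gate logits
    chosen = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts
    peak = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params

# Hypothetical gate scores for one token over 8 toy experts
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9]
weights = top_k_route(logits, top_k=2)
print(sorted(weights))  # experts 1 and 4 have the highest scores

# Using the figures quoted above: about 37B active out of about 1T total
print(f"{active_fraction(37e9, 1e12):.1%}")
```

With the quoted figures, under 4% of the parameters would do work for any given token, which is why per-token compute tracks the active count rather than the total.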
The reported additions over V3:
Multimodal input. V4 will accept text, images, and code in the same context window. This closes the gap with GPT-4o and Claude 3.5 Sonnet that existed in V3.
1 million token context. V3 supported 128K tokens. V4 targets 1 million tokens, on par with Gemini 1.5 Pro and well beyond the 200K window of Claude 3.5 Sonnet and the 128K window of GPT-4o.
Huawei Ascend optimisation. This is the geopolitically significant part. V4 is trained and optimised to run on Huawei Ascend 910B and 910C accelerators — not Nvidia H100 or H200.
The Huawei Ascend Angle
When the US imposed export restrictions on Nvidia A100 and H100 chips in October 2022, followed by H800 restrictions in October 2023, the assumption was that Chinese AI development would slow. DeepSeek proved that assumption wrong with V3. H20 restrictions followed in April 2025.
V4 goes further. DeepSeek has reportedly given early model access to Chinese chip suppliers — including Huawei — rather than to US chipmakers like Nvidia and AMD. The message is deliberate: V4 was built for Chinese hardware and will be optimised for Chinese hardware first.
Zhipu AI's GLM-5, with 744 billion parameters, was trained entirely on Huawei Ascend. It demonstrated near-parity with Western models on standard benchmarks. V4 builds on that precedent.
This is not a workaround. It is an intentional architectural choice to decouple China's most advanced AI from the US semiconductor supply chain entirely.
The US-China AI Stack Split
What is happening at the infrastructure level is a bifurcation of the global AI stack. Two separate technology supply chains are emerging:
US-centric stack: Nvidia H100/H200/Blackwell GPUs, CUDA, HuggingFace ecosystem, OpenAI/Anthropic/Google models, AWS/Azure/GCP inference.
China-centric stack: Huawei Ascend 910B/C, CANN software (Huawei's CUDA equivalent), DeepSeek/GLM/Qwen models, Alibaba Cloud/Huawei Cloud/Baidu AI Cloud inference.
For most developers outside China and the US, this creates a genuine choice that did not exist 18 months ago. DeepSeek models are open-weight and downloadable. The smaller distilled variants run well on consumer hardware; the full V3, at 671 billion total parameters, needs a multi-GPU server or heavily quantised weights with CPU offload rather than a single A100.
V4, when it drops, will likely be released the same way: open weights, Apache 2.0 or MIT licence, downloadable from HuggingFace.
What This Means for Developers
API cost implications. DeepSeek's API pricing has historically been roughly an order of magnitude cheaper than OpenAI equivalents. V4 will almost certainly be in the same range. If you are paying $10-15 per million output tokens for GPT-4o or Claude 3.5 Sonnet, you should benchmark V4 seriously before your next contract renewal.
Open-weight access. If the US government restricts access to DeepSeek models — which several legislators have proposed — the open-weight release means models will still be available via HuggingFace mirrors and self-hosted deployments. This is a different category of risk than a closed-API service.
Context window use cases. 1 million tokens means you can fit entire codebases, full legal documents, complete financial reports, or 750,000 words of text in a single context. Retrieval-augmented generation becomes optional for many use cases that currently require it.
Multimodal workflows. V3's text-only limitation meant teams building vision-based pipelines had to route to GPT-4o or Gemini for images. V4 removes that requirement for developers who want to stay on the DeepSeek stack.
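As a rough way to check whether a codebase would fit in a 1 million token window, the sketch below estimates tokens from character counts. The 4-characters-per-token ratio is a common rule of thumb for English text and code, not V4's actual tokenizer, and the 8,000-token output reserve is an arbitrary assumption. `estimate_tokens`, `estimate_repo_tokens`, and `fits_in_context` are hypothetical helper names.

```python
import os

CHARS_PER_TOKEN = 4        # rule-of-thumb assumption, not a real tokenizer
CONTEXT_LIMIT = 1_000_000  # the context size reported for V4

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count from character length, rounded up."""
    return -(-len(text) // chars_per_token)

def estimate_repo_tokens(root, extensions=(".py", ".md", ".txt")):
    """Sum the estimated tokens of all matching files under root."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file does not abort the scan
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total += estimate_tokens(fh.read())
    return total

def fits_in_context(token_count, limit=CONTEXT_LIMIT, reserve_for_output=8_000):
    """True if the input plus a reserved output budget fits within the limit."""
    return token_count + reserve_for_output <= limit
```

Calling `fits_in_context(estimate_repo_tokens("."))` gives a first-pass answer for the current directory. If the estimate lands anywhere near the limit, retrieval is probably still the safer design, since the heuristic can be off by a wide margin for non-English text or dense code.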
What Nvidia Loses
Nvidia halted China-bound H200 production in early March 2026 and shifted TSMC capacity allocation to Vera Rubin (its next-generation architecture). Reuters and the FT confirmed this on March 5.
The strategic logic is: stop investing in chips China will reject or cannot legally import, and focus on products for markets where the export pipeline still works. But this means Nvidia's China revenue — which was $5.5B annualised at its peak before export restrictions — is gone, and the substitute customers (US hyperscalers) are already capacity-constrained.
The US is also mulling new AI chip export rules that would require foreign investments in US AI data centers or security guarantees for exports above 200,000 chips. This is a further squeeze on Nvidia's international business.
Meanwhile, Huawei shipped 1,900 Ascend 910B servers per month in Q4 2024 and is scaling production in 2026. V4's optimisation for Ascend is not incidental — it is validation that the Ascend hardware stack works for frontier model training.
When Does V4 Drop?
No official date. Based on DeepSeek's release cadence and the technical leaks (whitepaper details, benchmark data circulating in Chinese AI research communities), it is expected imminently as of early March 2026. The model was reportedly training at scale in February 2026.
Watch: DeepSeek's official site (deepseek.com), their HuggingFace account, and Chinese AI research forums. If past releases are a guide, the announcement will likely appear on all channels at once, along with an API availability date.
The Bottom Line
DeepSeek V4 is the most important AI model release of 2026 that no Western developer is adequately prepared for. When it arrives:
- Benchmark it against whatever you are currently paying for before assuming GPT-5 or Claude 4 is the right choice.
- Understand the geopolitical risk: US access restrictions are possible. Plan for self-hosted fallback if DeepSeek is a production dependency.
- Recognise that the AI stack is splitting. You may need to support both ecosystems for different customer segments within two years.
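One way to plan for the self-hosted fallback mentioned above is to put a thin wrapper in front of every model call. The sketch below is a minimal illustration only: `complete_with_fallback` is a hypothetical helper, and `call_hosted` and `call_self_hosted` are stand-in functions you would replace with your own hosted API client and your own inference server client. Nothing here is part of DeepSeek's SDK.

```python
import time

def complete_with_fallback(prompt, providers, retries_per_provider=1,
                           backoff_seconds=0.0):
    """Try each (name, callable) provider in order.

    Returns (name, result) from the first provider that succeeds and raises
    RuntimeError if every provider fails on every attempt.
    """
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # broad on purpose: any failure triggers fallback
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if backoff_seconds:
                    time.sleep(backoff_seconds)
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stand-in providers for demonstration only
def call_hosted(prompt):
    raise ConnectionError("hosted API unreachable")

def call_self_hosted(prompt):
    return f"[self-hosted] echo: {prompt}"

name, text = complete_with_fallback(
    "hello",
    [("hosted", call_hosted), ("self_hosted", call_self_hosted)],
)
print(name, text)
```

Keeping the provider list as plain data makes it easy to reorder or remove a provider through configuration if access rules change, without touching the calling code.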
The era of a single dominant AI supply chain — anchored on Nvidia hardware and US API providers — is ending. V4 is the clearest proof point yet.
For full benchmark numbers and API pricing on the actual V4 release, see DeepSeek V4: 1M Context, Multimodal, Coding Benchmarks.
Frequently Asked Questions
What is DeepSeek V4 and when is it releasing?
DeepSeek V4 is the next flagship model from Chinese AI lab DeepSeek. It features approximately 1 trillion total parameters in a Mixture-of-Experts architecture, multimodal input (text + images + code), a 1 million token context window, and is optimised for Huawei Ascend hardware. No official release date has been announced, but technical leaks and DeepSeek's release cadence indicate it is expected imminently as of March 2026.
Why is DeepSeek V4 being trained on Huawei Ascend instead of Nvidia?
US export restrictions have cut off China's access to Nvidia's most advanced AI chips (A100, H100, H800, H20, H200). DeepSeek has responded by engineering its models to run optimally on Huawei Ascend 910B/C accelerators, Huawei's domestically produced alternative. DeepSeek reportedly gave Chinese chip suppliers early access to V4 rather than US chipmakers, signalling a deliberate decoupling from the US hardware ecosystem.
Can US developers access DeepSeek V4?
Likely yes, at least initially. DeepSeek releases its models as open weights under permissive licences (Apache 2.0 or MIT), available on HuggingFace. This means even if API access were restricted, the model weights can be downloaded and self-hosted. Several US legislators have proposed restricting access to DeepSeek models, but as of March 2026 no such restriction has been implemented.
How does DeepSeek V4 compare in cost to GPT-4o or Claude?
DeepSeek V3's API pricing is approximately $0.27 per million input tokens and $1.10 per million output tokens, roughly 9-14x cheaper than GPT-4o ($2.50/$10.00) or Claude 3.5 Sonnet ($3.00/$15.00). V4 is expected to be in a similar range. For cost-sensitive workloads, the gap is significant enough to justify serious benchmarking.
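To see what these list prices mean for a specific workload, a short calculation helps. The sketch below uses only the per-million-token prices quoted in this answer; the monthly token volumes are a hypothetical workload, `monthly_cost` and `cost_ratio` are illustrative helper names, and V4's real pricing is not yet known.

```python
# USD per 1M tokens as (input, output), taken from the prices quoted above
PRICES = {
    "deepseek-v3": (0.27, 1.10),
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}

def monthly_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Cost in USD for the given token volumes at list price."""
    in_price, out_price = prices[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cost_ratio(model, baseline, input_tokens, output_tokens):
    """How many times more expensive model is than baseline for this workload."""
    return (monthly_cost(model, input_tokens, output_tokens)
            / monthly_cost(baseline, input_tokens, output_tokens))

# Hypothetical workload: 500M input tokens and 100M output tokens per month
IN_TOKENS, OUT_TOKENS = 500_000_000, 100_000_000
for model in PRICES:
    cost = monthly_cost(model, IN_TOKENS, OUT_TOKENS)
    ratio = cost_ratio(model, "deepseek-v3", IN_TOKENS, OUT_TOKENS)
    print(f"{model}: ${cost:,.2f} per month ({ratio:.1f}x DeepSeek V3)")
```

The ratio shifts with the input-to-output mix, so it is worth plugging in your own traffic profile rather than relying on a headline multiplier.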
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 919+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.
