DeepSeek V4 Is Coming: 1 Trillion Parameters, Multimodal, Huawei Ascend-Optimised

Abhishek Gautam · 8 min read

Quick summary

DeepSeek's next flagship model is imminent — 1 trillion parameter MoE architecture, multimodal support, 1M token context, trained on Huawei Ascend. Here's what it means for developers and the widening US-China AI stack split.

The AI world is watching China very carefully right now. DeepSeek V4 — the successor to the model that sent Nvidia's stock down 17% in a single day — is expected to drop any day. And based on what we know, it is going to land harder than V3.

Here is everything developers need to know before it arrives.

What We Know About DeepSeek V4

DeepSeek V4 is a Mixture-of-Experts (MoE) architecture with approximately 1 trillion total parameters. Like V3 before it, only a subset of those parameters activate per inference — roughly 37 billion — which keeps latency and compute costs low while enabling performance that rivals much larger dense models.
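The core MoE idea — a router scores all experts per token but only the top few are ever evaluated — can be sketched in a few lines. This is a generic top-k routing illustration, not DeepSeek's actual router (which adds shared experts and load-balancing terms); the expert counts here are toy numbers.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of router scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_tokens(token_scores, top_k=2):
    """Pick the top_k experts per token from router scores.

    Returns, per token, the chosen expert indices with renormalised
    gate weights. Experts outside the top_k are simply never run for
    that token -- this is why only a fraction of total parameters
    (e.g. ~37B of ~1T) are active on any one forward pass.
    """
    routed = []
    for scores in token_scores:
        probs = softmax(scores)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        total = sum(probs[i] for i in top)
        routed.append([(i, probs[i] / total) for i in top])
    return routed

# Toy run: 3 tokens, 8 experts, 2 active experts per token.
random.seed(0)
scores = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
for token_id, choice in enumerate(route_tokens(scores, top_k=2)):
    print(token_id, choice)
```

The compute saving is the ratio of active to total experts; the memory cost is not reduced, which is why MoE models are cheap to serve per token but still need the full weights resident.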

The confirmed additions over V3:

Multimodal input. V4 will accept text, images, and code in the same context window, closing a gap with GPT-4o and Claude 3.5 Sonnet that the text-only V3 could not.

1 million token context. V3 supported 128K tokens. V4 targets 1 million, on par with Gemini 1.5 Pro, well beyond Claude 3.5 Sonnet's 200K, and ahead of most competitors on raw context length.

Huawei Ascend optimisation. This is the geopolitically significant part. V4 is trained and optimised to run on Huawei Ascend 910B and 910C accelerators — not Nvidia H100 or H200.

The Huawei Ascend Angle

When the US imposed export restrictions on Nvidia A100 and H100 chips in October 2022, followed by H800 restrictions in October 2023 and H20 restrictions in April 2025, the assumption was that Chinese AI development would slow. DeepSeek proved that assumption wrong with V3.

V4 goes further. DeepSeek has reportedly given early model access to Chinese chip suppliers — including Huawei — rather than to US chipmakers like Nvidia and AMD. The message is deliberate: V4 was built for Chinese hardware and will be optimised for Chinese hardware first.

Zhipu AI's GLM-5, with 744 billion parameters, was trained entirely on Huawei Ascend and demonstrated near-parity with Western models on standard benchmarks. V4 builds on the same hardware foundation.

This is not a workaround. It is an intentional architectural choice to decouple China's most advanced AI from the US semiconductor supply chain entirely.

The US-China AI Stack Split

What is happening at the infrastructure level is a bifurcation of the global AI stack. Two separate technology supply chains are emerging:

US-centric stack: Nvidia H100/H200/Blackwell GPUs, CUDA, HuggingFace ecosystem, OpenAI/Anthropic/Google models, AWS/Azure/GCP inference.

China-centric stack: Huawei Ascend 910B/C, CANN software (Huawei's CUDA equivalent), DeepSeek/GLM/Qwen models, Alibaba Cloud/Huawei Cloud/Baidu AI Cloud inference.

For most developers outside China and the US, this creates a genuine choice that did not exist 18 months ago. DeepSeek models are open-weight, downloadable, and practical to self-host: quantised versions of DeepSeek V3 run on a single A100 or on commodity CPU clusters.

V4, when it drops, will likely be released the same way: open weights, Apache 2.0 or MIT licence, downloadable from HuggingFace.

What This Means for Developers

API cost implications. DeepSeek's API pricing has historically been 10-20x cheaper than OpenAI equivalents. V4 will almost certainly be in the same range. If you are paying $15-30 per million output tokens for GPT-4o, you should benchmark V4 seriously before your next contract renewal.
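The arithmetic behind that benchmark is worth doing explicitly. The sketch below uses the article's $15/M output-token figure for GPT-4o and an assumed ~$1/M for a DeepSeek-class model (consistent with the "10-20x cheaper" range cited here, but check the current price sheets — these are illustrative numbers, not quotes):

```python
def monthly_token_cost(output_tokens_millions, price_per_million_usd):
    """Cost of a month's output tokens at a given per-million price."""
    return output_tokens_millions * price_per_million_usd

# Illustrative workload: 500M output tokens/month.
workload = 500
gpt4o = monthly_token_cost(workload, 15.0)   # article's low-end GPT-4o price
deepseek = monthly_token_cost(workload, 1.0) # assumed DeepSeek-class price

print(f"GPT-4o:   ${gpt4o:,.0f}/mo")
print(f"DeepSeek: ${deepseek:,.0f}/mo")
print(f"Savings:  ${gpt4o - deepseek:,.0f}/mo")
```

At that workload the gap is thousands of dollars a month, which is why the comparison belongs in the renewal conversation rather than as an afterthought.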

Open-weight access. If the US government restricts access to DeepSeek models — which several legislators have proposed — the open-weight release means the models will still be available via HuggingFace mirrors and self-hosted deployments. This is a different category of risk from a closed-API service.
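If DeepSeek becomes a production dependency, the self-hosted fallback should be wired in before it is needed. A minimal sketch of that pattern, with stand-in provider callables (the names and failure modes here are hypothetical):

```python
def complete_with_fallback(prompt, providers):
    """Try each completion provider in order; return (name, text) from
    the first that succeeds. `providers` is a list of (name, fn) pairs
    where fn(prompt) returns text or raises on failure."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except Exception as exc:  # network error, 4xx/5xx, timeout, etc.
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-ins: a hosted API that is down, and a self-hosted deployment.
def hosted_api(prompt):
    raise ConnectionError("hosted endpoint unreachable")

def self_hosted(prompt):
    return f"[self-hosted completion for: {prompt!r}]"

name, text = complete_with_fallback(
    "hello", [("hosted", hosted_api), ("self-hosted", self_hosted)]
)
print(name, "->", text)
```

In production the callables would wrap real clients, but the ordering logic — hosted first for cost and freshness, self-hosted as the sovereignty hedge — is the point.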

Context window use cases. 1 million tokens means you can fit entire codebases, full legal documents, complete financial reports, or 750,000 words of text in a single context. Retrieval-augmented generation becomes optional for many use cases that currently require it.
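Deciding whether a corpus actually fits is a quick estimate. The sketch below uses the common ~4 characters/token heuristic for English text; real tokenizer counts vary by model and content, so treat the numbers as rough budgeting, not guarantees.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate via the ~4 chars/token heuristic."""
    return int(len(text) / chars_per_token)

def fits_in_context(texts, context_window=1_000_000, reserve_for_output=8_000):
    """Check whether a set of documents fits in the context window,
    leaving headroom for the model's response."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_for_output <= context_window, used

# A ~2.4M-character codebase dump (~600K estimated tokens) fits
# comfortably in a 1M-token window with output headroom to spare.
corpus = ["x" * 2_400_000]
ok, used = fits_in_context(corpus)
print(ok, used)
```

The same check against V3's 128K window fails for this corpus, which is the practical difference the 1M window makes: whole-repo or whole-contract prompts stop requiring a retrieval layer.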

Multimodal workflows. V3's text-only limitation meant teams building vision-based pipelines had to route to GPT-4o or Gemini for images. V4 removes that requirement for developers who want to stay on the DeepSeek stack.
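For teams planning that migration, the likely request shape is worth noting. V4's exact API schema is unconfirmed; the sketch below assumes it follows the OpenAI-style multimodal chat format that DeepSeek's existing API already mirrors for text, with inline base64 images as GPT-4o-class APIs accept today.

```python
import base64

def image_message(prompt, image_bytes, mime="image/png"):
    """Build an OpenAI-style multimodal chat message mixing text and an
    inline base64 image. Whether V4 uses exactly this schema is an
    assumption -- it matches what GPT-4o-class APIs accept today."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("What does this diagram show?", b"\x89PNG...")
print(msg["content"][0]["text"])
```

If V4 does ship with this shape, routing image requests to it instead of GPT-4o becomes a base-URL and model-name change rather than a rewrite.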

What Nvidia Loses

Nvidia halted China-bound H200 production in early March 2026 and shifted TSMC capacity allocation to Vera Rubin, its next-generation architecture. Reuters and the FT confirmed this on March 5.

The strategic logic is straightforward: stop investing in chips that China will reject or cannot legally import, and focus on products for markets where the export pipeline still works. But it also means Nvidia's China revenue, which peaked at $5.5B annualised before export restrictions, is gone, and the substitute customers (US hyperscalers) are already capacity-constrained.

The US is also mulling new AI chip export rules that would require foreign investments in US AI data centers or security guarantees for exports above 200,000 chips. This is a further squeeze on Nvidia's international business.

Meanwhile, Huawei shipped 1,900 Ascend 910B servers per month in Q4 2024 and is scaling production in 2026. V4's optimisation for Ascend is not incidental — it is validation that the Ascend hardware stack works for frontier model training.

When Does V4 Drop?

No official date. Based on DeepSeek's release cadence and the technical leaks (whitepaper details, benchmark data circulating in Chinese AI research communities), it is expected imminently as of early March 2026. The model was reportedly training at scale in February 2026.

Watch: DeepSeek's official site (deepseek.com), their HuggingFace account, and Chinese AI research forums. The release will be announced simultaneously on all channels with an API date.

The Bottom Line

DeepSeek V4 is the most important AI model release of 2026 that few Western developers are adequately prepared for. When it arrives:

  • Benchmark it against whatever you are currently paying for before assuming GPT-5 or Claude 4 is the right choice.
  • Understand the geopolitical risk: US access restrictions are possible. Plan for self-hosted fallback if DeepSeek is a production dependency.
  • Recognise that the AI stack is splitting. You may need to support both ecosystems for different customer segments within two years.

The era of a single dominant AI supply chain — anchored on Nvidia hardware and US API providers — is ending. V4 is the clearest proof point yet.


Written by

Abhishek Gautam

Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.