Meta MTIA 400 Deploys: 6 Petaflops, 288GB HBM, and Nvidia Gets Displaced

Abhishek Gautam | April 4, 2026 | 6 min read

Quick summary

Meta deployed MTIA 400 chips — 6 petaflops FP8, 288GB HBM, 72 per rack — for GenAI inference. What it means for Llama API costs, Nvidia GPU demand, and open source AI economics.

What MTIA Actually Is

MTIA stands for Meta Training and Inference Accelerator. It is not a general-purpose GPU. Meta designed these chips for one thing: running AI inference workloads at scale inside Meta's own infrastructure.

The MTIA 300 is already deployed and handling Meta's ranking and recommendation tasks — the systems that decide what appears in your Facebook and Instagram feed. Billions of requests per day, inference-only, no training. That is the workload MTIA was built for first.

The MTIA 400, now being deployed, is a step up in capability and is explicitly targeted at generative AI inference: image generation, video synthesis, and text-based AI responses from prompts.

MTIA 400 Specifications

The MTIA 400 delivers 6 petaflops of FP8 compute performance with a 1,200W TDP. HBM capacity is 288GB, and HBM bandwidth is 9.2 terabytes per second — a 51% increase over the MTIA 300.

Each data center rack holds 72 MTIA 400 chips, optimized for AI inference density. That is 432 petaflops of FP8 inference compute per rack.
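The per-rack figure is simple multiplication, and the same arithmetic gives rack-level HBM and chip power. The sketch below uses only the per-chip numbers quoted above; the totals for HBM and power are derived here, not published figures, and they ignore host CPUs, networking, and cooling overhead.

```python
# Per-chip MTIA 400 figures as quoted in the article.
CHIPS_PER_RACK = 72
FP8_PFLOPS_PER_CHIP = 6
HBM_GB_PER_CHIP = 288
TDP_W_PER_CHIP = 1200


def rack_totals(chips=CHIPS_PER_RACK):
    """Return simple per-rack sums; ignores host CPUs, networking, and cooling."""
    return {
        "fp8_pflops": chips * FP8_PFLOPS_PER_CHIP,
        "hbm_tb": chips * HBM_GB_PER_CHIP / 1000,  # decimal terabytes
        "chip_power_kw": chips * TDP_W_PER_CHIP / 1000,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```

On these inputs a full rack works out to 432 petaflops of FP8, roughly 20.7TB of HBM, and about 86kW of accelerator TDP alone.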

To put that in context: a single NVIDIA H100 SXM5 delivers around 3.9 petaflops of FP8 compute (NVIDIA's peak figure with sparsity) with 80GB of HBM3. On paper, Meta's MTIA 400 delivers roughly 1.5x the FP8 throughput of an H100 and 3.6x the HBM capacity. The architectural difference is that MTIA 400 is designed for inference-specific workload patterns, not the flexible training-and-inference use cases NVIDIA targets.
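The ratios in this comparison can be reproduced directly from the quoted specs. This is a paper comparison only: the H100 number is NVIDIA's with-sparsity peak, the article does not say whether the MTIA figure assumes sparsity, and neither number says anything about delivered throughput on a real model.

```python
# Paper specs as quoted in the article. The H100 FP8 figure is NVIDIA's
# with-sparsity peak; whether the MTIA figure assumes sparsity is not stated.
MTIA_400 = {"fp8_pflops": 6.0, "hbm_gb": 288}
H100_SXM5 = {"fp8_pflops": 3.9, "hbm_gb": 80}


def spec_ratios(a, b):
    """Return a/b for each spec the two chips share."""
    return {key: a[key] / b[key] for key in a if key in b}


ratios = spec_ratios(MTIA_400, H100_SXM5)
print(f"FP8 throughput: {ratios['fp8_pflops']:.2f}x")
print(f"HBM capacity:   {ratios['hbm_gb']:.1f}x")
```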

The Roadmap: 400, 450, 500

Meta unveiled four MTIA generations in March 2026, with a new chip releasing roughly every six months.

MTIA 450: Raises compute to 7 petaflops FP8, doubles HBM bandwidth to 18.4 terabytes per second, 288GB HBM, 1,400W TDP. The bandwidth doubling is significant — the constraint in large language model inference is often memory bandwidth, not raw compute. Scheduled for 2027.
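To see why bandwidth matters more than peak compute for LLM decoding, consider a rough upper bound: if generating each token requires streaming all model weights from HBM once, single-stream decode speed cannot exceed bandwidth divided by weight size. The sketch below applies that bound to the two bandwidth figures above. The 70-billion-parameter FP8 model is a hypothetical example, and the bound ignores KV-cache traffic, batching, and compute time.

```python
def decode_tokens_per_second(bandwidth_tb_s, params_billion, bytes_per_param=1.0):
    """Upper bound on single-stream decode speed when weight reads dominate.

    Assumes every token streams all weights from HBM once and ignores
    KV-cache traffic, batching, and compute time.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


# Bandwidth figures from the article; the 70B FP8 model is a hypothetical example.
for chip, bandwidth in {"MTIA 400": 9.2, "MTIA 450": 18.4}.items():
    bound = decode_tokens_per_second(bandwidth, params_billion=70)
    print(f"{chip}: <= {bound:.0f} tokens/s per stream")
```

Under these assumptions the bound doubles along with the bandwidth, even though FP8 compute rises only from 6 to 7 petaflops.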

MTIA 500: Delivers 10 petaflops FP8, 1,700W TDP, 384-512GB HBM. This is Meta's high-end generative AI chip. Scheduled for late 2027. At 10 petaflops with 512GB HBM, it is competitive with anything NVIDIA plans to offer before the Rubin architecture ships.
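One way to read the roadmap is peak FP8 compute per watt of rated TDP. The sketch below derives that from the figures quoted above; it is a paper metric, not a measured efficiency, and the function name is just illustrative.

```python
# (FP8 petaflops, TDP watts) per generation, as quoted in the article.
ROADMAP = {
    "MTIA 400": (6, 1200),
    "MTIA 450": (7, 1400),
    "MTIA 500": (10, 1700),
}


def tflops_per_watt(pflops, tdp_watts):
    """Peak FP8 teraflops per watt of rated TDP."""
    return pflops * 1000 / tdp_watts


for chip, (pflops, tdp) in ROADMAP.items():
    print(f"{chip}: {tflops_per_watt(pflops, tdp):.2f} TFLOPS/W")
```

On these numbers the 400 and 450 both land at 5 TFLOPS per watt, which fits the reading that the 450's gain is bandwidth rather than compute efficiency, while the 500 comes out near 5.9.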

The progression from 300 to 500 in roughly two years is aggressive. Google started its TPU program in 2013 and took five years to reach competitive performance at scale. Meta is compressing that timeline considerably.

Why Meta Is Doing This While Still Buying Nvidia

Meta announced major Nvidia and AMD GPU procurement deals in the weeks before the MTIA 400 deployment news. That looks contradictory but follows a clear logic.

Nvidia H100s and H200s are general-purpose. They train models, run inference, and handle research workloads that do not fit neatly into MTIA's inference-optimized architecture. Meta still needs them for training Llama 4 and future model generations. For training at scale, NVIDIA remains the only realistic option right now.

MTIA is for inference — the part of the stack that runs billions of times per day. Inference is where Meta spends the majority of its AI compute budget, because every request to Meta AI, every recommendation call, every image generation goes through inference. Owning that silicon layer means Meta controls both the cost and the capability of its inference stack.

With TSMC wafer prices rising 5-10% and Nvidia GPU prices under upward pressure from the April 9 equipment tariffs, the economics of building custom silicon improve every quarter.

What This Means for Nvidia

Meta's MTIA program does not threaten NVIDIA's training business. Training frontier models still requires NVIDIA's software ecosystem (CUDA), memory bandwidth characteristics, and multi-chip interconnect (NVLink). Meta's $10 billion+ in Nvidia procurement this year reflects that reality.

Where MTIA displaces Nvidia is at inference at scale inside Meta's walls. Every MTIA 400 rack that handles production GenAI inference is a rack that did not get filled with H100s or H200s. Meta runs some of the highest-volume AI inference in the world. Shifting even 20-30% of that to custom silicon is billions of dollars of NVIDIA demand that disappears.

NVIDIA's data center revenue hit $115 billion in fiscal 2026. The hyperscaler custom silicon risk is the one analyst note that keeps appearing in NVIDIA earnings discussions. Google has TPUs. Amazon has Trainium and Inferentia. Microsoft has Maia. Meta now has MTIA at scale. Apple has its own Neural Engine. The pattern is clear: every company large enough to justify the R&D investment is building away from total NVIDIA dependence.

Developer and Open Source Impact

Meta's infrastructure cost reduction has a direct line to Llama. Llama 4 is already available through Meta AI and through third-party inference providers. When Meta's inference cost per token drops because MTIA 400 handles the workload more cheaply than an H100, that savings can either go to Meta's margin or get passed into lower API costs for developers.
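The link between hardware cost and per-token cost can be sketched as amortized hourly cost divided by sustained throughput. Meta has not published either input, so every number below is a placeholder chosen for illustration, and `cost_per_million_tokens` is a hypothetical helper rather than anyone's actual pricing model.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Amortized hardware cost to generate one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Placeholder inputs for illustration only; neither row is a real
# MTIA or H100 figure.
baseline = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2000)
cheaper = cost_per_million_tokens(hourly_cost_usd=3.00, tokens_per_second=2500)
print(f"baseline: ${baseline:.3f} per 1M tokens")
print(f"cheaper:  ${cheaper:.3f} per 1M tokens")
print(f"saving:   {1 - cheaper / baseline:.0%}")
```

The point of the sketch is that the two levers multiply: a chip that is both cheaper per hour and faster per second cuts per-token cost by more than either change alone.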

The more interesting effect is on open source availability. Meta releases Llama weights publicly. If internal inference costs drop, the calculus on how much compute budget to allocate to running Meta AI versus investing in model research changes. Lower inference costs make it cheaper for the community to self-host Llama — less GPU demand per request on the same model.

For teams comparing hosted Llama options against OpenAI and Anthropic pricing, the LLM API Pricing Tracker has current per-token rates across providers. For how the TSMC price increases affect NVIDIA GPU procurement this year, see the TSMC chip price hike analysis.

The Custom Silicon Race Is Now a Full Lap Ahead

In 2024, custom AI silicon was a research project at most hyperscalers. In 2026, it is in production at all five of the major players (Google, Amazon, Microsoft, Meta, Apple) and the roadmaps are accelerating.

Meta's MTIA program is notable specifically because it is moving fast and the specs are competitive. Going from MTIA 300 (ranking workloads) to MTIA 500 (10 petaflops, 512GB HBM) in under two years is not gradual displacement — it is a sprint.

The companies best positioned for the next wave of AI infrastructure cost pressure are the ones building their own silicon. For developers, this matters because it determines which providers can sustain low inference pricing as GPU costs rise everywhere else.

Key Takeaways

MTIA 400 deployed: 6 petaflops FP8, 288GB HBM, 9.2 TB/s bandwidth, 1,200W TDP — 72 chips per rack for GenAI inference
MTIA 450 (2027): 7 petaflops, HBM bandwidth doubled to 18.4 TB/s — memory bandwidth is the bottleneck for LLM inference
MTIA 500 (late 2027): 10 petaflops, 384-512GB HBM, 1,700W — competitive with NVIDIA's next generation
MTIA 300 already running: live in production for ranking/recommendation — the highest-volume AI workload Meta runs
Nvidia not displaced from training: Meta still buying Nvidia GPUs at scale for model training; MTIA targets inference economics
Cost pressure context: rising TSMC wafer prices and April 9 equipment tariffs make custom silicon economics improve every quarter
Open source impact: lower Meta inference costs could translate into cheaper hosted Llama options and easier self-hosting economics for developers

Frequently Asked Questions

What are the MTIA 400 chip specifications?

The Meta MTIA 400 delivers 6 petaflops of FP8 compute, 288GB of HBM capacity, 9.2 terabytes per second of HBM bandwidth, and a 1,200W TDP. Data center racks hold 72 MTIA 400 chips, giving 432 petaflops of FP8 inference compute per rack.

Is Meta's MTIA chip replacing Nvidia GPUs?

Meta's MTIA replaces Nvidia GPUs for inference workloads inside Meta's own data centers. It does not replace Nvidia for model training, where Meta still relies on H100s and H200s. The MTIA program displaces demand for Nvidia inference-only deployments at Meta's scale.

How does Meta's MTIA affect Llama and developer API costs?

Lower internal inference costs from MTIA deployments can reduce the cost per token for Meta AI and hosted Llama inference. This may enable lower API pricing for developers over time and improves the economics of self-hosting Llama weights.

How does Meta MTIA compare to Nvidia H100 performance?

MTIA 400 delivers 6 petaflops FP8 versus the H100's approximately 3.9 petaflops FP8, with 288GB HBM versus 80GB on the H100. The comparison is not direct — MTIA is inference-optimized and not suitable for general training workloads where NVIDIA's CUDA ecosystem and NVLink interconnect dominate.

When will Meta MTIA 450 and MTIA 500 be available?

The MTIA 450 is scheduled for 2027 and doubles HBM bandwidth to 18.4 TB/s at 7 petaflops FP8. The MTIA 500 is targeted for late 2027 with 10 petaflops FP8 and up to 512GB HBM. Meta is releasing roughly one new MTIA generation every six months.
