Meta Built 4 Custom AI Chips in 2 Years. Here's What MTIA Means for Nvidia.

Abhishek Gautam · 7 min read

Quick summary

Meta unveiled its MTIA chip roadmap in March 2026 — four generations of custom RISC-V inference chips made by TSMC and designed with Broadcom, with MTIA 300 already in production.

Meta revealed a four-generation custom AI chip roadmap in March 2026 — MTIA 300, MTIA 400, MTIA 450, and MTIA 500 — all built within a two-year window. MTIA 300 is already running in production Meta data centres. MTIA 400 has completed testing and ships soon. The chips are built on RISC-V architecture, manufactured by TSMC, and developed in partnership with Broadcom. None of them are for training large language models. All of them are for inference — the workload that runs at billion-user scale every day.

The Full MTIA Chip Lineup

Meta announced the complete chip roadmap on its AI research blog on March 11-12, 2026, coinciding with the company's broader data centre expansion announcements. Here is what each generation covers:

MTIA 300 — The current production chip, deployed in Meta data centres now. Used for ranking and recommendation inference: specifically, the models that decide what content appears in your Instagram and Facebook feeds. This is Meta's highest-volume AI workload by query count.

MTIA 400 — Completed testing, deploying to data centres shortly. Targets more complex generative AI inference tasks: image generation, video generation from text prompts, and multimodal AI features. This is the chip powering the AI features Meta is rolling out across its apps.

MTIA 450 — Scheduled for operational deployment in 2027. An incremental update to the 400 generation, likely targeting efficiency improvements and power optimisation for the same inference workload category.

MTIA 500 — Also 2027. The most advanced generation announced, targeting the highest-complexity generative AI inference including video synthesis at longer durations and more capable multimodal tasks.

Each generation ships roughly every six months, which is an aggressive cadence for custom silicon. Standard chip development cycles run two to three years. Meta achieving four generations in two years suggests either a very disciplined incremental design process or significant parallel development across teams.

Why RISC-V and Why TSMC

The choice of RISC-V as the chip architecture is notable. RISC-V is an open instruction set — no licensing fees, no ARM royalties, full freedom to customise. For a company building four chip generations in two years, avoiding ISA licensing costs and restrictions matters. Google's TPU uses a custom ISA. Amazon's Trainium uses a custom ISA. Meta has chosen to standardise on the open RISC-V foundation.

TSMC is manufacturing all MTIA generations. This is expected — TSMC handles advanced chip manufacturing for every major hyperscaler's custom silicon. Intel Foundry and Samsung are theoretical alternatives, but neither has matched TSMC's advanced node yield rates in the relevant process generations. The more interesting detail is the Broadcom partnership.

Broadcom is not a consumer brand, but it is one of the most important companies in the semiconductor supply chain. It designs custom ASICs for hyperscalers, most famously Google's TPU chips, which Broadcom has co-designed with Google for over a decade (with TSMC handling fabrication). Meta working with Broadcom on MTIA follows the same model: the hyperscaler provides the architectural requirements and workload insight, Broadcom provides the chip design expertise and TSMC production relationships, and the hyperscaler owns the resulting IP.

Why Inference, Not Training

Meta has explicitly said MTIA chips will not be used to train its large language models. LLaMA training runs on Nvidia H100 and H200 clusters. The MTIA focus on inference is a deliberate strategic choice that reflects how Meta's AI economics actually work.

Training a model happens once (or periodically for updates). Inference happens billions of times per day. Every time a user opens Instagram and sees a recommended reel, that is inference. Every time Meta's content moderation system scans a post, that is inference. Every time a user generates an AI image through Meta's tools, that is inference.

At Meta's scale — 3.3 billion daily active users across its family of apps — even a modest improvement in inference efficiency per query translates to hundreds of millions of dollars in annual compute cost. Custom silicon optimised for Meta's specific inference workloads can achieve efficiency that general-purpose Nvidia GPUs cannot, because the chip is designed around the exact model architectures and serving patterns Meta runs.
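To make that concrete, here is a rough back-of-envelope model in Python. Every input except the 3.3 billion user figure is an illustrative assumption (queries per user, blended cost per query, the size of the efficiency gain), not anything Meta has disclosed:

```python
# Back-of-envelope: what a small inference efficiency gain is worth at
# Meta's scale. All inputs except DAILY_USERS are illustrative guesses.

DAILY_USERS = 3.3e9             # daily actives across Meta's apps (from the article)
QUERIES_PER_USER_PER_DAY = 100  # assumed: feed ranking, ads, moderation, AI features
COST_PER_QUERY_USD = 20e-6      # assumed blended GPU inference cost per query
EFFICIENCY_GAIN = 0.10          # assumed: 10% cheaper per query on custom silicon

daily_queries = DAILY_USERS * QUERIES_PER_USER_PER_DAY
annual_spend = daily_queries * COST_PER_QUERY_USD * 365
annual_savings = annual_spend * EFFICIENCY_GAIN

print(f"Annual inference spend: ${annual_spend / 1e9:.1f}B")
print(f"Saved by a 10% efficiency gain: ${annual_savings / 1e6:.0f}M per year")
```

Under these assumptions a 10 per cent gain is worth roughly $240 million a year, and the savings scale linearly with query volume, which is why a chip tuned to the exact workload can pay for its own design programme.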

The 72 MTIA 400 chips per rack figure in the announcement is telling. A standard Nvidia HGX H100 server holds eight GPUs, and even a densely packed rack holds only a few such servers. 72 MTIA chips per rack suggests the MTIA is smaller, lower-power, and more numerous: optimised for inference throughput per watt rather than peak FLOPS.
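A small sketch shows the rack-level arithmetic. Only the 72-per-rack figure comes from the announcement; every other number below (per-chip wattage, per-chip throughput, servers per rack) is a hypothetical placeholder chosen to illustrate the throughput-per-watt trade-off, not a real spec:

```python
# Rack-level comparison sketch. Only the 72-chips-per-rack figure comes
# from Meta's announcement; all per-chip specs are hypothetical.

def rack_profile(name, chips, watts_per_chip, queries_per_sec_per_chip):
    total_watts = chips * watts_per_chip
    total_qps = chips * queries_per_sec_per_chip
    return name, total_qps, total_watts / 1000, total_qps / total_watts

profiles = [
    # Assumed: four 8-GPU HGX H100 servers per rack, ~700 W per GPU
    rack_profile("H100 rack (32 GPUs)", 32, 700, 900),
    # Announced density; wattage and throughput are guesses
    rack_profile("MTIA 400 rack (72 chips)", 72, 150, 400),
]

for name, qps, kw, qps_per_watt in profiles:
    print(f"{name}: {qps:,.0f} q/s, {kw:.1f} kW, {qps_per_watt:.2f} q/s per watt")
```

With these made-up numbers the two racks serve the same query volume while the MTIA rack draws less than half the power. The shape of the trade, not the magnitudes, is the point.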

How MTIA Fits Meta's Broader Anti-Nvidia Strategy

The MTIA announcement came within days of the Meta-AMD $100 billion GPU deal — a commitment to purchase AMD Instinct GPUs at massive scale to diversify away from Nvidia for training workloads. Together, the two announcements form a complete picture of Meta's silicon strategy:

Training: Split between Nvidia (existing H100/H200 clusters) and AMD (new Instinct MI300X and future generation GPUs under the $100B commitment). This gives Meta negotiating leverage against Nvidia without fully abandoning the ecosystem where its LLaMA models were originally trained.

Inference: MTIA chips replace Nvidia GPUs entirely for recommendation, ranking, and generative AI inference. Custom silicon purpose-built for Meta's workloads delivers better performance per dollar than general-purpose GPUs for known, stable inference tasks.

The combined strategy reduces Meta's Nvidia dependency on both fronts without requiring a clean break from Nvidia hardware. Meta keeps buying Nvidia for frontier model training where no viable alternative exists, while eliminating Nvidia from the inference side where volume is highest and custom chips are most effective.

What the MTIA Chip Means for Developers

MTIA chips are not available externally. Meta does not sell access to MTIA compute and has no announced plans to offer it as a cloud product. The developer implications are indirect but real:

Meta's AI APIs will be faster and cheaper to operate. Meta AI, the assistant built into WhatsApp, Instagram, and Facebook, runs on MTIA for inference. As MTIA 400 and 450 roll out, Meta AI's response latency should decrease and its operational cost per query should fall. If Meta offers AI API access to developers (currently limited), the economics improve with MTIA.

LLaMA models may be optimised for RISC-V inference. Meta's open-source LLaMA model releases have historically been designed to run on Nvidia GPUs. If Meta's internal inference infrastructure shifts substantially to RISC-V MTIA chips, future LLaMA versions may ship with inference optimisations for RISC-V targets, which would benefit the emerging ecosystem of RISC-V hardware.
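As a sketch of what inference optimisation means in practice, the arithmetic below checks whether Llama-class checkpoints fit in a single accelerator's memory at different quantisation levels. The parameter counts are published Llama sizes; the per-chip memory figure is purely hypothetical, since Meta has not disclosed MTIA memory capacity here:

```python
# Why quantisation matters for small inference accelerators: weight
# memory vs a hypothetical per-chip capacity. Llama sizes are published;
# the 128 GB chip memory figure is a made-up placeholder.
import math

PARAMS = {"Llama 8B": 8e9, "Llama 70B": 70e9}
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
CHIP_MEMORY_GB = 128  # hypothetical per-chip capacity

for model, n_params in PARAMS.items():
    for fmt, bytes_per in BYTES_PER_PARAM.items():
        weights_gb = n_params * bytes_per / 2**30
        chips = math.ceil(weights_gb / CHIP_MEMORY_GB)
        note = "fits on one chip" if chips == 1 else f"shards across {chips} chips"
        print(f"{model} @ {fmt}: {weights_gb:,.0f} GB of weights ({note})")
```

Weight memory is only part of the story (KV cache and activations add more), but it shows why low-precision kernels for a new ISA are usually the first optimisation target.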

The RISC-V ecosystem gets a major credibility signal. Custom AI silicon from one of the world's largest technology companies built on RISC-V is a significant endorsement of the open ISA. SiFive, Ventana, and other RISC-V chip companies benefit from the ecosystem validation. Edge AI developers exploring RISC-V inference for low-power devices get a stronger ecosystem to build around.

Competitive pressure on inference API pricing. When hyperscalers reduce their inference costs through custom silicon, they have room to lower API pricing. Meta (MTIA), Google (TPU), Amazon (Trainium/Inferentia), and now AWS-Cerebras are all reducing per-token inference costs. Developers building on foundation model APIs should expect continued price compression as custom silicon scales.
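A toy cost model makes the pricing mechanics visible. None of these numbers are vendor figures; they are placeholders showing how cheaper, lower-power silicon lowers the floor under per-token API pricing:

```python
# Toy per-token serving cost: amortised hardware plus electricity.
# All inputs are assumptions, not vendor figures.

def usd_per_million_tokens(hw_cost_usd, lifetime_years, watts,
                           tokens_per_sec, usd_per_kwh=0.08):
    seconds = lifetime_years * 365 * 24 * 3600
    hw_per_sec = hw_cost_usd / seconds                 # $/s of hardware
    power_per_sec = watts / 1000 * usd_per_kwh / 3600  # $/s of electricity
    return (hw_per_sec + power_per_sec) / tokens_per_sec * 1e6

# Hypothetical: a $30k general-purpose GPU vs a $5k custom inference chip
print(f"GPU:    ${usd_per_million_tokens(30_000, 4, 700, 3_000):.3f} per 1M tokens")
print(f"Custom: ${usd_per_million_tokens(5_000, 4, 150, 1_500):.3f} per 1M tokens")
```

Under these assumptions the custom chip serves half the tokens per second yet still comes out roughly three times cheaper per token, which is exactly the room a hyperscaler has to cut API prices.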

The Cadence Is the Signal

Four chip generations in two years is the most significant detail in the MTIA announcement. It is not the chip specifications that matter most — those will change every six months. It is what the cadence reveals about Meta's organisational commitment.

Custom silicon requires years of investment before it pays off. The design teams, fab relationships, testing infrastructure, and software toolchains are expensive to build. Companies that commit to a six-month chip release cadence are not doing it speculatively — they have validated the economics, locked in the manufacturing relationships, and built the teams. Meta is in for the long term on custom inference silicon.

The comparison to Google is instructive. Google started TPU development in 2013. By 2023, TPUs handled the majority of Google's AI compute internally. Meta is a decade behind Google on custom silicon but catching up fast. The MTIA 500 in 2027 will likely be the first generation where Meta's custom chips handle the majority of its inference compute rather than Nvidia GPUs.

Key Takeaways

  • Meta announced four MTIA chip generations (MTIA 300/400/450/500) built in two years — MTIA 300 already in production, MTIA 400 shipping imminently
  • All MTIA chips target inference, not training — optimised for recommendation systems, image generation, video synthesis, and multimodal AI at 3.3 billion user scale
  • RISC-V architecture, TSMC manufacturing, Broadcom partnership — same model Google used for TPU development
  • 72 MTIA 400 chips per rack versus eight H100s per server — smaller, more numerous, power-efficient inference chips replacing high-FLOPS GPU racks
  • Part of a two-pronged Nvidia exit strategy: MTIA replaces Nvidia for inference; the $100B AMD deal replaces Nvidia for training
  • Developer implications are indirect: faster Meta AI products, potential LLaMA RISC-V optimisations, competitive inference API pricing as hyperscaler custom silicon reduces per-token costs
