Meta Built 4 Custom AI Chips in 2 Years. Here's What MTIA Means for Nvidia.

Abhishek Gautam · 7 min read

Quick summary

Meta unveiled its MTIA chip roadmap in March 2026 — four generations of custom RISC-V inference chips made by TSMC and designed with Broadcom, with MTIA 300 already in production.

Meta revealed a four-generation custom AI chip roadmap in March 2026 — MTIA 300, MTIA 400, MTIA 450, and MTIA 500 — all built within a two-year window. MTIA 300 is already running in production Meta data centres. MTIA 400 has completed testing and ships soon. The chips are built on RISC-V architecture, manufactured by TSMC, and developed in partnership with Broadcom. None of them are for training large language models. All of them are for inference — the workload that runs at billion-user scale every day.

The Full MTIA Chip Lineup

Meta announced the complete chip roadmap on its AI research blog on March 11-12, 2026, coinciding with the company's broader data centre expansion announcements. Here is what each generation covers:

MTIA 300: The current production chip, deployed in Meta data centres now. Used for ranking and recommendation workloads, specifically the models that decide what content appears in your Instagram and Facebook feeds. This is Meta's highest-volume AI workload by query count.

MTIA 400: Completed testing, deploying to data centres shortly. Targets more complex generative AI inference tasks: image generation, video generation from text prompts, and multimodal AI features. This is the chip intended to power the AI features Meta is rolling out across its apps.

MTIA 450: Scheduled for operational deployment in 2027. An incremental update to the 400 generation, likely targeting efficiency improvements and power optimisation for the same inference workload category.

MTIA 500: Also scheduled for 2027. The most advanced generation announced, targeting the highest-complexity generative AI inference, including video synthesis at longer durations and more capable multimodal tasks.

Each generation ships roughly every six months, which is an aggressive cadence for custom silicon. Standard chip development cycles run two to three years. Meta achieving four generations in two years suggests either a very disciplined incremental design process or significant parallel development across teams.
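The arithmetic behind that inference can be sketched in a few lines. This is a rough illustration, not Meta's actual schedule: the function name `concurrent_generations` is hypothetical, and the only inputs are the figures quoted above (a six-month release cadence against a two-to-three-year standard development cycle).

```python
import math


def concurrent_generations(cycle_months: int, cadence_months: int) -> int:
    """Generations that must be in development at once to ship one every
    `cadence_months` when each generation takes `cycle_months` to build."""
    if cycle_months <= 0 or cadence_months <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(cycle_months / cadence_months)


# Figures from the article: 6-month cadence, 24 to 36 month standard cycle.
for cycle in (24, 36):
    in_flight = concurrent_generations(cycle, 6)
    print(f"{cycle}-month cycle: {in_flight} generations in flight")
```

If each generation really took a standard cycle, four to six of them would have to be in development at the same time. That is why the cadence points toward heavy parallel development, much shorter incremental cycles, or a mix of both.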

Why RISC-V and Why TSMC

The choice of RISC-V as the chip architecture is notable. RISC-V is an open instruction set — no licensing fees, no ARM royalties, full freedom to customise. For a company building four chip generations in two years, avoiding ISA licensing costs and restrictions matters. Google's TPU uses a custom ISA. Amazon's Trainium uses a custom ISA. Meta has chosen to standardise on the open RISC-V foundation.

TSMC is manufacturing all MTIA generations. This is expected — TSMC handles advanced chip manufacturing for every major hyperscaler's custom silicon. Intel Foundry and Samsung are theoretical alternatives, but neither has matched TSMC's advanced node yield rates in the relevant process generations. The more interesting detail is the Broadcom partnership.

Broadcom is not a consumer brand, but it is one of the most important companies in the semiconductor supply chain. It designs custom ASICs for hyperscalers, most famously Google's TPU chips, which Broadcom has co-designed with Google for over a decade. Meta working with Broadcom on MTIA follows the same model: a hyperscaler provides the architectural requirements and workload insight, Broadcom provides the chip design expertise and TSMC production relationships, and the hyperscaler owns the resulting IP.

Why Inference, Not Training

Meta has explicitly said MTIA chips will not be used to train its large language models. LLaMA training runs on Nvidia H100 and H200 clusters. The MTIA focus on inference is a deliberate strategic choice that reflects how Meta's AI economics actually work.

Training a model happens once (or periodically for updates). Inference happens billions of times per day. Every time a user opens Instagram and sees a recommended reel, that is inference. Every time Meta's content moderation system scans a post, that is inference. Every time a user generates an AI image through Meta's tools, that is inference.

At Meta's scale — 3.3 billion daily active users across its family of apps — even a modest improvement in inference efficiency per query translates to hundreds of millions of dollars in annual compute cost. Custom silicon optimised for Meta's specific inference workloads can achieve efficiency that general-purpose Nvidia GPUs cannot, because the chip is designed around the exact model architectures and serving patterns Meta runs.
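A back-of-the-envelope sketch shows how a small per-query gain reaches that order of magnitude. Only the 3.3 billion daily users figure comes from the article. The inferences per user, the cost per inference, and the 10% gain are placeholder assumptions chosen for illustration, not Meta data, and `annual_inference_savings` is a hypothetical helper.

```python
def annual_inference_savings(
    daily_users: float,
    inferences_per_user_per_day: float,
    cost_per_inference_usd: float,
    efficiency_gain: float,
) -> float:
    """Annual cost saved if each inference becomes `efficiency_gain` cheaper
    (0.10 means 10% cheaper)."""
    if not 0.0 <= efficiency_gain <= 1.0:
        raise ValueError("efficiency_gain must be between 0 and 1")
    annual_inferences = daily_users * inferences_per_user_per_day * 365
    baseline_cost = annual_inferences * cost_per_inference_usd
    return baseline_cost * efficiency_gain


savings = annual_inference_savings(
    daily_users=3.3e9,                  # from the article
    inferences_per_user_per_day=1_000,  # ASSUMPTION: placeholder value
    cost_per_inference_usd=1e-6,        # ASSUMPTION: placeholder value
    efficiency_gain=0.10,               # ASSUMPTION: placeholder value
)
print(f"Estimated annual savings: ${savings / 1e6:,.0f}M")
```

With these placeholder inputs the estimate lands at roughly $120 million a year. The result scales linearly with every input, so the real figure depends entirely on numbers Meta has not published.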

The figure of 72 MTIA 400 chips per rack in the announcement is telling. A standard Nvidia H100 server holds 8 GPUs, and a rack typically holds only a few such servers. Packing 72 MTIA chips into a rack suggests each chip is smaller and lower-power, optimised for inference throughput per watt rather than peak FLOPS.
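One way to reason about what 72 chips per rack implies is to divide an assumed rack power budget across the chips. The sketch below does that. The rack budgets and the 80% accelerator share are hypothetical values, since the announcement as described here gives no power figures, and `per_chip_power_watts` is an illustrative helper rather than anything Meta has published.

```python
def per_chip_power_watts(
    rack_power_kw: float,
    chips_per_rack: int,
    accelerator_share: float = 0.8,
) -> float:
    """Average power available to each accelerator in a rack.

    accelerator_share is the fraction of rack power left for accelerators
    after CPUs, networking, and cooling overhead (ASSUMPTION: 0.8).
    """
    if chips_per_rack <= 0:
        raise ValueError("chips_per_rack must be positive")
    if not 0.0 < accelerator_share <= 1.0:
        raise ValueError("accelerator_share must be in (0, 1]")
    return rack_power_kw * 1000 * accelerator_share / chips_per_rack


# ASSUMPTION: hypothetical rack power budgets, for illustration only.
for rack_kw in (30, 60, 120):
    watts = per_chip_power_watts(rack_kw, chips_per_rack=72)
    print(f"{rack_kw} kW rack: about {watts:.0f} W per chip")
```

The per-chip envelope moves in direct proportion to the rack budget, so the density figure supports the lower-power reading only if these racks sit within a conventional power envelope.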

How MTIA Fits Meta's Broader Anti-Nvidia Strategy

The MTIA announcement came within days of the Meta-AMD $100 billion GPU deal — a commitment to purchase AMD Instinct GPUs at massive scale to diversify away from Nvidia for training workloads. Together, the two announcements form a complete picture of Meta's silicon strategy:

Training: Split between Nvidia (existing H100/H200 clusters) and AMD (new Instinct MI300X and future generation GPUs under the $100B commitment). This gives Meta negotiating leverage against Nvidia without fully abandoning the ecosystem where its LLaMA models were originally trained.

Inference: MTIA chips replace Nvidia GPUs entirely for recommendation, ranking, and generative AI inference. Custom silicon purpose-built for Meta's workloads delivers better performance per dollar than general-purpose GPUs for known, stable inference tasks.

The combined strategy reduces Meta's Nvidia dependency on both fronts without requiring a clean break from Nvidia hardware. Meta keeps buying Nvidia for frontier model training where no viable alternative exists, while eliminating Nvidia from the inference side where volume is highest and custom chips are most effective.

What the MTIA Chip Means for Developers

MTIA chips are not available externally. Meta does not sell access to MTIA compute and has no announced plans to offer it as a cloud product. The developer implications are indirect but real:

Meta's AI APIs will be faster and cheaper to operate. Meta AI, the assistant built into WhatsApp, Instagram, and Facebook, runs on MTIA for inference. As MTIA 400 and 450 roll out, Meta AI's response latency should decrease and its operational cost per query should fall. If Meta offers AI API access to developers (currently limited), the economics improve with MTIA.

LLaMA models may be optimised for RISC-V inference. Meta's open-source LLaMA model releases have historically been designed to run on Nvidia GPUs. If Meta's internal inference infrastructure shifts substantially to RISC-V MTIA chips, future LLaMA versions may include inference optimisations for RISC-V targets — which benefits the broader ecosystem of RISC-V hardware that is emerging.

The RISC-V ecosystem gets a major credibility signal. Custom AI silicon from one of the world's largest technology companies built on RISC-V is a significant endorsement of the open ISA. SiFive, Ventana, and other RISC-V chip companies benefit from the ecosystem validation. Edge AI developers exploring RISC-V inference for low-power devices get a stronger ecosystem to build around.

Competitive pressure on inference API pricing. When hyperscalers reduce their inference costs through custom silicon, they have room to lower API pricing. Meta, Google (TPU), Amazon (Trainium/Inferentia), and now AWS-Cerebras are all reducing per-token inference costs. Developers building on foundation model APIs should expect continued price compression as custom silicon scales.

The Cadence Is the Signal

Four chip generations in two years is the most significant detail in the MTIA announcement. It is not the chip specifications that matter most — those will change every six months. It is what the cadence reveals about Meta's organisational commitment.

Custom silicon requires years of investment before it pays off. The design teams, fab relationships, testing infrastructure, and software toolchains are expensive to build. Companies that commit to a six-month chip release cadence are not doing it speculatively — they have validated the economics, locked in the manufacturing relationships, and built the teams. Meta is in for the long term on custom inference silicon.

The comparison to Google is instructive. Google started TPU development in 2013. By 2023, TPUs handled the majority of Google's AI compute internally. Meta is a decade behind Google on custom silicon but catching up fast. The MTIA 500 in 2027 will likely be the first generation where Meta's custom chips handle the majority of its inference compute rather than Nvidia GPUs.

Key Takeaways

  • Meta announced four MTIA chip generations (MTIA 300/400/450/500) built in two years — MTIA 300 already in production, MTIA 400 shipping imminently
  • All MTIA chips target inference, not training — optimised for recommendation systems, image generation, video synthesis, and multimodal AI at 3.3 billion user scale
  • RISC-V architecture, TSMC manufacturing, and a Broadcom design partnership that follows the model Google used for TPU development
  • 72 MTIA 400 chips per rack versus 8 H100s per server: smaller, more numerous, power-efficient inference chips replacing high-FLOPS GPU racks
  • Part of a two-pronged strategy to reduce Nvidia dependency: MTIA replaces Nvidia for inference; the $100B AMD deal diversifies training away from Nvidia
  • Developer implications are indirect: faster Meta AI products, potential LLaMA RISC-V optimisations, competitive inference API pricing as hyperscaler custom silicon reduces per-token costs

Frequently Asked Questions

What is Meta's MTIA chip and what does it do?

MTIA stands for Meta Training and Inference Accelerator. It is Meta's custom AI chip built on RISC-V architecture, designed in partnership with Broadcom and manufactured by TSMC. Despite the name, current MTIA chips are used for inference workloads rather than for training large language models: recommendation ranking for Facebook and Instagram feeds, image generation, and video synthesis. Large language model training like LLaMA continues to run on Nvidia GPUs.

What are the four MTIA chip generations Meta announced?

Meta announced MTIA 300 (in production now, handles recommendation ranking), MTIA 400 (testing complete, shipping soon, handles generative AI inference including image and video generation), MTIA 450 (2027, efficiency improvements), and MTIA 500 (2027, highest-complexity generative AI tasks). New chips ship approximately every six months, which is an unusually aggressive cadence for custom silicon.

Why is Meta building its own chips instead of buying Nvidia?

Custom silicon optimised for Meta's specific inference workloads delivers better performance per dollar than general-purpose Nvidia GPUs. At 3.3 billion daily active users, even small efficiency gains per inference query translate to hundreds of millions in annual cost savings. Meta also gains negotiating leverage against Nvidia and supply chain independence. The MTIA strategy covers inference; the separate $100B AMD GPU deal addresses training diversification.

Can developers use Meta's MTIA chips?

No. MTIA chips are internal Meta infrastructure and are not offered as an external cloud product. Developer implications are indirect — Meta AI products running on MTIA will be faster and cheaper to operate, which may reduce API pricing if Meta opens broader API access. The RISC-V architecture choice may lead to LLaMA model optimisations for RISC-V inference targets, benefiting the broader open RISC-V hardware ecosystem.

How does MTIA compare to what Google and Amazon are building?

Google's TPU has a decade head start: Google started TPU development in 2013 and now runs the majority of its AI compute internally on TPUs. Amazon has Trainium for training and Inferentia for inference. Meta's MTIA is catching up fast with a six-month release cadence. By MTIA 500 in 2027, Meta will likely handle the majority of its inference compute on custom silicon rather than Nvidia GPUs, following the trajectory Google had completed by 2023.

Written by

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 941+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.