Apple Intelligence for Developers: What Runs On-Device, What Goes to the Cloud, and What APIs You Can Actually Use
Quick summary
Apple Intelligence is not one system — it is a pipeline. Some requests never leave your phone. Others go to Private Cloud Compute. A few go to OpenAI. Here is how the routing works, what Core ML and the Writing Tools APIs expose to developers, and what it means if you build iOS apps.
Apple Intelligence shipped with iOS 18.1 and has been quietly expanding ever since. If you build iOS apps — or you are just trying to understand what Apple is actually doing with AI on your phone — the architecture is more interesting (and more nuanced) than the marketing suggests.
The Three-Tier Routing Model
Apple Intelligence does not simply "run on your device." It routes requests across three tiers based on complexity:
Tier 1: On-device (Apple Silicon Neural Engine)
Simple requests that fit within the capabilities of the roughly 3-billion-parameter on-device model run entirely locally. Writing summaries, notification prioritisation, basic text rewriting, and Smart Reply suggestions never leave the phone. Apple's on-device model is quantised to run efficiently on the Neural Engine in A17 Pro, A18, and M-series chips.
Tier 2: Private Cloud Compute (PCC)
More complex requests that exceed on-device capability are sent to Apple's Private Cloud Compute servers — hardware running stripped-down versions of Apple Silicon in Apple's own data centres. Apple's key claims here: PCC nodes do not retain request data after processing, Apple itself cannot inspect requests, and the software images running on PCC nodes are published for inspection by security researchers (Apple provides a Virtual Research Environment specifically for auditing PCC).
Tier 3: OpenAI (ChatGPT integration)
For requests that require broad world knowledge — "write me an essay about the Roman Empire", "plan my trip to Tokyo" — Apple routes to ChatGPT with explicit user consent. This is optional and requires a separate confirmation before any data leaves Apple's infrastructure.
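The three tiers above can be sketched as a routing function. To be clear, this is an illustrative model only — Apple does not expose its real routing logic, and every name and threshold below is invented for the sketch:

```swift
// Illustrative only: Apple's actual routing is internal and not public.
enum InferenceTier {
    case onDevice              // Tier 1: Neural Engine, ~3B model
    case privateCloudCompute   // Tier 2: PCC
    case chatGPT               // Tier 3: OpenAI, opt-in per request
}

struct AIRequest {
    let estimatedComplexity: Int   // hypothetical 0–100 score
    let needsWorldKnowledge: Bool
    let userConsentedToChatGPT: Bool
}

func route(_ request: AIRequest) -> InferenceTier {
    if request.needsWorldKnowledge && request.userConsentedToChatGPT {
        return .chatGPT            // explicit consent gate before data leaves Apple
    }
    if request.estimatedComplexity <= 40 {
        return .onDevice           // fits the on-device model's capability
    }
    return .privateCloudCompute    // everything else escalates to PCC
}
```

The important structural point the sketch captures: ChatGPT is never a silent fallback — it sits behind a consent check, while the on-device/PCC split is decided by Apple, not the user.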
What This Means in Practice
When you request a summary of a long document, you are almost certainly staying on-device. When you ask Siri to write a 500-word email from scratch, you are probably hitting PCC. When you ask a general knowledge question through Siri with ChatGPT enabled, you are talking to OpenAI.
The routing decision is not transparent to the user by default, which is a legitimate privacy concern. Apple says it shows an indicator when PCC or ChatGPT is used, but in practice it is easy to miss.
Core ML: The Developer Layer
For developers, Apple Intelligence is largely a closed system — you cannot call the on-device foundation model directly via a public API. What you can do:
Core ML (available to all developers)
Core ML lets you bundle and run your own machine learning models on-device. You can convert models from PyTorch or TensorFlow using the coremltools Python library. The models run on the Neural Engine with no network call required. This is not Apple Intelligence's model — it is your own model, running locally.
Practical uses: image classification, text classification, on-device embeddings, custom NLP tasks.
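A minimal sketch of the Core ML inference flow, assuming you have already converted a model with coremltools and added it to your Xcode project. The `SentimentClassifier` class name is hypothetical — Xcode generates a typed Swift interface from whatever .mlmodel file you bundle:

```swift
import CoreML

// Sketch: running a bundled Core ML model on-device.
// "SentimentClassifier" is a hypothetical model converted with coremltools;
// Xcode generates a typed class like this for any .mlmodel in the project.
func classify(_ text: String) throws -> String {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // let Core ML use the Neural Engine when available

    let model = try SentimentClassifier(configuration: config)
    let output = try model.prediction(text: text)
    return output.label          // e.g. "positive" / "negative"
}
```

No network call happens anywhere in that path — the model weights ship inside your app bundle, which is the whole point.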
Natural Language framework
Apple's Natural Language framework (a sibling of Core ML, not part of it) provides tokenisation, language identification, named entity recognition, sentiment analysis, and embedding lookup — all on-device. Good for search, recommendation, and annotation use cases.
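A few of those capabilities in practice — these are real NaturalLanguage framework APIs, though the exact sentiment values you get back will vary by OS version:

```swift
import NaturalLanguage

// Language identification — entirely on-device.
let recognizer = NLLanguageRecognizer()
recognizer.processString("Das ist ein ausgezeichnetes Produkt.")
print(recognizer.dominantLanguage?.rawValue ?? "unknown")   // "de"

// Sentiment scoring with NLTagger: raw values range from -1.0 to 1.0.
let review = "I love this app. It works beautifully."
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = review
let (sentiment, _) = tagger.tag(at: review.startIndex,
                                unit: .paragraph,
                                scheme: .sentimentScore)
print(sentiment?.rawValue ?? "0")   // a positive score

// Word embeddings for on-device semantic similarity.
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Smaller cosine distance = more semantically similar.
    print(embedding.distance(between: "coffee", and: "tea"))
}
```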
Writing Tools API (iOS 18+)
If your app hosts user-generated text content (notes, emails, messages), you can opt into Apple's Writing Tools panel — the rewrite/proofread/summarise overlay. Users get it in your app automatically if you use standard UITextView or WKWebView. You can customise which tools appear and intercept the results.
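The opt-in/opt-out surface is a pair of properties UIKit added in iOS 18. The property names below match the iOS 18 UIKit additions as I understand them — verify against the current SDK before shipping:

```swift
import UIKit

// Configuring Writing Tools on a UITextView (iOS 18+).
let textView = UITextView()

// .complete shows the full rewrite/proofread/summarise panel (the default
// for editable text views); .limited restricts it to inline suggestions;
// .none opts your view out entirely.
textView.writingToolsBehavior = .limited

// Constrain what formats the tools are allowed to return into your view.
textView.allowedWritingToolsResultOptions = [.plainText, .list]
```

If your app stores text in a custom format, constraining the result options matters — otherwise Writing Tools can hand you back rich text or tables your data model cannot represent.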
App Intents + Siri
The most developer-accessible part of Apple Intelligence for third-party apps is App Intents. By defining App Intents, your app's actions become available to Siri, Spotlight, and the new on-device Siri that understands screen context. This is how Siri can "send a message via App X" or "open recipe Y in App Z" — the developer exposes the intent and parameters; Siri handles the natural language parsing.
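A minimal App Intent looks like this. The `AppIntents` types are real; the intent itself, its parameter, and the `Navigator` helper are illustrative names standing in for your app's own logic:

```swift
import AppIntents

// A minimal App Intent exposing an app action to Siri and Shortcuts.
// "OpenRecipeIntent" and "Navigator" are hypothetical names for this sketch.
struct OpenRecipeIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Recipe"
    static var openAppWhenRun: Bool = true

    @Parameter(title: "Recipe Name")
    var recipeName: String

    func perform() async throws -> some IntentResult {
        // Your deep-link / navigation logic goes here (Navigator is hypothetical).
        await Navigator.shared.openRecipe(named: recipeName)
        return .result()
    }
}
```

Note what you are not doing: no grammar files, no phrase registration. You declare the action and its parameters; Siri owns the natural language side.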
What You Cannot Do (Yet)
Apple has not opened the on-device foundation model as a general-purpose API the way Google has with Gemini Nano through Android's AICore service. You cannot prompt Apple's 3B model directly from your app, pass it arbitrary text, or get embeddings from it. This is a deliberate choice — Apple controls the model to maintain quality and privacy guarantees.
There is speculation in the developer community that a more general local-inference API will come, particularly as on-device model capability grows. For now, if you need general-purpose on-device LLM inference in an iOS app, your options are:
- Bundle a small open-source model via Core ML (a quantised Mistral 7B runs on an iPhone 15 Pro with acceptable latency)
- Use a third-party SDK (llama.cpp has iOS bindings)
- Call an external API (and accept the privacy and latency trade-offs)
The Privacy Architecture Is Genuinely Novel
Whatever you think of Apple's product strategy, the Private Cloud Compute architecture is technically serious. The combination of hardware attestation, no persistent storage of requests, and public code auditability is a meaningful attempt to solve the "I do not want my data on someone's server" problem for cloud inference.
It does not solve everything — you are still trusting Apple's attestation claims. But compared to sending requests to OpenAI's standard API or Google's cloud, it is architecturally different in ways that matter for privacy-sensitive applications.
What Developers Should Take Away
- App Intents are the highest-leverage integration point right now. If your app does anything actionable, define App Intents. Siri's on-screen awareness is expanding with each iOS release.
- Core ML is mature and underused. On-device model inference is fast enough for many real-world tasks. If you have a classification or NLP task and are currently calling an external API, benchmark a Core ML model first.
- The foundation model API will open eventually. Apple needs developers to build differentiated Apple Intelligence features, so more direct API access is a likely future WWDC announcement.
- Privacy is a differentiator you can use. Especially for enterprise and healthcare apps, being able to say "this AI processing runs on-device via Core ML, nothing leaves the phone" is a genuine competitive advantage.
Written by
Abhishek Gautam
Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.