Apple Intelligence for Developers: What Runs On-Device, What Goes to the Cloud, and What APIs You Can Actually Use
Quick summary
Apple Intelligence is not one system — it is a pipeline. Some requests never leave your phone. Others go to Private Cloud Compute. A few go to OpenAI. Here is how the routing works, what Core ML and the Writing Tools APIs expose to developers, and what it means if you build iOS apps.
Apple Intelligence shipped with iOS 18.1 and has been quietly expanding ever since. If you build iOS apps — or you are just trying to understand what Apple is actually doing with AI on your phone — the architecture is more interesting (and more nuanced) than the marketing suggests.
The Three-Tier Routing Model
Apple Intelligence does not simply "run on your device." It routes requests across three tiers based on complexity:
Tier 1: On-device (Apple Silicon Neural Engine)
Simple requests that fit within the capabilities of the roughly 3-billion-parameter on-device model run entirely locally. Writing summaries, notification prioritisation, basic text rewriting, and Smart Reply suggestions never leave the phone. Apple's on-device model is quantised to run efficiently on the Neural Engine in A17 Pro, A18, and M-series chips.
Tier 2: Private Cloud Compute (PCC)
More complex requests that exceed on-device capability are sent to Apple's Private Cloud Compute servers — hardware running stripped-down versions of Apple Silicon in Apple's own data centres. Apple's key claims here: PCC nodes do not retain request data after processing, Apple itself cannot inspect requests, and the software images running on PCC nodes are published for inspection by security researchers (Apple provides a Virtual Research Environment specifically for auditing PCC).
Tier 3: OpenAI (ChatGPT integration)
For requests that require broad world knowledge — "write me an essay about the Roman Empire", "plan my trip to Tokyo" — Apple routes to ChatGPT with explicit user consent. This is optional and requires a separate confirmation before any data leaves Apple's infrastructure.
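The three tiers above can be sketched as a routing function. To be clear, this is an illustrative model only — Apple does not expose its real routing logic, and every name and threshold below is invented for the sketch:

```swift
// Illustrative only: Apple's actual routing is internal and not public.
enum InferenceTier {
    case onDevice              // Tier 1: Neural Engine, ~3B model
    case privateCloudCompute   // Tier 2: PCC
    case chatGPT               // Tier 3: OpenAI, opt-in per request
}

struct AIRequest {
    let estimatedComplexity: Int   // hypothetical 0–100 score
    let needsWorldKnowledge: Bool
    let userConsentedToChatGPT: Bool
}

func route(_ request: AIRequest) -> InferenceTier {
    if request.needsWorldKnowledge && request.userConsentedToChatGPT {
        return .chatGPT            // explicit consent gate before data leaves Apple
    }
    if request.estimatedComplexity <= 40 {
        return .onDevice           // fits the on-device model's capability
    }
    return .privateCloudCompute    // everything else escalates to PCC
}
```

The important structural point the sketch captures: ChatGPT is never a silent fallback — it sits behind a consent check, while the on-device/PCC split is decided by Apple, not the user.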
What This Means in Practice
When you request a summary of a long document, you are almost certainly staying on-device. When you ask Siri to write a 500-word email from scratch, you are probably hitting PCC. When you ask a general knowledge question through Siri with ChatGPT enabled, you are talking to OpenAI.
The routing decision is not transparent to the user by default, which is a legitimate privacy concern. Apple says it shows an indicator when PCC or ChatGPT is used, but in practice it is easy to miss.
Core ML: The Developer Layer
For developers, Apple Intelligence is largely a closed system — you cannot call the on-device foundation model directly via a public API. What you can do:
Core ML (available to all developers)
Core ML lets you bundle and run your own machine learning models on-device. You can convert models from PyTorch or TensorFlow using the coremltools Python library. The models run on the Neural Engine with no network call required. This is not Apple Intelligence's model — it is your own model, running locally.
Practical uses: image classification, text classification, on-device embeddings, custom NLP tasks.
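A minimal sketch of the Core ML inference flow, assuming you have already converted a model with coremltools and added it to your Xcode project. The `SentimentClassifier` class name is hypothetical — Xcode generates a typed Swift interface from whatever .mlmodel file you bundle:

```swift
import CoreML

// Sketch: running a bundled Core ML model on-device.
// "SentimentClassifier" is a hypothetical model converted with coremltools;
// Xcode generates a typed class like this for any .mlmodel in the project.
func classify(_ text: String) throws -> String {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // let Core ML use the Neural Engine when available

    let model = try SentimentClassifier(configuration: config)
    let output = try model.prediction(text: text)
    return output.label          // e.g. "positive" / "negative"
}
```

No network call happens anywhere in that path — the model weights ship inside your app bundle, which is the whole point.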
Natural Language framework
Apple's Natural Language framework (a sibling of Core ML, not part of it) provides tokenisation, language identification, named entity recognition, sentiment analysis, and embedding lookup — all on-device. Good for search, recommendation, and annotation use cases.
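A few of those capabilities in practice — these are real NaturalLanguage framework APIs, though the exact sentiment values you get back will vary by OS version:

```swift
import NaturalLanguage

// Language identification — entirely on-device.
let recognizer = NLLanguageRecognizer()
recognizer.processString("Das ist ein ausgezeichnetes Produkt.")
print(recognizer.dominantLanguage?.rawValue ?? "unknown")   // "de"

// Sentiment scoring with NLTagger: raw values range from -1.0 to 1.0.
let review = "I love this app. It works beautifully."
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = review
let (sentiment, _) = tagger.tag(at: review.startIndex,
                                unit: .paragraph,
                                scheme: .sentimentScore)
print(sentiment?.rawValue ?? "0")   // a positive score

// Word embeddings for on-device semantic similarity.
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Smaller cosine distance = more semantically similar.
    print(embedding.distance(between: "coffee", and: "tea"))
}
```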
Writing Tools API (iOS 18+)
If your app hosts user-generated text content (notes, emails, messages), you can opt into Apple's Writing Tools panel — the rewrite/proofread/summarise overlay. Users get it in your app automatically if you use standard UITextView or WKWebView. You can customise which tools appear and intercept the results.
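The opt-in/opt-out surface is a pair of properties UIKit added in iOS 18. The property names below match the iOS 18 UIKit additions as I understand them — verify against the current SDK before shipping:

```swift
import UIKit

// Configuring Writing Tools on a UITextView (iOS 18+).
let textView = UITextView()

// .complete shows the full rewrite/proofread/summarise panel (the default
// for editable text views); .limited restricts it to inline suggestions;
// .none opts your view out entirely.
textView.writingToolsBehavior = .limited

// Constrain what formats the tools are allowed to return into your view.
textView.allowedWritingToolsResultOptions = [.plainText, .list]
```

If your app stores text in a custom format, constraining the result options matters — otherwise Writing Tools can hand you back rich text or tables your data model cannot represent.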
App Intents + Siri
The most developer-accessible part of Apple Intelligence for third-party apps is App Intents. By defining App Intents, your app's actions become available to Siri, Spotlight, and the new on-device Siri that understands screen context. This is how Siri can "send a message via App X" or "open recipe Y in App Z" — the developer exposes the intent and parameters; Siri handles the natural language parsing.
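A minimal App Intent looks like this. The `AppIntents` types are real; the intent itself, its parameter, and the `Navigator` helper are illustrative names standing in for your app's own logic:

```swift
import AppIntents

// A minimal App Intent exposing an app action to Siri and Shortcuts.
// "OpenRecipeIntent" and "Navigator" are hypothetical names for this sketch.
struct OpenRecipeIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Recipe"
    static var openAppWhenRun: Bool = true

    @Parameter(title: "Recipe Name")
    var recipeName: String

    func perform() async throws -> some IntentResult {
        // Your deep-link / navigation logic goes here (Navigator is hypothetical).
        await Navigator.shared.openRecipe(named: recipeName)
        return .result()
    }
}
```

Note what you are not doing: no grammar files, no phrase registration. You declare the action and its parameters; Siri owns the natural language side.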
What You Cannot Do (Yet)
Apple has not opened the on-device foundation model as a general-purpose API the way Google has with Gemini Nano through Android's AICore service. You cannot prompt Apple's 3B model directly from your app, pass it arbitrary text, or get embeddings from it. This is a deliberate choice — Apple controls the model to maintain quality and privacy guarantees.
There is speculation in the developer community that a more general local-inference API will come, particularly as on-device model capability grows. For now, if you need general-purpose on-device LLM inference in an iOS app, your options are:
- Bundle a small open-source model via Core ML (a quantised Mistral 7B runs on an iPhone 15 Pro with acceptable latency)
- Use a third-party SDK (llama.cpp has iOS bindings)
- Call an external API (and accept the privacy and latency trade-offs)
The Privacy Architecture Is Genuinely Novel
Whatever you think of Apple's product strategy, the Private Cloud Compute architecture is technically serious. The combination of hardware attestation, no persistent storage of requests, and public code auditability is a meaningful attempt to solve the "I do not want my data on someone's server" problem for cloud inference.
It does not solve everything — you are still trusting Apple's attestation claims. But compared to sending requests to OpenAI's standard API or Google's cloud, it is architecturally different in ways that matter for privacy-sensitive applications.
What Developers Should Take Away
- App Intents are the highest-leverage integration point right now. If your app does anything actionable, define App Intents. Siri's on-screen awareness is expanding with each iOS release.
- Core ML is mature and underused. On-device model inference is fast enough for many real-world tasks. If you have a classification or NLP task and are currently calling an external API, benchmark a Core ML model first.
- The foundation model API will open eventually. Apple needs developers to build differentiated Apple Intelligence features, so more direct API access is a likely future WWDC announcement.
- Privacy is a differentiator you can use. Especially for enterprise and healthcare apps, being able to say "this AI processing runs on-device via Core ML, nothing leaves the phone" is a genuine competitive advantage.
Written by
Abhishek Gautam
Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.