Google Gemini Nano on Android: What Developers Can Actually Build With It in 2026
Quick summary
Gemini Nano is Google's on-device model for Pixel and Samsung Galaxy devices. Unlike Apple, Google has opened it via the Android AI Edge SDK. Here is what it can do, what it cannot, which devices support it, and where it actually makes sense in a real app.
Google took a different approach to on-device AI than Apple. Instead of keeping the foundation model closed, Google opened Gemini Nano to third-party developers through the Android AI Edge SDK. If you build Android apps and have not looked at this yet, here is what is actually available.
What Is Gemini Nano?
Gemini Nano is Google's smallest Gemini model — designed specifically to run on mobile hardware without a network connection. It shipped first on Pixel 8 Pro, then Pixel 9 series, and has been rolling out to Samsung Galaxy S24 and S25 series devices via Android's AICore system service.
It is not the same as the Gemini you access via the API or in the Gemini app. Those are larger cloud-hosted models (Gemini Pro, Gemini Ultra/1.5/2.0). Nano is smaller, quantised, and runs entirely on the device's NPU/GPU.
The Android AI Edge SDK
Google released the Android AI Edge SDK to give developers access to Gemini Nano and other on-device models via a consistent API. The key components:
Gemini Nano via AICore
AICore is a system-level service on supported Android devices that manages the Gemini Nano model. Your app requests inference through the Google Play Services AI APIs; you do not bundle the model yourself, because AICore downloads and manages it. This keeps your APK small (the model weighs in at several gigabytes, which you do not want to ship inside your app).
The API is intentionally similar to the Gemini cloud API, so switching between on-device and cloud is mostly a model name swap.
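As a rough illustration, here is what that swap looks like in Kotlin. The class and parameter names follow the pattern of Google's Gemini client samples and should be treated as assumptions, not the exact SDK surface on every version:

```kotlin
// Sketch only: `GenerativeModel` here stands in for the on-device entry
// point exposed via the AI Edge SDK; check the SDK docs for exact names.
suspend fun summarise(text: String): String {
    // On-device: AICore serves the model, no API key, works offline.
    val model = GenerativeModel(modelName = "gemini-nano")

    // The cloud fallback is (roughly) a model-name swap plus an API key:
    // val model = GenerativeModel(modelName = "gemini-1.5-flash", apiKey = KEY)

    val response = model.generateContent("Summarise in three bullet points:\n$text")
    return response.text ?: ""
}
```

The point is that the call site barely changes, which is what makes the hybrid on-device/cloud pattern later in this article cheap to implement.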
MediaPipe LLM Inference
For more control — including running models other than Gemini Nano — MediaPipe's LLM Inference API lets you load any compatible model (Gemma 2B, Phi-2, Falcon, etc.) and run inference locally. You bundle the model file or download it on first run. This works on a wider range of devices since it does not depend on AICore.
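A minimal Kotlin sketch of the MediaPipe path, assuming the `LlmInference` task from `com.google.mediapipe.tasks.genai`. The model path and option values are illustrative; verify option names against the current MediaPipe docs:

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// You supply the model file yourself (bundled, or downloaded on first run).
// The path below is a common example from MediaPipe samples, not a requirement.
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin")
    .setMaxTokens(512)  // caps prompt + response length
    .build()

val llm = LlmInference.createFromOptions(context, options)
val answer = llm.generateResponse("Classify the sentiment: 'Great update, thanks!'")
```

Because the model ships with (or is fetched by) your app rather than by AICore, this runs on a far wider range of devices, at the cost of a much larger download.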
What Gemini Nano Can Do Well
Based on developer testing and Google's own documentation, Gemini Nano performs well at:
- Summarisation: Condensing long articles, emails, or documents into key points. This is the flagship use case and works reliably.
- Reply suggestions: Generating short contextual replies to messages. Used in Gboard and Google Messages.
- Text rewriting and tone adjustment: Paraphrasing, simplifying, or changing the register of text.
- Simple classification and labelling: Categorising text into predefined categories, spam detection, sentiment.
- Proofreading: Grammar and spelling correction with explanations.
- Short generation tasks: Writing short descriptions, alt text, tags — tasks with well-bounded outputs.
What It Cannot Do (Honestly)
Nano is a small model. Compared to cloud-hosted Gemini or GPT-4 class models, it has real limitations:
- No internet access or real-time knowledge: Nano has a training cutoff and no retrieval capability.
- Weak on complex reasoning: Multi-step logic problems, code generation beyond simple snippets, and structured data extraction from long documents are unreliable.
- Context window is limited: Current on-device context is much smaller than cloud models — roughly 2K–4K tokens depending on device.
- Device support is narrow: AICore with Gemini Nano requires a recent Pixel (Pixel 8 series onward) or a Samsung Galaxy S24/S25 series device. If your app targets a broad Android audience, you cannot assume Nano is available.
Practical Architecture Pattern
The sensible pattern for production apps is a hybrid with graceful fallback:
- Check availability first: Verify whether Nano is accessible on the current device before calling inference.
- Use Nano for latency-sensitive or offline tasks: Summaries, reply suggestions, local classification.
- Fall back to cloud API for complex tasks or unsupported devices: Same Gemini API, just swap the model endpoint.
- Never block the UI on inference: Nano inference takes 1–5 seconds depending on task length and device. Always run async, show a loading state.
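Put together, the pattern looks something like the following Kotlin sketch. `isNanoAvailable`, `runOnDevice`, and `runInCloud` are hypothetical wrappers around the two APIs, not real SDK calls:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical wrappers: replace with the real availability check and
// inference calls from the AI Edge SDK and the Gemini cloud client.
suspend fun summariseHybrid(text: String): String = withContext(Dispatchers.Default) {
    if (isNanoAvailable()) {
        try {
            // Fast path: on-device, low latency, works offline.
            return@withContext runOnDevice(text)
        } catch (e: Exception) {
            // Model still downloading, low memory, etc. Fall through to cloud.
        }
    }
    // Same prompt, cloud Gemini model; requires network and an API key.
    runInCloud(text)
}
```

Calling this from a coroutine keeps the 1–5 second inference off the main thread, so the UI can show a loading state while either path runs.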
Gemini Nano vs Apple On-Device AI: The Key Difference
Apple's approach: closed, high quality, consistent experience, zero developer access to the foundation model.
Google's approach: open API, more device fragmentation, less polished integration, but third-party developers can actually call it.
For the end-user experience inside Apple's own apps, Apple's approach wins. For developers who want to build AI features into their own apps without routing everything through a cloud API, Google's approach is more useful right now.
Should You Use It?
Use Gemini Nano (via Android AI Edge SDK) if:
- Your app has summarisation, reply suggestion, or text rewriting features
- You need these features to work offline or with low latency
- Your primary audience is on recent Pixel or Samsung devices
- You want to avoid cloud API costs for lightweight inference tasks
Use the Gemini cloud API (or another provider) if:
- You need complex reasoning, code generation, or large context windows
- You need to support a wide range of Android devices
- Latency of a network call is acceptable
The on-device and cloud APIs are intentionally similar, so building with Gemini Nano first and falling back to cloud is a sensible architecture for most use cases.
Written by
Abhishek Gautam
Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.