BUILD WITH AI · NEW DELHI 2026
THE TALK — 01 / 32
ON-DEVICE AI WITH FLUTTER & GEMMA
No Server.
No Bills.
No Internet.
No Problem.
Akansha Jain
SR. SWE - AUTONATION INC.
ORGANISER - FLUTTER DELHI · FFDG NEW DELHI
ON-DEVICE AI · FLUTTER · GEMMA
APR 2026
ACT I OF VII
02 / 32
CHAPTER ONE
The
Bill Shock.
Monday you shipped. Tuesday they loved it.
Wednesday the invoice arrived.
FIG. 01 — OBSERVED IN THE WILD
THE HOOK
PAIN → SOLUTION
PULLQUOTE · 10 SEC
03 / 32
“You built an AI feature.
You shipped it.
Now life happens.”
— AKANSHA JAIN
THE HOOK
THREE FAILURE MODES
04 / 32
THE CLOUD TAX · IN THREE PARTS
Three things that go wrong.
The cloud tax.
10,000 users × 50 queries a day × per-token pricing.
Your invoice arrives. Founder shakes. CFO has questions.
The network gap.
Delhi metro tunnel. Flight mode. A village with 2G.
Your app just stops being smart. Users notice.
The data trail.
Health records. Bank statements. Private photos.
Every request travels to someone else's server. Hopefully encrypted.
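The cloud-tax arithmetic can be made concrete. The sketch below takes the slide's 10,000 users and 50 queries a day and multiplies them by two figures that are assumptions, not numbers from this talk: 500 tokens per query and $1.00 per million tokens. Real pricing varies by provider and model.

```dart
/// Rough monthly API bill in US dollars.
/// tokensPerQuery and usdPerMillionTokens are illustrative assumptions.
double monthlyCloudCostUsd({
  required int users,
  required int queriesPerUserPerDay,
  required int tokensPerQuery,
  required double usdPerMillionTokens,
  int daysPerMonth = 30,
}) {
  final tokensPerMonth =
      users * queriesPerUserPerDay * tokensPerQuery * daysPerMonth;
  return tokensPerMonth / 1e6 * usdPerMillionTokens;
}

void main() {
  final cost = monthlyCloudCostUsd(
    users: 10000,
    queriesPerUserPerDay: 50,
    tokensPerQuery: 500, // assumed, not from the talk
    usdPerMillionTokens: 1.0, // assumed, not from the talk
  );
  print('Estimated monthly bill: \$${cost.toStringAsFixed(0)}');
}
```

Under these assumptions the estimate is $7,500 a month, and it grows linearly with users. On-device inference removes the per-token term entirely.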
THE HOOK · 45s
RAISE YOUR HAND IF…
THE PIVOT
05 / 32
THE QUESTION NOBODY ASKED
What if the AI just lived
inside your app?
HOLD FOR 3 SECONDS
— THE HOOK ENDS HERE —
AUDIENCE CHECK
06 / 32
This talk is for you if…
01 · You build with Flutter and you're AI-curious.
02 · You've used Gemini / ChatGPT APIs and wondered: "must this all be cloud?"
03 · You care about privacy, cost, or the next billion users.
04 · You want to ship something.
ACT I · THE HOOK
2:00 / 28:00
ACT II OF VII
07 / 32
CHAPTER TWO
Meet the
Family.
Google has two AI model families built from the same research.
Gemini lives in the cloud — powerful, proprietary, pay-per-request. Gemma lives on your device — open, free, yours to run.
FIG. 02 — TWO SIBLINGS
ACT II · MEET THE FAMILY → 5 MIN
COMPARISON · GOOGLE'S TWO AI FAMILIES
08 / 32
SAME CHEF · TWO KITCHENS
Same research. Different lives.
CLOUD · PROPRIETARY
Gemini
The five-star restaurant
- → Hundreds of billions of parameters
- → API key + network required
- → Pay per request
- → Always up-to-date
- → Needs internet connection
ON-DEVICE · OPEN SOURCE
Gemma
A recipe in your pocket.
- → 270M to 31B parameters
- → Downloaded once, yours forever
- → Zero cost per inference
- → Works 100% offline
- → Apache 2.0 license
RIGHT BRAIN FOR THE RIGHT TASK
60s
TIMELINE · TWO YEARS · FOUR GENERATIONS
09 / 32
EVOLUTION
Gemma has grown up fast.
The smallest now fits on a phone.
FEB 2024
Gemma 1
Text only.
JUN 2024
Gemma 2
Better reasoning.
EARLY 2025
Gemma 3
+ Vision.
MID 2025
Gemma 3n
+ Audio. Mobile-tuned.
Gemma 4
Frontier-grade on your phone.
TODAY'S DEMO ↓
Gemma 3n E2B
Multimodal · ~2B effective params · runs offline · ~3 GB download.
ACT II · 30s
VOCAB · THE ONLY FIVE WORDS YOU NEED
10 / 32
GLOSSARY
Five words. That's it.
Everything else is built on these.
Parameters.
Numbers inside the model. A radio with 2 billion knobs — each tuned during training.
Tokens.
Pieces of text. Not letters, not words — somewhere between. 100 English words ≈ 150 tokens.
Context.
The model's working memory. 256K on paper — realistically 2–8K on a phone.
Quantization.
JPEG, but for models. int4 shrinks a 2B-param model to ~1.5 GB. Barely hurts quality.
Inference.
Running the model. Input in, output out. The whole "AI thinking" thing — it's just math on a chip.
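Two of these words come with numbers you can check. This is a minimal sketch: `estimateTokens` applies the slide's rule of thumb, and `weightGigabytes` counts raw weight storage only. Both function names are illustrative, not part of any library.

```dart
/// Token estimate using the slide's rule of thumb (100 words ~ 150 tokens).
int estimateTokens(int englishWords) => (englishWords * 1.5).round();

/// Size of the raw weights in decimal gigabytes for a given bit width.
/// Ignores file overhead and any layers stored at higher precision.
double weightGigabytes(int parameters, int bitsPerWeight) =>
    parameters * bitsPerWeight / 8 / 1e9;

void main() {
  const params = 2000000000; // 2B parameters
  print('100 words ~ ${estimateTokens(100)} tokens');
  print('fp16 weights: ${weightGigabytes(params, 16).toStringAsFixed(1)} GB');
  print('int4 weights: ${weightGigabytes(params, 4).toStringAsFixed(1)} GB');
}
```

Raw int4 weights for 2B parameters come to about 1 GB, against 4 GB at fp16. The ~1.5 GB on the slide presumably also covers parts stored at higher precision and file overhead, which this sketch does not model.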
LOCK IN THE VOCAB · 90s
THE BIG NUMBER MOMENT
11 / 32
THIS WASN'T POSSIBLE THREE YEARS AGO.
2B
parameters. Fitting on a phone.
Running in 200 milliseconds.
Understanding text, images & audio —
in 140+ languages.
WALK DOWNSTAGE · HOLD 3 SECONDS
END OF ACT II
ACT III OF VII
12 / 32
CHAPTER THREE
Under the
Hood.
If we're putting 2 billion parameters on a phone, we ought to know the machinery.
FIG. 03 — THE STACK
ACT III · UNDER THE HOOD → 4 MIN
ARCHITECTURE · FIVE LAYERS · YOU WRITE ONE
13 / 32
THE WHOLE STACK
You only touch the top.
Everything else is already handled.
L1
Your Flutter Code
DART · WIDGETS
L2
flutter_gemma
PUB.DEV PACKAGE
L3
MediaPipe · LiteRT-LM
INFERENCE ENGINE
L4
GPU / NPU Delegates
HARDWARE ROUTING
L5
Phone Silicon
NEURAL ENGINE · HEXAGON · MALI
LiteRT-LM
The same engine powering AI in Chrome, Chromebook Plus, Pixel Watch.
Delegates
Auto-routes math to the fastest hardware: NPU → GPU → CPU.
You
Write Dart. That's it.
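The NPU → GPU → CPU preference can be pictured as a simple fallback chain. This is only an illustration of the idea: the `Backend` enum and `pickBackend` function are hypothetical, and the real delegates make this choice inside the inference engine, not in your Dart code.

```dart
/// Hypothetical hardware targets, listed fastest first.
enum Backend { npu, gpu, cpu }

/// Returns the first backend in NPU -> GPU -> CPU order that is available.
Backend pickBackend(Set<Backend> available) {
  for (final backend in Backend.values) {
    if (available.contains(backend)) return backend;
  }
  throw StateError('No backend available');
}

void main() {
  print(pickBackend({Backend.gpu, Backend.cpu})); // no NPU on this device
  print(pickBackend({Backend.cpu}));
}
```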
ACT III · 90s
THE PAYOFF · FOUR CONSTRAINTS · GONE AT ONCE
14 / 32
ENTIRE VALUE OF ON-DEVICE AI · ONE SLIDE
Four constraints. Gone at once.
01 · COMPUTE
No
Server.
The model lives in the app. Inference on phone silicon. Not someone's datacenter.
02 · COST
No
Bills.
Apache 2.0. Open weights. Download once, run forever. Zero cost per inference.
03 · CONNECTIVITY
No
Internet.
Delhi metro tunnel. Flight mode. Rural 2G. Works wherever the phone works.
04 · PRIVACY
No
Problem.
Data never leaves the device. Privacy by architecture — not by policy.
THE THESIS · 45s
TRANSITION
15 / 32
ENOUGH THEORY
Let me show
you.
01 TEXT CHAT
·
02 OFFLINE
·
03 MULTIMODAL
WALK CENTER STAGE · 10 SEC
● REC · DEMO 01 · TEXT INFERENCE
16 / 32
DEMO 01
Text chat.
Streaming.
Home screen. Tap Get Started.
Type a prompt. Watch tokens stream in.
Not my laptop. Not a server. This phone.
NARRATE LIVE · POINT AT SCREEN · 90s
● REC · DEMO 02 · AIRPLANE MODE
17 / 32
✈
DEMO 02 · THE MOMENT
Airplane mode.
Same speed.
Swipe down. Tap the airplane icon. Wi-Fi gone. Cellular gone.
Now ask for a haiku about privacy.
No internet. No server. No cost.
This is the whole pitch of this talk — right here.
PAUSE 2s AFTER RESPONSE · LET IT LAND · 60s
● REC · DEMO 03 · MULTIMODAL VISION
18 / 32
DEMO 03
The phone
can see.
Tap the image icon. Pick a photo.
Ask: What do you see? Be detailed.
Scene, colors, mood — described.
Multimodal. On-device. Offline.
Gemma 3n E2B running vision on a 6 GB phone.
NARRATE AS IT STREAMS · 75s
END OF DEMO · IN ONE SENTENCE
19 / 32
Three clips.
One phone.
No internet. No bills.
No data leaving your device.
— the whole talk in three minutes of video.
ACT IV ENDS · 30s
ACT V OF VII
20 / 32
CHAPTER FIVE
Honest
Engineering.
Every talk shows you the happy path.
Here's the part other talks skip.
FIG. 04 — REAL TRADEOFFS
ACT V · HONEST TRUTH → 4 MIN
LEARNED THE HARD WAY · BUILDING THIS DEMO
21 / 32
A STORY FROM BUILDING THIS
I wanted live camera.
The app crashed.
Every time.
Total on a 6 GB phone
→ OOM. KILLED.
So I switched to gallery picker. Which is what you just saw.
On-device AI makes you think about memory again.
Like it's 2005.
— THE LESSON
6 GB RAM
Tight.
Architect carefully. Gallery over live camera.
8 GB RAM
Fine.
Standard flows work without heroics.
12 GB RAM
Smooth.
Live camera is in play.
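One way to act on these tiers is to gate the camera feature on device memory. A minimal sketch, assuming the app already knows the device RAM in GB (how you read it is platform-specific and not shown). `MemoryTier`, `tierFor`, and `allowLiveCamera` are hypothetical names, and the thresholds are the three on this slide.

```dart
enum MemoryTier { tight, fine, smooth }

/// Maps device RAM to the tiers on the slide: 6 GB, 8 GB, 12 GB.
MemoryTier tierFor(int ramGb) {
  if (ramGb >= 12) return MemoryTier.smooth;
  if (ramGb >= 8) return MemoryTier.fine;
  return MemoryTier.tight;
}

/// Live camera is offered only on the top tier; others use the gallery picker.
bool allowLiveCamera(int ramGb) => tierFor(ramGb) == MemoryTier.smooth;

void main() {
  for (final ram in [6, 8, 12]) {
    print('$ram GB: ${tierFor(ram).name}, live camera: ${allowLiveCamera(ram)}');
  }
}
```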
SAY IT PERSONAL · SLOW DOWN · 90s
DECISION TABLE · GEMMA VS GEMINI
22 / 32
On-device isn't always the right answer.
Choose the right model for the task.
FACTOR | ON-DEVICE (GEMMA) | CLOUD (GEMINI)
Privacy | ✓ Data never leaves device | Data sent to cloud
Cost at scale | ✓ Zero per request | Pay per API call
Offline | ✓ Works in airplane mode | Needs internet
Latency | ✓ No network round-trip | Network delay
Raw power | Good for focused tasks | ✓ Frontier-level capability
Up-to-date knowledge | Frozen at download | ✓ Continuously updated
App size | Model adds ~1–2 GB | ✓ Small app binary
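The table can be turned into a first-pass routing rule. This is a sketch under one assumption of mine: hard constraints (offline, sensitive data) outrank capability needs. `Runtime` and `chooseRuntime` are hypothetical names, and your product may weigh the factors differently.

```dart
enum Runtime { onDevice, cloud }

Runtime chooseRuntime({
  required bool mustWorkOffline,
  required bool handlesSensitiveData,
  required bool needsFrontierCapability,
  required bool needsUpToDateKnowledge,
}) {
  // Hard constraints first: offline use or private data rules out the cloud.
  if (mustWorkOffline || handlesSensitiveData) return Runtime.onDevice;
  // Otherwise, capability or freshness needs favor the cloud.
  if (needsFrontierCapability || needsUpToDateKnowledge) return Runtime.cloud;
  // Default: no per-request cost and no network round-trip.
  return Runtime.onDevice;
}

void main() {
  final choice = chooseRuntime(
    mustWorkOffline: false,
    handlesSensitiveData: false,
    needsFrontierCapability: true,
    needsUpToDateKnowledge: false,
  );
  print(choice.name);
}
```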
ACT V · 60s
THIS ISN'T HYPOTHETICAL · REAL PRODUCTS TODAY
23 / 32
WHAT YOU CAN BUILD WITH GEMMA
Four things you can ship.
01 · SUMMARIZE
Condense long content, locally.
Like Pixel Recorder and Samsung Note Assist.
02 · REWRITE
Proofread & polish on-device.
Like Apple Writing Tools and Gboard Magic Compose.
03 · CLASSIFY LIVE
Act on streams as they arrive.
Like Pixel Scam Detection and Live Caption.
04 · TRANSLATE
Bridge languages without a network.
Like Google Translate offline packs.
END OF ACT V · 45s
ACT VI · YOUR TURN · THIRTY MINUTES TO RUNNING
24 / 32
THE FAST PATH
Six steps. Thirty minutes. Running on your phone.
01
Create the project.
$ flutter create app
02
Add the package.
$ flutter pub add flutter_gemma
03
HF token in .env.
30 seconds on huggingface.co — for model access.
04
Bump platform SDKs.
Android minSdk 26 (2017). iOS 16+ (2022). Covers every modern phone.
05
Run. First-time ≈ 3 GB download.
5–15 min over Wi-Fi. Show a progress UI.
06
Chat. Ship. Show your friends.
You are talking to an offline AI. Start building the product.
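Step 05 asks for a progress UI. A minimal sketch of the text such a UI might show, assuming your download code reports received and total byte counts. `progressLabel` is a hypothetical helper, not part of flutter_gemma.

```dart
/// Formats a download progress line for display in the UI.
String progressLabel(int receivedBytes, int totalBytes) {
  // Total size can be unknown before the first response arrives.
  if (totalBytes <= 0) return 'Downloading model...';
  final fraction = (receivedBytes / totalBytes).clamp(0.0, 1.0);
  final percent = (fraction * 100).floor();
  String gb(int bytes) => (bytes / 1e9).toStringAsFixed(1);
  return '$percent% · ${gb(receivedBytes)} of ${gb(totalBytes)} GB';
}

void main() {
  print(progressLabel(0, 0));
  print(progressLabel(1500000000, 3000000000));
}
```

Feed the label into a `Text` widget next to a progress bar so a 5–15 minute wait never looks like a frozen app.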
ACT VI · YOUR TURN · 60s
CODE 01 · INITIALIZE GEMMA
25 / 32
CODE 01
Initialize Gemma.
Load .env. Pass the Hugging Face token. Set a retry budget for flaky Wi-Fi. Run your app.
That's it — on-device AI is configured.
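// main.dart
import 'package:flutter/material.dart';
import 'package:flutter_dotenv/flutter_dotenv.dart';
import 'package:flutter_gemma/flutter_gemma.dart'; // import path may differ by package version

Future<void> main() async {
  // Asset loading (the .env file) needs the binding before runApp().
  WidgetsFlutterBinding.ensureInitialized();
  await dotenv.load(fileName: ".env");
  FlutterGemma.initialize(
    huggingFaceToken: dotenv.env['HUGGINGFACE_TOKEN'],
    maxDownloadRetries: 20, // retry budget for flaky Wi-Fi
  );
  runApp(const GemmaDemoApp()); // the demo's root widget, defined elsewhere
}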
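To picture what a retry budget does, here is a generic sketch in plain Dart. It is not how flutter_gemma implements `maxDownloadRetries`. `withRetries` is a hypothetical helper, and the linear backoff is my assumption.

```dart
/// Runs [action], retrying on failure up to [maxRetries] times.
Future<T> withRetries<T>(
  Future<T> Function() action, {
  int maxRetries = 20,
  Duration baseDelay = const Duration(milliseconds: 500),
}) async {
  var failures = 0;
  while (true) {
    try {
      return await action();
    } catch (_) {
      // Budget spent: surface the last error to the caller.
      if (failures >= maxRetries) rethrow;
      failures++;
      // Linear backoff: wait a little longer after each failure.
      await Future<void>.delayed(baseDelay * failures);
    }
  }
}

Future<void> main() async {
  var calls = 0;
  final result = await withRetries(() async {
    calls++;
    if (calls < 3) throw StateError('simulated network drop');
    return 'model downloaded';
  }, baseDelay: const Duration(milliseconds: 10));
  print('$result after $calls attempts');
}
```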
45s · REAL CODE · SHIPS IN PROD
CODE 02 · STREAM RESPONSES
26 / 32
CODE 02
Stream responses.
Get the active model. Open a chat with temperature (creativity) and topK (focus). await for each token.
Ten lines of Dart. The whole API surface.
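// chat_screen.dart (inside an async method of a State class;
// userInput and streamingText are fields of that State)

// Get the model that was downloaded and activated earlier.
final model = await FlutterGemma.getActiveModel(
  maxTokens: 2048,
  preferredBackend: PreferredBackend.gpu,
  supportImage: true,
);

// temperature controls creativity, topK controls focus.
final chat = await model.createChat(
  temperature: 0.8,
  topK: 40,
);

await chat.addQueryChunk(
  Message.text(text: userInput, isUser: true),
);

// Append each token to the UI as it arrives.
await for (final response in chat.generateChatResponseAsync()) {
  if (response is TextResponse) {
    setState(() => streamingText += response.token);
  }
}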
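You can try the `await for` pattern without a model or a phone. In this sketch, `fakeTokenStream` is a hypothetical stand-in for the model's response stream, and `collect` plays the role of the `setState` call by handing back the text so far on every token.

```dart
/// Stand-in for a model response stream: emits tokens with a small delay.
Stream<String> fakeTokenStream(List<String> tokens) async* {
  for (final token in tokens) {
    await Future<void>.delayed(const Duration(milliseconds: 10));
    yield token;
  }
}

/// Accumulates tokens and reports the partial text after each one.
Future<String> collect(
  Stream<String> tokens,
  void Function(String partial) onUpdate,
) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    buffer.write(token);
    onUpdate(buffer.toString());
  }
  return buffer.toString();
}

Future<void> main() async {
  final text = await collect(
    fakeTokenStream(['On', '-device', ' AI']),
    (partial) => print(partial),
  );
  print('Final: $text');
}
```

Using a `StringBuffer` avoids rebuilding the whole string by hand on each token. In a widget, the `onUpdate` callback is where `setState` would go.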
45s · STREAMS LIKE CHATGPT
THE CODELAB · TWO-HOUR GUIDED BUILD
27 / 32
WANT THE FULL BUILD?
Build it yourself.
Two hours. Guided. Hands-on. Every line of the demo you just saw — with UI polish, error handling, and the why behind each decision.
✓
Project setup & Hugging Face token
✓
Chat UI with token streaming
✓
Multimodal — text + image prompts
✓
Model download UX & error recovery
FREE · OPEN · YOURS · THE WHOLE BUILD
RESOURCES · EVERYTHING YOU NEED
28 / 32
SCAN · CLONE · STAR · SHIP
Resources.
Clone. Star. Ship.
END OF ACT VI · 30s
THE CALLBACK · WE PROVED ALL FOUR
29 / 32
WE OPENED WITH FOUR NOS —
Four checks. Thesis closed.
✓
01
No
Server.
The model lives in the app.
✓
02
No
Bills.
Apache 2.0. Zero per inference.
✓
03
No
Internet.
You watched airplane mode work.
✓
04
No
Problem.
Data never leaves the device.
That's the thesis. And you can ship it tonight.
ACT VII · PAYOFF · 45s
THE ONE SENTENCE · IF YOU REMEMBER ONE THING
30 / 32
On-device AI isn't replacing cloud AI.
It's unlocking a different class of app — one that's private, offline-capable, and free to scale.
STAND STILL · MIC DROP · 15s
THANK YOU · DELHI
31 / 32
Thank you.
Akansha Jain
Senior Software Engineer — Autonation Inc.
Happy to chat about Flutter, Gemma, or anything you're building.
FEW MINUTES FOR QUESTIONS → · 15s
Q & A · FIVE MINUTES
32 / 32
OVER TO YOU
Questions?
Or DM me after — I read every message.
PLANT 2–3 QUESTIONS JUST IN CASE · 5:00