BUILD WITH AI · NEW DELHI 2026 THE TALK — 01 / 32
ON-DEVICE AI WITH FLUTTER & GEMMA

No Server. No Bills. No Internet. No Problem.

Akansha Jain
ON-DEVICE AI · FLUTTER · GEMMA APR 2026
ACT I OF VII 02 / 32
CHAPTER ONE

The
Bill Shock.

Monday you shipped. Tuesday they loved it.
Wednesday the invoice arrived.

FIG. 01 — OBSERVED IN THE WILD
INVOICE · CLOUD AI
Inference requests: 14,221,000
Input tokens: 812M
Output tokens: 501M
$47,293 · DUE IMMEDIATELY
"But the app is free..." oh no
THE HOOK PAIN → SOLUTION
PULLQUOTE · 10 SEC 03 / 32

You built an AI feature.
You shipped it.
Now life happens.

— AKANSHA JAIN THE HOOK
THREE FAILURE MODES 04 / 32
THE CLOUD TAX · IN THREE PARTS

Three things that go wrong.

01
ERR:BILL_SHOCK

The cloud tax.

10,000 users × 50 queries a day × per-token pricing.

Your invoice arrives. Founder shakes. CFO has questions.

02
ERR:NO_NETWORK

The network gap.

Delhi metro tunnel. Flight mode. A village with 2G.

Your app just stops being smart. Users notice.

03
ERR:PRIVACY

The data trail.

Health records. Bank statements. Private photos.

Every request travels to someone else's server. Hopefully encrypted.
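The multiplication on the first card can be made concrete. What follows is a minimal Dart sketch, not a pricing model: the user and query counts come from the slide, while the tokens per query (500 in, 300 out) and the prices per million tokens ($0.30 in, $2.50 out) are placeholder assumptions. Substitute your provider's real rates.

```dart
// Back-of-envelope cloud inference bill.
// Only `users` and `queriesPerUserPerDay` come from the slide;
// every other figure passed in main() is a placeholder assumption.

/// Estimated monthly spend in US dollars for a per-token-priced API.
double monthlyCloudCostUsd({
  required int users,
  required int queriesPerUserPerDay,
  required int inputTokensPerQuery,
  required int outputTokensPerQuery,
  required double usdPerMillionInputTokens,
  required double usdPerMillionOutputTokens,
  int daysPerMonth = 30,
}) {
  final queries = users * queriesPerUserPerDay * daysPerMonth;
  final inputCost =
      queries * inputTokensPerQuery / 1e6 * usdPerMillionInputTokens;
  final outputCost =
      queries * outputTokensPerQuery / 1e6 * usdPerMillionOutputTokens;
  return inputCost + outputCost;
}

void main() {
  final cost = monthlyCloudCostUsd(
    users: 10000,
    queriesPerUserPerDay: 50,
    inputTokensPerQuery: 500, // assumption
    outputTokensPerQuery: 300, // assumption
    usdPerMillionInputTokens: 0.30, // assumption
    usdPerMillionOutputTokens: 2.50, // assumption
  );
  print('Estimated monthly bill: \$${cost.toStringAsFixed(0)}');
}
```

With these placeholder inputs the estimate comes to roughly $13,500 a month. The point is the shape of the formula: the bill grows linearly with users, while on-device inference keeps the per-query term at zero.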

THE HOOK · 45s RAISE YOUR HAND IF…
THE PIVOT 05 / 32
THE QUESTION NOBODY ASKED

What if the AI just lived
inside your app?

HOLD FOR 3 SECONDS — THE HOOK ENDS HERE —
AUDIENCE CHECK 06 / 32

This talk is for you if…

  • 01 You build with Flutter and you're AI-curious.
  • 02 You've used Gemini / ChatGPT APIs and wondered — "must this all be cloud?"
  • 03 You care about privacy, cost, or the next billion users.
  • 04 You want to ship something.
ACT I · THE HOOK 2:00 / 28:00
ACT II OF VII 07 / 32
CHAPTER TWO

Meet the
Family.

Google has two AI model families built from the same research.

Gemini lives in the cloud — powerful, proprietary, pay-per-request. Gemma lives on your device — open, free, yours to run.

FIG. 02 — TWO SIBLINGS
GEMINI: big sibling · cloud
GEMMA: in your pocket
same research · different lives
ACT II · MEET THE FAMILY → 5 MIN
COMPARISON · GOOGLE'S TWO AI FAMILIES 08 / 32
SAME CHEF · TWO KITCHENS

Same research. Different lives.

CLOUD · PROPRIETARY

Gemini

The five-star restaurant

  • Hundreds of billions of parameters
  • API key + network required
  • Pay per request
  • Always up-to-date
  • Needs internet connection
vs.
ON-DEVICE · OPEN SOURCE

Gemma

A recipe in your pocket.

  • 270M to 31B parameters
  • Downloaded once, yours forever
  • Zero cost per inference
  • Works 100% offline
  • Apache 2.0 license
RIGHT BRAIN FOR THE RIGHT TASK 60s
TIMELINE · TWO YEARS · FOUR GENERATIONS 09 / 32
EVOLUTION

Gemma has grown up fast.

The smallest now fits on a phone.

FEB 2024
Gemma 1
Text only.
JUN 2024
Gemma 2
Better reasoning.
EARLY 2025
Gemma 3
+ Vision.
MID 2025
Gemma 3n
+ Audio. Mobile-tuned.
APR 2026 · NEW
Gemma 4
Frontier-grade on your phone.
TODAY'S DEMO ↓
Gemma 3n E2B
Multimodal · ~2B effective params · runs offline · ~3 GB download.
ACT II · 30s
VOCAB · THE ONLY FIVE WORDS YOU NEED 10 / 32
GLOSSARY

Five words. That's it.

Everything else is built on these.

Parameters.
Numbers inside the model. A radio with 2 billion knobs — each tuned during training.
Tokens.
Pieces of text. Not letters, not words — somewhere between. 100 English words ≈ 150 tokens.
Context.
The model's working memory. 256K on paper — realistically 2–8K on a phone.
Quantization.
JPEG, but for models. int4 shrinks a 2B-param model to ~1.5 GB. Barely hurts quality.
Inference.
Running the model. Input in, output out. The whole "AI thinking" thing — it's just math on a chip.
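Two of the glossary entries are plain arithmetic. The sketch below, in Dart, shows raw weight storage as parameters × bits ÷ 8, and the slide's 100-words-to-150-tokens ratio as a rough estimator. Both are simplifications: real model files run larger than the raw weight figure because of higher-precision layers, metadata, and packaging, and real tokenizers vary by language. The function names are hypothetical.

```dart
/// Raw weight storage in gigabytes (10^9 bytes) at a given precision.
/// Ignores embeddings kept at higher precision, metadata, and runtime buffers.
double weightSizeGb(int parameters, int bitsPerWeight) =>
    parameters * bitsPerWeight / 8 / 1e9;

/// Rough token estimate using the slide's ratio of 150 tokens per 100 words.
int approxTokens(int englishWords) => (englishWords * 1.5).round();

void main() {
  const params = 2000000000; // 2B parameters
  for (final bits in [32, 16, 8, 4]) {
    print('$bits-bit: ${weightSizeGb(params, bits).toStringAsFixed(1)} GB');
  }
  print('A 1,000-word document is about ${approxTokens(1000)} tokens.');
}
```

At 4 bits the raw weights of a 2B-parameter model come to 1.0 GB, which is consistent with the ~1.5 GB on-disk figure once overhead is included.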
LOCK IN THE VOCAB · 90s
THE BIG NUMBER MOMENT 11 / 32
THIS WASN'T POSSIBLE THREE YEARS AGO.
2B

parameters. Fitting on a phone.
Running in 200 milliseconds.
Understanding text, images & audio —
in 140+ languages.

WALK DOWNSTAGE · HOLD 3 SECONDS END OF ACT II
ACT III OF VII 12 / 32
CHAPTER THREE

Under the
Hood.

If we're putting 2 billion parameters on a phone, we ought to know the machinery.

FIG. 03 — THE STACK
YOUR FLUTTER APP (you write this)
flutter_gemma (pub.dev)
MediaPipe · LiteRT-LM
GPU / NPU DELEGATES
PHONE SILICON (Neural Engine · Hexagon · Mali)
five layers · you write one
ACT III · UNDER THE HOOD → 4 MIN
ARCHITECTURE · FIVE LAYERS · YOU WRITE ONE 13 / 32
THE WHOLE STACK

You only touch the top.

Everything else is already handled.

L1
Your Flutter Code
DART · WIDGETS
L2
flutter_gemma
PUB.DEV PACKAGE
L3
MediaPipe · LiteRT-LM
INFERENCE ENGINE
L4
GPU / NPU Delegates
HARDWARE ROUTING
L5
Phone Silicon
NEURAL ENGINE · HEXAGON · MALI
LiteRT-LM

The same engine powering AI in Chrome, Chromebook Plus, Pixel Watch.

Delegates

Auto-routes math to the fastest hardware: NPU → GPU → CPU.

You

Write Dart. That's it.
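The delegate idea (try the NPU, fall back to the GPU, then the CPU) can be pictured as a preference list. This is an illustration only: the `Backend` enum and `pickBackend` function are hypothetical names, not LiteRT-LM or flutter_gemma API, and the real runtime makes this decision for you.

```dart
// Illustrative only: hypothetical names, not LiteRT-LM or flutter_gemma API.
enum Backend { npu, gpu, cpu }

/// Preference order from the slide: NPU first, then GPU, then CPU.
const preferenceOrder = [Backend.npu, Backend.gpu, Backend.cpu];

/// Returns the most preferred backend the device reports as available.
/// CPU is the final fallback because every device has one.
Backend pickBackend(Set<Backend> available) {
  for (final backend in preferenceOrder) {
    if (available.contains(backend)) return backend;
  }
  return Backend.cpu;
}

void main() {
  print(pickBackend({Backend.gpu, Backend.cpu})); // Backend.gpu
  print(pickBackend({Backend.npu, Backend.gpu, Backend.cpu})); // Backend.npu
}
```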

ACT III · 90s
THE PAYOFF · FOUR CONSTRAINTS · GONE AT ONCE 14 / 32
ENTIRE VALUE OF ON-DEVICE AI · ONE SLIDE

Four constraints. Gone at once.

01 · COMPUTE

No
Server.

The model lives in the app. Inference on phone silicon. Not someone's datacenter.

02 · COST

No
Bills.

Apache 2.0. Open weights. Download once, run forever. Zero cost per inference.

03 · CONNECTIVITY

No
Internet.

Delhi metro tunnel. Flight mode. Rural 2G. Works wherever the phone works.

04 · PRIVACY

No
Problem.

Data never leaves the device. Privacy by architecture — not by policy.

THE THESIS · 45s
TRANSITION 15 / 32
ENOUGH THEORY

Let me show
you.

01 TEXT CHAT · 02 OFFLINE · 03 MULTIMODAL
WALK CENTER STAGE · 10 SEC
● REC · DEMO 01 · TEXT INFERENCE 16 / 32
DEMO 01

Text chat.
Streaming.

Home screen. Tap Get Started.
Type a prompt. Watch tokens stream in.
Not my laptop. Not a server. This phone.

WARM-UP
~10 s
RESPONSE
~3 s
NARRATE LIVE · POINT AT SCREEN · 90s
● REC · DEMO 02 · AIRPLANE MODE 17 / 32
DEMO 02 · THE MOMENT

Airplane mode.
Same speed.

Swipe down. Tap the airplane icon. Wi-Fi gone. Cellular gone.
Now ask for a haiku about privacy.
No internet. No server. No cost.

This is the whole pitch of this talk — right here.

PAUSE 2s AFTER RESPONSE · LET IT LAND · 60s
● REC · DEMO 03 · MULTIMODAL VISION 18 / 32
DEMO 03

The phone
can see.

Tap the image icon. Pick a photo.
Ask: What do you see? Be detailed.
Scene, colors, mood — described.

Multimodal. On-device. Offline.
Gemma 3n E2B running vision on a 6 GB phone.

NARRATE AS IT STREAMS · 75s
END OF DEMO · IN ONE SENTENCE 19 / 32

Three clips.
One phone.
No internet. No bills.
No data leaving your device.

— the whole talk in three minutes of video.

ACT IV ENDS · 30s
ACT V OF VII 20 / 32
CHAPTER FIVE

Honest
Engineering.

Every talk shows you the happy path.
Here's the part other talks skip.

FIG. 04 — REAL TRADEOFFS
CLOUD AI · ON-DEVICE
real tradeoffs · both sides
ACT V · HONEST TRUTH → 4 MIN
LEARNED THE HARD WAY · BUILDING THIS DEMO 21 / 32
A STORY FROM BUILDING THIS

I wanted live camera.
The app crashed.
Every time.

Gemma 3n in RAM
~3.0 GB
Camera pipeline
~1.5 GB
Android OS & UI
~1.5 GB
Total on a 6 GB phone
→ OOM. KILLED.

So I switched to gallery picker. Which is what you just saw.

On-device AI makes you think about memory again.
Like it's 2005.

— THE LESSON
6 GB RAM
Tight.

Architect carefully. Gallery over live camera.

8 GB RAM
Fine.

Standard flows work without heroics.

12 GB RAM
Smooth — live camera is in play.
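The crash above is a budgeting problem, and the check is simple enough to sketch. The Dart below sums the slide's estimates and compares them against device RAM. The 0.5 GB headroom and the `fitsInRam` name are assumptions for illustration. Passing this check is necessary, not sufficient: the OS can reclaim memory unpredictably, so measure on real devices.

```dart
/// Sums the budget lines and reports whether they fit under the device RAM
/// with some headroom left over. All figures are in gigabytes.
bool fitsInRam(Map<String, double> budgetGb, double deviceRamGb,
    {double headroomGb = 0.5}) {
  final total = budgetGb.values.fold<double>(0.0, (sum, gb) => sum + gb);
  return total + headroomGb <= deviceRamGb;
}

void main() {
  // Estimates taken from the slide.
  const withLiveCamera = {
    'Gemma 3n in RAM': 3.0,
    'Camera pipeline': 1.5,
    'Android OS & UI': 1.5,
  };
  // Same budget without the live camera pipeline (gallery picker instead).
  // Simplification: the picker's own memory use is not counted.
  const withGalleryPicker = {
    'Gemma 3n in RAM': 3.0,
    'Android OS & UI': 1.5,
  };

  print('6 GB, live camera fits: ${fitsInRam(withLiveCamera, 6.0)}');
  print('6 GB, gallery picker fits: ${fitsInRam(withGalleryPicker, 6.0)}');
}
```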
SAY IT PERSONAL · SLOW DOWN · 90s
DECISION TABLE · GEMMA VS GEMINI 22 / 32

On-device isn't always the right answer.

Choose the right model for the task.

FACTOR | ON-DEVICE (GEMMA) | CLOUD (GEMINI)
Privacy | Data never leaves device | Data sent to cloud
Cost at scale | Zero per request | Pay per API call
Offline | Works in airplane mode | Needs internet
Latency | No network round-trip | Network delay
Raw power | Good for focused tasks | Frontier-level capability
Up-to-date knowledge | Frozen at download | Continuously updated
App size | Model adds ~1–2 GB | Small app binary
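One way to read the table is as a routing rule inside a hybrid app. The Dart sketch below is one possible policy under stated assumptions, not a recommendation from the flutter_gemma package: the `Runtime` enum, the `chooseRuntime` function, and the priority order (connectivity and privacy first, capability second) are all hypothetical.

```dart
// Hypothetical routing policy derived from the decision table.
enum Runtime { onDevice, cloud }

Runtime chooseRuntime({
  required bool isOnline,
  required bool handlesSensitiveData,
  required bool needsFrontierCapability,
  required bool needsFreshKnowledge,
}) {
  // Offline or private data: the cloud column is not an option.
  if (!isOnline || handlesSensitiveData) return Runtime.onDevice;
  // Raw power and up-to-date knowledge are where the cloud column wins.
  if (needsFrontierCapability || needsFreshKnowledge) return Runtime.cloud;
  // Otherwise default to the zero-cost, no-round-trip path.
  return Runtime.onDevice;
}

void main() {
  final choice = chooseRuntime(
    isOnline: true,
    handlesSensitiveData: false,
    needsFrontierCapability: true,
    needsFreshKnowledge: false,
  );
  print(choice); // Runtime.cloud
}
```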
ACT V · 60s
THIS ISN'T HYPOTHETICAL · REAL PRODUCTS TODAY 23 / 32
WHAT YOU CAN BUILD WITH GEMMA

Four things you can ship.

01 · SUMMARIZE

Condense long content, locally.

Like Pixel Recorder and Samsung Note Assist.

02 · REWRITE

Proofread & polish on-device.

Like Apple Writing Tools and Gboard Magic Compose.

03 · CLASSIFY LIVE

Act on streams as they arrive.

Like Pixel Scam Detection and Live Caption.

04 · TRANSLATE

Bridge languages without a network.

Like Google Translate offline packs.

END OF ACT V · 45s
ACT VI · YOUR TURN · THIRTY MINUTES TO RUNNING 24 / 32
THE FAST PATH

Six steps. Thirty minutes. Running on your phone.

01
Create the project.
$ flutter create app
02
Add the package.
$ flutter pub add flutter_gemma
03
HF token in .env.

30 seconds on huggingface.co — for model access.

04
Bump platform SDKs.

Android minSdk 26 (2017). iOS 16+ (2022). Covers every modern phone.

05
Run. First-time ≈ 3 GB download.

5–15 min over Wi-Fi. Show a progress UI.

06
Chat. Ship. Show your friends.

You are talking to an offline AI. Start building the product.
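The "5–15 min over Wi-Fi" in step 05 follows from size ÷ throughput. Here is a small Dart sketch of that arithmetic. The throughput values are assumptions chosen to bracket typical Wi-Fi, and `downloadTime` is a hypothetical helper, not part of flutter_gemma. A figure like this is useful for setting expectations in the progress UI.

```dart
/// Estimated transfer time for a download of [sizeGb] gigabytes (10^9 bytes)
/// at a sustained throughput of [mbps] megabits per second.
Duration downloadTime(double sizeGb, double mbps) {
  final seconds = sizeGb * 8 * 1000 / mbps; // GB -> gigabits -> megabits
  return Duration(seconds: seconds.round());
}

void main() {
  // Assumed sustained throughputs; real Wi-Fi varies widely.
  for (final mbps in [25.0, 50.0, 100.0]) {
    final t = downloadTime(3.0, mbps);
    print('${mbps.toStringAsFixed(0)} Mbps: '
        '${t.inMinutes} min ${t.inSeconds % 60} s');
  }
}
```

A 3 GB model takes about 16 minutes at 25 Mbps and 4 minutes at 100 Mbps, so the slide's range corresponds to sustained throughput of roughly 27 to 80 Mbps.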

ACT VI · YOUR TURN · 60s
CODE 01 · INITIALIZE GEMMA 25 / 32
CODE 01

Initialize Gemma.

Load .env. Pass the Hugging Face token. Set a retry budget for flaky Wi-Fi. Run your app.

That's it — on-device AI is configured.

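// main.dart
import 'package:flutter/material.dart';
import 'package:flutter_dotenv/flutter_dotenv.dart';
import 'package:flutter_gemma/flutter_gemma.dart';

Future<void> main() async {
  // Bindings must exist before loading assets such as .env.
  WidgetsFlutterBinding.ensureInitialized();
  await dotenv.load(fileName: ".env");

  FlutterGemma.initialize(
    // Token for gated model downloads; null if the key is missing from .env.
    huggingFaceToken: dotenv.env['HUGGINGFACE_TOKEN'],
    maxDownloadRetries: 20, // retry budget for flaky Wi-Fi
  );

  runApp(const GemmaDemoApp()); // root widget, defined elsewhere in the app
}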
45s · REAL CODE · SHIPS IN PROD
CODE 02 · STREAM RESPONSES 26 / 32
CODE 02

Stream responses.

Get the active model. Open a chat with temperature (creativity) and topK (focus). await for each token.

Four calls. The whole API surface.

// chat_screen.dart
// 1. Get the model that was installed at first run.
final model = await FlutterGemma.getActiveModel(
  maxTokens: 2048,
  preferredBackend: PreferredBackend.gpu,
  supportImage: true,
);

// 2. Open a chat session with sampling settings.
final chat = await model.createChat(
  temperature: 0.8, // creativity
  topK: 40, // focus
);

// 3. Queue the user's message.
await chat.addQueryChunk(
  Message.text(text: userInput, isUser: true),
);

// 4. Append each streamed token to the UI as it arrives.
await for (final response in chat.generateChatResponseAsync()) {
  if (response is TextResponse) {
    setState(() => streamingText += response.token);
  }
}
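The `await for` pattern above can be tried without a model or a phone. The sketch below is plain Dart that runs with `dart run`. `fakeTokenStream` is a hypothetical stand-in for the model's response stream, and `collect` is a helper invented for this illustration, so nothing here is flutter_gemma API. It also shows a `StringBuffer` as an alternative to repeated string concatenation.

```dart
import 'dart:async';

/// Stand-in for a model's token stream (hypothetical; not flutter_gemma API).
Stream<String> fakeTokenStream(List<String> tokens) async* {
  for (final token in tokens) {
    // Simulate generation latency between tokens.
    await Future<void>.delayed(const Duration(milliseconds: 50));
    yield token;
  }
}

/// Accumulates tokens as they arrive, calling [onUpdate] with the text so far.
Future<String> collect(
  Stream<String> tokens,
  void Function(String partial) onUpdate,
) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    buffer.write(token);
    onUpdate(buffer.toString()); // in a widget, this is where setState goes
  }
  return buffer.toString();
}

Future<void> main() async {
  final text = await collect(
    fakeTokenStream(['On', '-device', ' AI', '.']),
    (partial) => print(partial),
  );
  print('Final: $text');
}
```

In a real widget, check `mounted` before calling `setState` inside the loop, since the user may leave the screen while tokens are still streaming.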
45s · STREAMS LIKE CHATGPT
THE CODELAB · TWO-HOUR GUIDED BUILD 27 / 32
WANT THE FULL BUILD?

Build it yourself.

Two hours. Guided. Hands-on. Every line of the demo you just saw — with UI polish, error handling, and the why behind each decision.

Project setup & Hugging Face token
Chat UI with token streaming
Multimodal — text + image prompts
Model download UX & error recovery
Codelab QR code
SCAN TO START
akanshajain.dev/codelabs/on-device-ai
FREE · OPEN · YOURS · THE WHOLE BUILD
RESOURCES · EVERYTHING YOU NEED 28 / 32
SCAN · CLONE · STAR · SHIP

Resources.

DEMO CODE
Demo repo QR
FLUTTER_GEMMA
flutter_gemma package QR
The package itself.
pub.dev/packages/flutter_gemma

Clone. Star. Ship.

END OF ACT VI · 30s
THE CALLBACK · WE PROVED ALL FOUR 29 / 32
WE OPENED WITH FOUR NOS —

Four checks. Thesis closed.

01

No
Server.

The model lives in the app.

02

No
Bills.

Apache 2.0. Zero per inference.

03

No
Internet.

You watched airplane mode work.

04

No
Problem.

Data never leaves the device.

That's the thesis. And you can ship it tonight.

ACT VII · PAYOFF · 45s
THE ONE SENTENCE · IF YOU REMEMBER ONE THING 30 / 32

On-device AI isn't replacing cloud AI.

It's unlocking a different class of app — one that's private, offline-capable, and free to scale.

STAND STILL · MIC DROP · 15s
THANK YOU · DELHI 31 / 32

Thank you.

Akansha Jain
Senior Software Engineer — Autonation Inc.
Happy to chat about Flutter, Gemma, or anything you're building.
akanshajain.dev QR
LET'S CONNECT
akanshajain.dev

DMs open.

FEW MINUTES FOR QUESTIONS → 15s
Q & A · FIVE MINUTES 32 / 32
OVER TO YOU

Questions?

Or DM me after — I read every message.
PLANT 2–3 QUESTIONS JUST IN CASE · 5:00