ON-DEVICE AI · FLUTTER · GEMMA THE TALK — 01 / 32

ON-DEVICE AI WITH FLUTTER & GEMMA

No Server. No Bills. No Internet. No Problem.

Akansha Jain

SR. SWE - AUTONATION INC.

ORGANISER - FLUTTER DELHI · FFDG NEW DELHI

ON-DEVICE AI · FLUTTER · GEMMA APR 2026

ACT I OF VII 02 / 32

CHAPTER ONE

The
Bill Shock.

Monday you shipped. Tuesday they loved it.
Wednesday the invoice arrived.

FIG. 01 — OBSERVED IN THE WILD

THE HOOK PAIN → SOLUTION

THREE FAILURE MODES 04 / 32

THE CLOUD TAX · IN THREE PARTS

Three things that go wrong.

ERR:BILL_SHOCK

The cloud tax.

10,000 users × 50 queries a day × per-token pricing.

Your invoice arrives. Founder shakes. CFO has questions.

ERR:NO_NETWORK

The network gap.

Delhi metro tunnel. Flight mode. A village with 2G.

Your app just stops being smart. Users notice.

ERR:PRIVACY

The data trail.

Health records. Bank statements. Private photos.

Every request travels to someone else's server. Hopefully encrypted.
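
To make the cloud tax concrete, here is a back-of-envelope sketch of the "users × queries × per-token pricing" product from ERR:BILL_SHOCK. The tokens-per-query figure and the price per million tokens are placeholder assumptions, not any provider's real rates; swap in your own numbers.

```dart
// Back-of-envelope cloud bill. Every price and token count here is a
// placeholder assumption, not a real provider's rate.

/// Monthly cost in dollars for a pay-per-token API.
double monthlyCloudCost({
  required int users,
  required int queriesPerUserPerDay,
  required int tokensPerQuery, // assumed: prompt + response combined
  required double dollarsPerMillionTokens, // assumed blended rate
  int daysPerMonth = 30,
}) {
  final tokensPerMonth =
      users * queriesPerUserPerDay * tokensPerQuery * daysPerMonth;
  return tokensPerMonth / 1000000 * dollarsPerMillionTokens;
}

void main() {
  final cost = monthlyCloudCost(
    users: 10000,
    queriesPerUserPerDay: 50,
    tokensPerQuery: 1000,
    dollarsPerMillionTokens: 1.0,
  );
  print('Estimated monthly bill: \$${cost.toStringAsFixed(0)}');
}
```

Under these assumed inputs the estimate comes to $15,000 a month, and it scales linearly with every factor.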

THE HOOK · 45s RAISE YOUR HAND IF…

AUDIENCE CHECK 06 / 32

This talk is for you if…

01 You build with Flutter and you're AI-curious.
02 You've used Gemini / ChatGPT APIs and wondered — "must this all be cloud?"
03 You care about privacy, cost, or the next billion users.
04 You want to ship something.

ACT I · THE HOOK 2:00 / 28:00

ACT II OF VII 07 / 32

CHAPTER TWO

Meet the
Family.

Google has two AI model families built from the same research.

Gemini lives in the cloud — powerful, proprietary, pay-per-request. Gemma lives on your device — open, free, yours to run.

FIG. 02 — TWO SIBLINGS

ACT II · MEET THE FAMILY → 5 MIN

COMPARISON · GOOGLE'S TWO AI FAMILIES 08 / 32

SAME CHEF · TWO KITCHENS

Same research. Different lives.

CLOUD · PROPRIETARY

Gemini

The five-star restaurant

→ Hundreds of billions of parameters
→ API key + network required
→ Pay per request
→ Always up-to-date
→ Needs internet connection

vs.

ON-DEVICE · OPEN SOURCE

Gemma

A recipe in your pocket.

→ 270M to 31B parameters
→ Downloaded once, yours forever
→ Zero cost per inference
→ Works 100% offline
→ Apache 2.0 license

RIGHT BRAIN FOR THE RIGHT TASK 60s

TIMELINE · TWO YEARS · FOUR GENERATIONS 09 / 32

EVOLUTION

Gemma has grown up fast.

The smallest now fits on a phone.

FEB 2024

Gemma 1

Text only.

JUN 2024

Gemma 2

Better reasoning.

EARLY 2025

Gemma 3

+ Vision.

MID 2025

Gemma 3n

+ Audio. Mobile-tuned.

APR 2026 · NEW

Gemma 4

Frontier-grade on your phone.

TODAY'S DEMO ↓

Gemma 3n E2B

Multimodal · ~2B effective params · runs offline · ~3 GB download.

ACT II · 30s

VOCAB · THE ONLY FIVE WORDS YOU NEED 10 / 32

GLOSSARY

Five words. That's it.

Everything else is built on these.

Parameters.

Numbers inside the model. A radio with 2 billion knobs — each tuned during training.

Tokens.

Pieces of text. Not letters, not words — somewhere between. 100 English words ≈ 150 tokens.

Context.

The model's working memory. 256K on paper — realistically 2–8K on a phone.

Quantization.

JPEG, but for models. int4 shrinks a 2B-param model to ~1.5 GB. Barely hurts quality.

Inference.

Running the model. Input in, output out. The whole "AI thinking" thing — it's just math on a chip.
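
The numbers in the glossary can be checked with a few lines of arithmetic. This is a rough sketch: the function names are illustrative, the token ratio is the slide's rule of thumb, and the size figure counts weights only. At int4 the weights alone come to about 1 GB; real model files are larger because not every tensor is typically stored at 4 bits, which is consistent with the ~1.5 GB quoted above.

```dart
// Rough arithmetic behind three glossary entries. Function names are
// illustrative; the ratios come from the slide.

/// Token estimate using the slide's rule of thumb: 100 words ≈ 150 tokens.
int estimateTokens(int englishWords) => (englishWords * 1.5).round();

/// Weights-only size in GB (1 GB = 10^9 bytes) at a given bit width.
double weightSizeGb(int parameters, int bitsPerWeight) =>
    parameters * bitsPerWeight / 8 / 1e9;

/// Whether a prompt of [promptWords] plus [replyTokens] fits a context window.
bool fitsContext(int promptWords, int replyTokens, int contextTokens) =>
    estimateTokens(promptWords) + replyTokens <= contextTokens;

void main() {
  const twoBillion = 2000000000;
  print(estimateTokens(100)); // 150
  print(weightSizeGb(twoBillion, 16)); // 4.0
  print(weightSizeGb(twoBillion, 4)); // 1.0
  print(fitsContext(1000, 512, 2048)); // true: 1500 + 512 = 2012
}
```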

LOCK IN THE VOCAB · 90s

THE BIG NUMBER MOMENT 11 / 32

THIS WASN'T POSSIBLE THREE YEARS AGO.

2 billion parameters. Fitting on a phone.
Running in 200 milliseconds.
Understanding text, images & audio —
in 140+ languages.

WALK DOWNSTAGE · HOLD 3 SECONDS END OF ACT II

ACT III OF VII 12 / 32

CHAPTER THREE

Under the
Hood.

If we're putting 2 billion parameters on a phone, we ought to know the machinery.

FIG. 03 — THE STACK

ACT III · UNDER THE HOOD → 4 MIN

ARCHITECTURE · FIVE LAYERS · YOU WRITE ONE 13 / 32

THE WHOLE STACK

You only touch the top.

Everything else is already handled.

Your Flutter Code

DART · WIDGETS

flutter_gemma

PUB.DEV PACKAGE

MediaPipe · LiteRT-LM

INFERENCE ENGINE

GPU / NPU Delegates

HARDWARE ROUTING

Phone Silicon

NEURAL ENGINE · HEXAGON · MALI

LiteRT-LM

The same engine powering AI in Chrome, Chromebook Plus, Pixel Watch.

Delegates

Auto-routes math to the fastest hardware: NPU → GPU → CPU.

You

Write Dart. That's it.
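
The NPU → GPU → CPU preference described under Delegates amounts to "take the first option the device actually has". The sketch below only mimics that idea; the enum and function are hypothetical and are not the LiteRT-LM or MediaPipe delegate API, which handles this for you.

```dart
// Illustrative only: this mimics the NPU → GPU → CPU preference order from
// the slide. It is not the LiteRT-LM or MediaPipe delegate API.

enum Accelerator { npu, gpu, cpu }

/// Returns the first accelerator in preference order that the device reports.
/// CPU is the final fallback, so it is returned even if [available] is empty.
Accelerator pickAccelerator(Set<Accelerator> available) {
  const preference = [Accelerator.npu, Accelerator.gpu, Accelerator.cpu];
  return preference.firstWhere(
    available.contains,
    orElse: () => Accelerator.cpu,
  );
}

void main() {
  print(pickAccelerator({Accelerator.gpu, Accelerator.cpu})); // Accelerator.gpu
  print(pickAccelerator({Accelerator.cpu})); // Accelerator.cpu
}
```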

ACT III · 90s

THE PAYOFF · FOUR CONSTRAINTS · GONE AT ONCE 14 / 32

ENTIRE VALUE OF ON-DEVICE AI · ONE SLIDE

Four constraints. Gone at once.

01 · COMPUTE

No
Server.

The model lives in the app. Inference on phone silicon. Not someone's datacenter.

02 · COST

No
Bills.

Apache 2.0. Open weights. Download once, run forever. Zero cost per inference.

03 · CONNECTIVITY

No
Internet.

Delhi metro tunnel. Flight mode. Rural 2G. Works wherever the phone works.

04 · PRIVACY

No
Problem.

Data never leaves the device. Privacy by architecture — not by policy.

THE THESIS · 45s

● REC · DEMO 01 · TEXT INFERENCE 16 / 32

DEMO 01

Text chat.
Streaming.

Home screen. Tap Get Started.
Type a prompt. Watch tokens stream in.
Not my laptop. Not a server. This phone.

WARM-UP

~10 s

RESPONSE

~3 s

NARRATE LIVE · POINT AT SCREEN · 90s

● REC · DEMO 02 · AIRPLANE MODE 17 / 32

✈

DEMO 02 · THE MOMENT

Airplane mode.
Same speed.

Swipe down. Tap the airplane icon. Wi-Fi gone. Cellular gone.
Now ask for a haiku about privacy.
No internet. No server. No cost.

This is the whole pitch of this talk — right here.

PAUSE 2s AFTER RESPONSE · LET IT LAND · 60s

● REC · DEMO 03 · MULTIMODAL VISION 18 / 32

DEMO 03

The phone
can see.

Tap the image icon. Pick a photo.
Ask: What do you see? Be detailed.
Scene, colors, mood — described.

Multimodal. On-device. Offline.
Gemma 3n E2B running vision on a 6 GB phone.

NARRATE AS IT STREAMS · 75s

END OF DEMO · IN ONE SENTENCE 19 / 32

Three clips.
One phone.
No internet. No bills.
No data leaving your device.

— the whole talk in three minutes of video.

ACT IV ENDS · 30s

ACT V OF VII 20 / 32

CHAPTER FIVE

Honest
Engineering.

Every talk shows you the happy path.
Here's the part other talks skip.

FIG. 04 — REAL TRADEOFFS

ACT V · HONEST TRUTH → 4 MIN

LEARNED THE HARD WAY · BUILDING THIS DEMO 21 / 32

A STORY FROM BUILDING THIS

I wanted live camera.
The app crashed.
Every time.

Gemma 3n in RAM          ~3.0 GB
Camera pipeline          ~1.5 GB
Android OS & UI          ~1.5 GB
Total on a 6 GB phone    ~6.0 GB → OOM. KILLED.

So I switched to gallery picker. Which is what you just saw.

On-device AI makes you think about memory again.
Like it's 2005.

— THE LESSON

6 GB RAM

Tight.

Architect carefully. Gallery over live camera.

8 GB RAM

Fine.

Standard flows work without heroics.

12 GB RAM

Smooth.

Live camera is in play.
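
The crash story is just a sum that exceeds the budget. Here is a minimal sketch of that check, using the approximate figures from this slide as constants; real numbers vary by device and OS version, so treat them as placeholders and measure on your own hardware.

```dart
// Memory budget using the approximate figures from the slide. Real usage
// varies by device and OS version; treat these as placeholders.

const modelGb = 3.0; // Gemma 3n in RAM
const cameraGb = 1.5; // live camera pipeline
const systemGb = 1.5; // Android OS and UI

/// Free RAM left after loading everything, in GB. Zero or less means the
/// process is a likely candidate for an out-of-memory kill.
double headroomGb(double deviceRamGb, {required bool liveCamera}) {
  final used = modelGb + systemGb + (liveCamera ? cameraGb : 0.0);
  return deviceRamGb - used;
}

void main() {
  print(headroomGb(6, liveCamera: true)); // 0.0: nothing left, OOM risk
  print(headroomGb(6, liveCamera: false)); // 1.5: gallery picker fits
  print(headroomGb(12, liveCamera: true)); // 6.0
}
```

Dropping the camera pipeline is what buys the 1.5 GB of headroom on a 6 GB phone, which is why the gallery picker works where live camera did not.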

SAY IT PERSONAL · SLOW DOWN · 90s

DECISION TABLE · GEMMA VS GEMINI 22 / 32

On-device isn't always the right answer.

Choose the right model for the task.

FACTOR                 ON-DEVICE (GEMMA)            CLOUD (GEMINI)

Privacy                ✓ Data never leaves device   Data sent to cloud
Cost at scale          ✓ Zero per request           Pay per API call
Offline                ✓ Works in airplane mode     Needs internet
Latency                ✓ No network round-trip      Network delay
Raw power              Good for focused tasks       ✓ Frontier-level capability
Up-to-date knowledge   Frozen at download           ✓ Continuously updated
App size               Model adds ~1–2 GB           ✓ Small app binary
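
One way to read the table is as a routing rule: hard constraints first, then capability needs. The sketch below is a hypothetical reading, not a prescribed policy; the Task fields and their priority order are assumptions made for illustration.

```dart
// Hypothetical routing rule distilled from the decision table. The Task
// fields and their priority order are assumptions for illustration.

enum Runtime { onDevice, cloud }

class Task {
  const Task({
    this.sensitiveData = false,
    this.mustWorkOffline = false,
    this.needsFreshKnowledge = false,
    this.needsFrontierCapability = false,
  });

  final bool sensitiveData;
  final bool mustWorkOffline;
  final bool needsFreshKnowledge;
  final bool needsFrontierCapability;
}

Runtime chooseRuntime(Task task) {
  // Hard constraints first: private data or no network rules out the cloud.
  if (task.sensitiveData || task.mustWorkOffline) return Runtime.onDevice;
  // Reach for the cloud only when the task needs what the cloud column wins.
  if (task.needsFreshKnowledge || task.needsFrontierCapability) {
    return Runtime.cloud;
  }
  // Default to on-device: zero per-request cost, no network round-trip.
  return Runtime.onDevice;
}

void main() {
  print(chooseRuntime(const Task(sensitiveData: true))); // Runtime.onDevice
  print(chooseRuntime(const Task(needsFreshKnowledge: true))); // Runtime.cloud
}
```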

ACT V · 60s

THIS ISN'T HYPOTHETICAL · REAL PRODUCTS TODAY 23 / 32

WHAT YOU CAN BUILD WITH GEMMA

Four things you can ship.

01 · SUMMARIZE

Condense long content, locally.

Like Pixel Recorder and Samsung Note Assist.

02 · REWRITE

Proofread & polish on-device.

Like Apple Writing Tools and Gboard Magic Compose.

03 · CLASSIFY LIVE

Act on streams as they arrive.

Like Pixel Scam Detection and Live Caption.

04 · TRANSLATE

Bridge languages without a network.

Like Google Translate offline packs.

END OF ACT V · 45s

ACT VI · YOUR TURN · THIRTY MINUTES TO RUNNING 24 / 32

THE FAST PATH

Six steps. Thirty minutes. Running on your phone.

Create the project.

$ flutter create app

Add the package.

$ flutter pub add flutter_gemma

HF token in .env.

30 seconds on huggingface.co — for model access.

Bump platform SDKs.

Android minSdk 26 (2017). iOS 16+ (2022). Covers every modern phone.

Run. First-time ≈ 3 GB download.

5–15 min over Wi-Fi. Show a progress UI.

Chat. Ship. Show your friends.

You are talking to an offline AI. Start building the product.
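
The "5–15 min over Wi-Fi" estimate for the first-run download can be sanity-checked, and the same numbers feed a progress UI. This is a sketch with hypothetical helper names; it ignores protocol overhead and throttling, and it is independent of whatever progress callbacks the package itself exposes.

```dart
// Download-time arithmetic and a progress label. Helper names are
// illustrative; sizes use 1 GB = 10^9 bytes.

/// Minutes to download [sizeGb] at [mbps] megabits per second, ignoring
/// protocol overhead and throttling.
double downloadMinutes(double sizeGb, double mbps) =>
    sizeGb * 8000 / mbps / 60;

/// Label for a progress UI, for example "1.2 / 3.0 GB (40%)".
String progressLabel(int receivedBytes, int totalBytes) {
  if (totalBytes <= 0) return 'Starting download';
  // Integer arithmetic avoids floating-point surprises in the percentage.
  final percent = receivedBytes * 100 ~/ totalBytes;
  String gb(int bytes) => (bytes / 1e9).toStringAsFixed(1);
  return '${gb(receivedBytes)} / ${gb(totalBytes)} GB ($percent%)';
}

void main() {
  print(downloadMinutes(3, 80)); // 5.0
  print(downloadMinutes(3, 27).toStringAsFixed(1)); // a little under 15
  print(progressLabel(1200000000, 3000000000)); // 1.2 / 3.0 GB (40%)
}
```

In other words, the 5–15 minute range corresponds to roughly 27–80 Mbps of sustained throughput for a 3 GB model.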

ACT VI · YOUR TURN · 60s

CODE 01 · INITIALIZE GEMMA 25 / 32

CODE 01

Initialize Gemma.

Load .env. Pass the Hugging Face token. Set a retry budget for flaky Wi-Fi. Run your app.

That's it — on-device AI is configured.

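// main.dart

import 'package:flutter/material.dart';
import 'package:flutter_dotenv/flutter_dotenv.dart';
// Exact library path can vary by flutter_gemma version; check the package docs.
import 'package:flutter_gemma/flutter_gemma.dart';

Future<void> main() async {
  // Bindings should exist before any async work that runs ahead of runApp.
  WidgetsFlutterBinding.ensureInitialized();

  // Read HUGGINGFACE_TOKEN from the bundled .env asset.
  await dotenv.load(fileName: ".env");

  FlutterGemma.initialize(
    huggingFaceToken: dotenv.env['HUGGINGFACE_TOKEN'],
    maxDownloadRetries: 20, // retry budget for flaky Wi-Fi
  );

  runApp(const GemmaDemoApp());
}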
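
To get a feel for what a retry budget of 20 can mean in wall-clock time, here is a generic exponential backoff schedule. This is an assumption for illustration only: the slide does not say how flutter_gemma spaces its retries, and the 1-second base and 60-second cap are made-up values.

```dart
// Illustrative retry schedule. This is a generic exponential backoff, not
// flutter_gemma's internal retry logic, which the slide does not describe.

/// Delay before retry number [attempt] (0-based): 1s, 2s, 4s, ... up to [cap].
Duration backoffDelay(
  int attempt, {
  Duration cap = const Duration(seconds: 60),
}) {
  // Bound the exponent so the shift cannot overflow.
  final exponent = attempt < 0 ? 0 : (attempt > 30 ? 30 : attempt);
  final delay = Duration(seconds: 1 << exponent);
  return delay > cap ? cap : delay;
}

/// Total waiting time if every one of [maxRetries] attempts fails.
Duration worstCaseWait(int maxRetries) {
  var total = Duration.zero;
  for (var i = 0; i < maxRetries; i++) {
    total += backoffDelay(i);
  }
  return total;
}

void main() {
  print(backoffDelay(0).inSeconds); // 1
  print(backoffDelay(3).inSeconds); // 8
  print(backoffDelay(10).inSeconds); // 60 (capped)
  print(worstCaseWait(20).inSeconds); // 903
}
```

With this assumed schedule, 20 failed attempts add up to about 15 minutes of waiting, so a progress UI should also show a retrying state.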

45s · REAL CODE · SHIPS IN PROD

CODE 02 · STREAM RESPONSES 26 / 32

CODE 02

Stream responses.

Get the active model. Open a chat with temperature (creativity) and topK (focus). await for each token.

Four calls. The core of the chat API.

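// chat_screen.dart (excerpt from a State method)

// Load the already-downloaded model; prefer the GPU, enable image input.
final model = await FlutterGemma.getActiveModel(
  maxTokens: 2048,
  preferredBackend: PreferredBackend.gpu,
  supportImage: true,
);

// temperature controls creativity, topK controls focus.
final chat = await model.createChat(
  temperature: 0.8,
  topK: 40,
);

// Queue the user's message before asking for a response.
await chat.addQueryChunk(
  Message.text(text: userInput, isUser: true),
);

// Append each token to the UI as it arrives.
await for (final response in chat.generateChatResponseAsync()) {
  if (!mounted) break; // widget was disposed while streaming
  if (response is TextResponse) {
    setState(() => streamingText += response.token);
  }
}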
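
What temperature and topK do to the next-token choice can be shown without any model. The sketch below is the generic top-k plus temperature softmax technique, written for illustration; it is not flutter_gemma's internal sampler, and the logits are made-up values.

```dart
import 'dart:math' as math;

/// Keeps the [topK] highest logits, divides them by [temperature], and
/// returns softmax probabilities keyed by token index.
Map<int, double> topKProbabilities(
  List<double> logits, {
  required int topK,
  required double temperature,
}) {
  if (logits.isEmpty || topK < 1 || temperature <= 0) {
    throw ArgumentError('Need logits, topK >= 1, and temperature > 0.');
  }
  // Token indices sorted by logit, highest first.
  final indices = List<int>.generate(logits.length, (i) => i)
    ..sort((a, b) => logits[b].compareTo(logits[a]));
  final kept = indices.take(topK).toList();
  // Subtract the max logit before exponentiating for numerical stability.
  final maxLogit = logits[kept.first];
  final weights = {
    for (final i in kept) i: math.exp((logits[i] - maxLogit) / temperature),
  };
  final total = weights.values.reduce((a, b) => a + b);
  return weights.map((i, w) => MapEntry(i, w / total));
}

void main() {
  const logits = [2.0, 1.0, 0.0, -1.0]; // made-up scores for four tokens
  print(topKProbabilities(logits, topK: 2, temperature: 0.8));
  print(topKProbabilities(logits, topK: 2, temperature: 0.2));
}
```

Only the two best tokens survive topK: 2, and lowering the temperature shifts more of the probability onto the top one, which is the "focus" and "creativity" trade-off in practice.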

45s · STREAMS LIKE CHATGPT

THE CODELAB · TWO-HOUR GUIDED BUILD 27 / 32

WANT THE FULL BUILD?

Build it yourself.

Two hours. Guided. Hands-on. Every line of the demo you just saw — with UI polish, error handling, and the why behind each decision.

✓ Project setup & Hugging Face token

✓ Chat UI with token streaming

✓ Multimodal — text + image prompts

✓ Model download UX & error recovery

SCAN TO START

akanshajain.dev/codelabs/on-device-ai

FREE · OPEN · YOURS · THE WHOLE BUILD

RESOURCES · EVERYTHING YOU NEED 28 / 32

SCAN · CLONE · STAR · SHIP

Resources.

DEMO CODE

The full demo app.

github.com/jakansha2001/flutter-gemma-demo

FLUTTER_GEMMA

The package itself.

pub.dev/packages/flutter_gemma

GEMMA MODELS

All the weights.

huggingface.co/google/gemma-3n-E2B-it-litert-preview

DEEP DIVE

Why on-device AI changes mobile architecture.

medium.com/p/3cc6b09afd83

Clone. Star. Ship.

END OF ACT VI · 30s

THE CALLBACK · WE PROVED ALL FOUR 29 / 32

WE OPENED WITH FOUR NOS —