Chrome Built-in AI: Local Gemini Nano in the Browser

Chrome can now run Gemini Nano directly in the browser, exposed through a small family of APIs. You can add AI to a web app without standing up any of the usual infrastructure: no servers to deploy, no model files to ship, no inference pipeline to maintain. You call an API and get a result.

What on-device AI means

Built-in AI falls under the umbrella of client-side AI. Tools like WebLLM, Transformers.js, and MediaPipe have offered client-side AI for a while: they run models directly in the browser, on the user's own device, without sending data to a server. They also share a scaling and UX problem: each web application that uses them ships its own model, so the user downloads a separate copy for every site.

Figure: Client-side frameworks (left) make every site download its own model: three sites, three downloads, three copies on disk. Built-in AI (right) shares one browser-managed model across every site: three sites, one download, one copy, managed by Chrome.

Built-in AI takes a more practical approach: the browser manages one shared model that any web application can use. Chrome downloads it once and maintains it. For a developer, it behaves like any other browser API.

I was semi-blown away the first time I used these APIs and was able to get a prompt working in a few lines of code.

The low footprint makes it a good fit for prototyping and for adding small, focused features without a backend. It also works as an offline fallback when your server-side AI is unreachable.
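To make the offline-fallback idea concrete, here is a minimal sketch that tries a server first and drops to the local model when the request fails. The "/api/generate" endpoint and its { text } JSON response are hypothetical placeholders for your own backend. fetch and LanguageModel are passed in (defaulting to the globals) only so the sketch can be exercised outside Chrome.

```javascript
// Minimal sketch: prefer a server-side model, fall back to Chrome's built-in model.
// ASSUMPTION: "/api/generate" and its { text } JSON response are hypothetical
// placeholders for your own backend.
async function generateWithFallback(
  prompt,
  {
    endpoint = "/api/generate",
    fetchImpl = globalThis.fetch,
    LanguageModelImpl = globalThis.LanguageModel,
  } = {},
) {
  try {
    const res = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    });
    if (!res.ok) throw new Error(`Server responded with ${res.status}`);
    const { text } = await res.json();
    return { source: "server", text };
  } catch (serverError) {
    // Server unreachable or failing: try the on-device model instead.
    if (!LanguageModelImpl) throw serverError;
    // Only use the model if it is already on disk; don't start a large
    // download silently in the middle of a fallback path.
    const availability = await LanguageModelImpl.availability();
    if (availability !== "available") throw serverError;
    const session = await LanguageModelImpl.create();
    try {
      return { source: "local", text: await session.prompt(prompt) };
    } finally {
      session.destroy(); // release the session's resources
    }
  }
}
```

Requiring "available" rather than "downloadable" is a deliberate choice here: the fallback should be quick, and the first download is better offered to the user explicitly.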

Checking availability

Because Chrome downloads the model on demand, and only on supported devices, we cannot assume the model is there, so every built-in AI feature should check first and degrade gracefully when the answer is no.

Checking availability is straightforward:

availability.js

const availability = await LanguageModel.availability({
  expectedInputs: [{ type: "text", languages: ["en"] }],
  expectedOutputs: [{ type: "text", languages: ["en"] }],
});

if (availability === "unavailable") {
  // No local model here. Fall back to a server call or a non-AI path.
}

available: The model is downloaded and ready to use right now.
downloadable: Supported, but the model has to be fetched first.
downloading: A fetch is already in progress.
unavailable: This browser or device can't run it, so reach for your fallback.

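If the status is available, you can prompt right away. With downloadable or downloading, a session becomes usable once the download finishes. With unavailable, this browser or device (or the options you asked for) isn't supported, so route to your fallback.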
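One way to keep that branching in one place is a small helper that turns the status into an app-level decision. This is a minimal sketch: the action names ("prompt", "offer-download", "wait", "fallback") are hypothetical labels for your own UI, and a missing LanguageModel global is treated the same as unavailable.

```javascript
// Minimal sketch: map availability() to what the app should do next.
// The returned action names are hypothetical labels, not part of the API.
async function decideAIPath(LanguageModelImpl = globalThis.LanguageModel) {
  // Browsers without built-in AI don't define LanguageModel at all.
  if (!LanguageModelImpl) return "fallback";

  const availability = await LanguageModelImpl.availability({
    expectedInputs: [{ type: "text", languages: ["en"] }],
    expectedOutputs: [{ type: "text", languages: ["en"] }],
  });

  switch (availability) {
    case "available":
      return "prompt"; // ready now
    case "downloadable":
      return "offer-download"; // ask the user, then call create() from a gesture
    case "downloading":
      return "wait"; // a download is already in progress; show progress instead
    default:
      return "fallback"; // "unavailable" or anything unexpected
  }
}
```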

Downloading the model

Calling LanguageModel.create() starts the download if the model isn't already on the device. Pass a monitor so you can show the user download progress while it runs:

download.js

const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
    });
  },
});

You can inspect the model and its download status at any time at chrome://on-device-internals.

The first download can be large and slow, so trigger it from a user gesture and show clear progress once the user opts in. The model download guide is worth a read.
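Here is a minimal sketch of that opt-in pattern: a wrapper around create() that reports progress as a whole-number percentage. onProgress, downloadButton, and progressBar are hypothetical names for your own UI, and LanguageModel is injectable only so the sketch can be tested outside Chrome.

```javascript
// Minimal sketch: create a session and report download progress as 0-100.
// ASSUMPTION: `onProgress` is your own (hypothetical) UI callback.
function createSessionWithProgress(
  onProgress,
  LanguageModelImpl = globalThis.LanguageModel,
) {
  return LanguageModelImpl.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // e.loaded is a fraction between 0 and 1.
        onProgress(Math.round(e.loaded * 100));
      });
    },
  });
}

// Usage in the browser: call it from a click handler so the download is
// tied to a user gesture.
// downloadButton.addEventListener("click", async () => {
//   const session = await createSessionWithProgress((pct) => {
//     progressBar.value = pct;
//   });
// });
```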

Prompting the model

Chrome exposes several task-specific APIs (Summarizer, Writer, Rewriter, Proofreader, Translator), but the Prompt API is the general-purpose one and the best place to start. Use cases from the Prompt API actually shaped the more specific writing APIs.
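Before moving on to the Prompt API, here is roughly what a task-specific API looks like. The Summarizer takes options (type, format, length) instead of a free-form prompt. This is a minimal sketch that returns null so the caller can fall back; the option values are examples, and if the model still needs downloading, create() should be called from a user gesture.

```javascript
// Minimal sketch of a task-specific API: summarize text, or return null
// when the Summarizer isn't usable here so the caller can fall back.
async function summarizeOrNull(text, SummarizerImpl = globalThis.Summarizer) {
  if (!SummarizerImpl) return null;
  // Example options; other types include "tldr", "teaser", and "headline".
  const options = { type: "key-points", format: "plain-text", length: "short" };
  if ((await SummarizerImpl.availability(options)) === "unavailable") return null;
  const summarizer = await SummarizerImpl.create(options);
  try {
    return await summarizer.summarize(text);
  } finally {
    summarizer.destroy(); // release the summarizer's resources
  }
}
```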

Once the model is available, prompting is as simple as two lines:

prompt.js

const session = await LanguageModel.create();
const result = await session.prompt("Tell me a joke.");
console.log(result);
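For a one-off prompt where you don't need the conversation afterwards, it's worth cleaning up the session. Sessions expose destroy() to free their resources. This is a minimal sketch, with LanguageModel injectable only for testing outside Chrome.

```javascript
// Minimal sketch: a single prompt that destroys its session afterwards.
async function promptOnce(text, LanguageModelImpl = globalThis.LanguageModel) {
  const session = await LanguageModelImpl.create();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy(); // no further turns needed, so free the session
  }
}
```

If you plan to ask follow-up questions, keep the session instead: it carries the conversation's context between prompts.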

Streaming the output

For anything longer than a sentence, you'll want to stream the response. promptStreaming() returns a ReadableStream, so you can render partial results as they arrive instead of waiting for the whole thing.

Each chunk is the new text (a delta), not the full response, so append it as it arrives.

streaming.js

const session = await LanguageModel.create();
const stream = session.promptStreaming(
  "I just ate a Scotch Pie and looked inside... I'm scared",
);

let response = "";
for await (const chunk of stream) {
  // Each chunk is the new text, not the full response, so append it.
  response += chunk;
  console.log(response);
}
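In a real UI you usually want to re-render the growing text and let the user stop generation. promptStreaming() accepts a signal option, so an AbortController can cancel it. In that case the loop rejects with an AbortError. This is a minimal sketch, and render is a hypothetical UI callback.

```javascript
// Minimal sketch: stream a reply into a running string, re-rendering after
// each delta. Passing an AbortSignal lets a "Stop" button cancel generation.
// ASSUMPTION: `render` is your own (hypothetical) UI callback.
async function streamInto(session, text, render, signal) {
  let full = "";
  for await (const chunk of session.promptStreaming(text, { signal })) {
    full += chunk; // chunks are deltas, so concatenate them
    render(full);
  }
  return full;
}

// Usage sketch:
// const controller = new AbortController();
// stopButton.onclick = () => controller.abort();
// await streamInto(session, "Is this pie safe?", show, controller.signal);
```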

Session management

A session keeps the conversation's context: each prompt and response is remembered and fed into the next turn, until the context window fills up.

That context is not free; it draws from a fixed token budget. You can check how much of the window you have used:

context-usage.js

console.log(`${session.contextUsage}/${session.contextWindow}`);

The context window is pretty small, so longer conversations can overflow it. Chrome then drops the oldest prompt-response pairs one at a time to make room, but the system prompt is never removed.
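If you'd rather not have turns silently dropped, you can watch the budget yourself and start a fresh session before it runs out. This is a minimal sketch built on the two properties above. The 0.9 threshold is an arbitrary example value, not a recommendation from Chrome.

```javascript
// Minimal sketch: summarize how full a session's context window is.
// ASSUMPTION: the 0.9 "nearly full" threshold is an arbitrary example.
function contextStatus(session, threshold = 0.9) {
  const used = session.contextUsage;
  const total = session.contextWindow;
  const fraction = total > 0 ? used / total : 1;
  return {
    remaining: Math.max(total - used, 0),
    percentUsed: Math.round(fraction * 100),
    nearlyFull: fraction >= threshold,
  };
}
```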

You can react to an overflow by listening for the contextoverflow event:

context-overflow.js

session.addEventListener("contextoverflow", () => {
  console.log("Past the context window. The oldest turns will be dropped.");
});

Demo - ChatBot for a Bampot

Here's a simple chat app using the Prompt API. If you followed the preceding steps in your console, the model should already be downloaded; otherwise, the demo will prompt you to download it. We've all seen plenty of boring chatbots by now, so I've spiced this one up with a custom system prompt.

BamPot ChatBot

/**
 * A single chat turn, as the demo UI stores it.
 * @typedef {{ role: "user" | "assistant", content: string }} Message
 */

const TEXT_OPTIONS = {
  expectedInputs: [{ type: "text", languages: ["en"] }],
  expectedOutputs: [{ type: "text", languages: ["en"] }],
};

const SYSTEM_PROMPT = "You are BamPot ChatBot, a warm Glasgow patter merchant.";

// Create a session primed with the system prompt, or throw if the model
// can't run here. `signal` lets the caller cancel a pending create().
async function createChatSession(signal) {
  if (!globalThis.LanguageModel) {
    throw new Error("Chrome built-in AI is unavailable.");
  }

  const availability = await globalThis.LanguageModel.availability(TEXT_OPTIONS);
  if (availability === "unavailable") {
    throw new Error("The local model is unavailable on this device.");
  }

  return globalThis.LanguageModel.create({
    ...TEXT_OPTIONS,
    initialPrompts: [{ role: "system", content: SYSTEM_PROMPT }],
    signal,
  });
}

// Stream a reply, passing each delta to onChunk as it arrives.
async function sendMessage(session, text, onChunk, signal) {
  const stream = session.promptStreaming(text, { signal });
  const reader = stream.getReader();

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(value); // value is a delta: append it to the message on screen
    }
  } finally {
    reader.releaseLock();
  }
}
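A chat app usually wants to survive a page reload. One way to do that is to rebuild the session from saved history by replaying earlier turns through initialPrompts, with the system prompt first. This is a minimal sketch that assumes history is an array of { role, content } objects with role "user" or "assistant". Keep in mind that replayed turns count against the context window.

```javascript
// Minimal sketch: rebuild a chat session from saved history.
// ASSUMPTION: `history` is an array of { role: "user" | "assistant", content }.
function restoreChatSession(
  history,
  systemPrompt,
  LanguageModelImpl = globalThis.LanguageModel,
) {
  return LanguageModelImpl.create({
    initialPrompts: [
      { role: "system", content: systemPrompt }, // system prompt goes first
      // Copy only role and content, dropping any extra UI fields.
      ...history.map(({ role, content }) => ({ role, content })),
    ],
  });
}
```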

When to reach for it

The small context window makes it better suited to focused pieces of work with short inputs. I think the task-based APIs that grew out of the initial Prompt API are a good indication of where Google sees the use cases for these models, in the short term at least.

My main takeaways are:

Cheap to run: There's no per-call API bill, so experiments don't turn into a cloud invoice.
Fast to prototype: Add a small AI assist to a form, editor, or dashboard without standing up a backend.
Private by default: Prompt data stays on the device, which matters for sensitive inputs.
Graceful fallback: Well-scoped features can keep working offline, without a network round trip.
Free demos: Ship an AI-powered demo or trial without worrying about usage costs.

Chrome Built-in AI in Practice