Chrome Built-in AI in Practice
Chrome now ships Gemini Nano directly in the browser, exposed through a small family of task-specific APIs. You can add AI to a web app without the usual headaches: no deployment, no inference infrastructure, no model files to load, no runtime to configure, and no pipeline to maintain. You call an API and get a result.
What on-device AI means
Built-in AI falls under the umbrella of client-side AI. Tools like WebLLM, Transformers.js, and MediaPipe have been doing client-side AI for a while, and they allow for running models directly in the browser, on the user's own device, without sending data to a server. These tools share some common scaling and UX problems, as each web application that uses them ships and downloads its own model into the browser.
Built-in AI has a potentially more practical architecture where the browser manages a shared model that any web application can use. This model is downloaded once and managed by Chrome. From a developer's perspective, this behaves like any other browser API. Call an API, receive a result, and update the UI.
I was semi-blown away the first time I used these APIs and was able to get a prompt working in a few lines of code.
This low-footprint design makes it great for prototyping new AI features, building smaller, focused features, and providing offline support and graceful fallbacks for server-side AI infrastructure.
Checking availability and downloading the model
The architecture for Built-in AI means we cannot assume the model is there, so every built-in AI feature should check first and degrade gracefully when the answer is no.
Checking availability is straightforward:
const availability = await LanguageModel.availability({
expectedInputs: [{ type: "text", languages: ["en"] }],
expectedOutputs: [{ type: "text", languages: ["en"] }],
});
if (availability === "unavailable") {
// No local model here. Fall back to a server call or a non-AI path.
}
The call resolves to one of four strings:
- available: The model is downloaded and ready to use right now.
- downloadable: Supported, but the model has to be fetched first.
- downloading: A fetch is already in progress.
- unavailable: This browser or device can't run it, so reach for your fallback.
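The four states map fairly naturally onto UI decisions. Here is a minimal sketch, assuming a hypothetical helper called describePromptApiState (it is not part of Chrome's API) and made-up return labels. It also guards against browsers where the LanguageModel global does not exist at all:

```javascript
// Hypothetical helper: maps availability() results to a next step for the UI.
// `scope` defaults to globalThis so the global lookup can be stubbed in tests.
async function describePromptApiState(options = {}, scope = globalThis) {
  // Browsers without the Prompt API never define the LanguageModel global.
  if (!("LanguageModel" in scope)) return "use-fallback";

  const availability = await scope.LanguageModel.availability(options);
  switch (availability) {
    case "available":
      return "ready";
    case "downloadable":
      return "offer-download";
    case "downloading":
      return "show-progress";
    default:
      // "unavailable", or any value this sketch does not know about.
      return "use-fallback";
  }
}
```

Passing the same options you later hand to create() keeps the answer relevant to the session you actually intend to open.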
Downloading the model
We can trigger the download of the model from within an application using LanguageModel.create(). Any call to create() will trigger the download if the model is not available. It is recommended to show some sort of download progress to the user:
const session = await LanguageModel.create({
monitor(m) {
m.addEventListener("downloadprogress", (e) => {
console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
});
},
});
You can inspect the model and its download status at any time via chrome://on-device-internals.
The first download can be large and slow, so trigger it from a user gesture and show clear progress once the user opts in. The model download guide is worth a read.
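One way to wire that advice together is to start the download from a click handler and feed the progress into the page. This is a rough sketch, not the official pattern: createSessionWithProgress is a hypothetical helper, and the button and progress bar in the usage comment are assumed elements.

```javascript
// Hypothetical helper: creates a session and reports download progress as a
// whole-number percentage. `languageModel` is Chrome's LanguageModel global,
// passed in so it can be swapped for a stub.
async function createSessionWithProgress(languageModel, onPercent) {
  return languageModel.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // e.loaded is treated as a fraction between 0 and 1, as in the
        // snippet above.
        onPercent(Math.round(e.loaded * 100));
      });
    },
  });
}

// Usage in a page (downloadButton and progressBar are assumed elements):
// downloadButton.addEventListener("click", async () => {
//   const session = await createSessionWithProgress(LanguageModel, (percent) => {
//     progressBar.value = percent;
//   });
// });
```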
Prompting the model
Chrome exposes several task-specific APIs (Summarizer, Writer, Rewriter, Proofreader, Translator), but the Prompt API is the general-purpose one and the best place to start. The Chrome team actually drew on use cases from the Prompt API to identify the more specific writing APIs.
Once the model is available, prompting is as simple as two lines:
const session = await LanguageModel.create();
const result = await session.prompt("Tell me a joke.");
console.log(result);
Streaming the output
For anything longer than a sentence, you're going to want to think about streaming the response. We can use promptStreaming() to return a ReadableStream, so you can render partial results as they arrive instead of waiting for the whole response.
Each chunk is the new text (a delta), not the full response, so append it as it arrives.
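In a real UI you usually want the text so far, not the individual deltas. A small sketch of that append pattern, assuming a hypothetical helper named streamPrompt and an onUpdate callback that you supply:

```javascript
// Hypothetical helper: streams a response and hands the text accumulated so
// far to `onUpdate` after every chunk, assuming each chunk is a delta.
async function streamPrompt(session, prompt, onUpdate) {
  let text = "";
  for await (const chunk of session.promptStreaming(prompt)) {
    text += chunk;
    onUpdate(text);
  }
  return text;
}

// Usage in a page (output is an assumed element):
// await streamPrompt(session, "Tell me a joke.", (text) => {
//   output.textContent = text;
// });
```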
const session = await LanguageModel.create();
const stream = session.promptStreaming("I just ate a Scotch Pie and looked inside... I'm scared");
for await (const chunk of stream) {
// Each chunk is the new text, not the full response, so append it.
console.log(chunk);
}
Session management
A session keeps the conversation's context: each prompt and response is remembered and fed into the next turn, until the context window fills up.
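To see that memory in action, send a follow-up that only makes sense if the first turn was kept. A minimal sketch, assuming a hypothetical helper called askInSequence:

```javascript
// Hypothetical helper: sends prompts one after another on the same session,
// so each later prompt can rely on the earlier turns.
async function askInSequence(session, prompts) {
  const replies = [];
  for (const prompt of prompts) {
    // Awaiting each turn before sending the next keeps the history in order.
    replies.push(await session.prompt(prompt));
  }
  return replies;
}

// Usage with a real session:
// const session = await LanguageModel.create();
// const replies = await askInSequence(session, [
//   "My favourite pie is a Scotch Pie.",
//   "What is my favourite pie?",
// ]);
```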
That context is not free; it draws from a fixed token budget. You can check how much of the window you have used:
console.log(`${session.contextUsage}/${session.contextWindow}`);
The context window is pretty small, so longer conversations can overflow it. Chrome then drops the oldest prompt-response pairs one at a time to make room, but the system prompt is never removed.
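If you would rather act before turns start disappearing, you can turn those two numbers into a simple check. This is only a sketch: both helpers are hypothetical, and the 0.8 threshold is an arbitrary assumption, not a recommended value.

```javascript
// Hypothetical helper: fraction of the context window already used (0 to 1).
function contextUsageRatio(session) {
  return session.contextUsage / session.contextWindow;
}

// Hypothetical policy: treat the session as nearly full past a threshold,
// for example to warn the user or start a fresh session.
function isNearlyFull(session, threshold = 0.8) {
  return contextUsageRatio(session) >= threshold;
}
```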
You can react to an overflow by listening for the contextoverflow event:
session.addEventListener("contextoverflow", () => {
console.log("Past the context window. The oldest turns will be dropped.");
});
Demo - ChatBot for a Bambot
Here's a simple chat app using the Prompt API. Assuming you have completed the preceding steps in your console, the model should be accessible. Otherwise, the demo will instruct you to download it. We've all seen a boring chat bot at this point, so I've spiced it up with a custom system prompt.
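A system prompt can be supplied when the session is created, through the initialPrompts option. The sketch below shows the general shape only. createPersonaSession is a hypothetical helper, and the persona text in the usage comment is invented for illustration, not the demo's actual prompt.

```javascript
// Hypothetical helper: creates a session whose first message is a system
// prompt. `languageModel` is Chrome's LanguageModel global (or a stub).
async function createPersonaSession(languageModel, systemPrompt) {
  return languageModel.create({
    initialPrompts: [{ role: "system", content: systemPrompt }],
  });
}

// Usage (the persona text is an assumption, not the demo's real prompt):
// const session = await createPersonaSession(
//   LanguageModel,
//   "You are a cheeky but friendly chat bot from Glasgow."
// );
```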
When to reach for it
- Cheap to run: There's no per-call API bill, so experiments don't turn into a cloud invoice.
- Fast to prototype: Add a small AI assist to a form, editor, or dashboard without standing up a backend.
- Private by default: Prompt data stays on the device, which matters for sensitive inputs.
- Graceful fallback: Well-scoped features can keep working offline, without a network round trip.
- Free demos: Ship an AI-powered demo or trial without worrying about usage costs.
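Whichever of these draws you in, the feature still needs a path for devices without the model. A rough sketch of that split, assuming a hypothetical promptWithFallback helper and a serverFallback function that your app provides (for example, a call to your own backend):

```javascript
// Hypothetical helper: answers with the on-device model when it is usable,
// otherwise delegates to `serverFallback` (a function the app supplies).
async function promptWithFallback(prompt, serverFallback, scope = globalThis) {
  const lm = scope.LanguageModel;
  if (lm && (await lm.availability()) === "available") {
    const session = await lm.create();
    try {
      return await session.prompt(prompt);
    } finally {
      // Free the session's resources once the one-off prompt is done.
      session.destroy();
    }
  }
  return serverFallback(prompt);
}
```

This version prefers the local model and only reaches for the server when it has to. Flipping the order, so the local model covers for a server that is unreachable, is an equally reasonable design.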