• Google’s Gemma 4 runs entirely on your phone — text, images, audio, all processed locally with zero cloud dependency.
  • The on-device variants need just 6 to 8 GB of RAM and run up to four times faster than the previous generation.
  • It ships with built-in “agent skills” that let the AI autonomously use tools like Wikipedia, maps, and QR code generators.

Google just shipped an AI model that does something most of the industry said wasn’t practical yet: a multimodal, agentic assistant that runs entirely on your phone, with no data ever touching a server. It’s called Gemma 4, it’s free, and it’s already the fourth most-downloaded productivity app on iOS.

The model processes text, images, and audio across more than 140 languages, all on-device. It comes in four sizes, but the two that matter for regular users are E2B and E4B — edge variants that need just 6 and 8 GB of RAM, respectively. Quantized, E2B takes up about 1.3 GB on your phone; E4B needs roughly 2.5 GB. Both run on Android 12 and iOS 17 or later.
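Those footprint figures line up with simple arithmetic. A quick sanity check, assuming roughly 2B and 4B effective parameters for E2B and E4B (read off the variant names, not figures Google has published) at 4-bit quantization plus a rough overhead allowance:

```python
def quantized_size_gb(params_billion: float, bits_per_param: float,
                      overhead_gb: float = 0.3) -> float:
    """Approximate on-disk size of a quantized model.

    overhead_gb is a rough assumption covering embeddings, vocab
    tables, and metadata often stored at higher precision.
    """
    raw_gb = params_billion * 1e9 * bits_per_param / 8 / 1e9
    return raw_gb + overhead_gb

# Assuming ~2B / ~4B effective parameters at 4 bits each:
print(round(quantized_size_gb(2, 4), 1))  # 1.3 — matches the E2B figure
print(round(quantized_size_gb(4, 4), 1))  # 2.3 — close to E4B's ~2.5 GB
```

The numbers here are illustrative, but they show why 4-bit quantization is what makes multi-billion-parameter models fit on a phone at all.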

Google says Gemma 4 on Android runs up to four times faster than its predecessor while cutting battery drain by 60 percent. Arm’s own benchmarks show an average 5.5x speedup on newer chips with the SME2 instruction set. The gap between what a phone can do locally and what requires a server is closing fast.

What Makes Gemma 4 Different From Other On-Device AI

The headline feature isn’t speed — it’s agency. Gemma 4 ships with what Google calls “agent skills”: built-in tools the model can invoke autonomously. Need a summary? It generates one. Want to look something up? It hits Wikipedia. Need a QR code? It builds one on the spot, no internet required. The model picks up on user intent and fires up the right skill without being told which one to use.
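Google hasn’t published how Gemma 4’s skill dispatch works internally, but the pattern described above — classify the user’s intent, then route to the matching tool — can be sketched in a few lines. Everything here (skill names, the `detect_intent` stand-in) is an illustrative assumption, not Gemma 4’s actual API:

```python
from typing import Callable

# Hypothetical skill registry; names mirror the examples in the article.
SKILLS: dict[str, Callable[[str], str]] = {
    "summarize": lambda text: f"[summary of {len(text.split())} words]",
    "lookup":    lambda query: f"[Wikipedia result for '{query}']",
    "qr_code":   lambda data: f"[QR code encoding '{data}']",
}

def detect_intent(prompt: str) -> tuple[str, str]:
    """Stand-in for the model's own intent detection: in a real
    system the LLM itself decides which skill to invoke."""
    if "summarize" in prompt.lower():
        return "summarize", prompt
    if "qr" in prompt.lower():
        return "qr_code", prompt
    return "lookup", prompt

def run_agent(prompt: str) -> str:
    skill, arg = detect_intent(prompt)
    return SKILLS[skill](arg)  # the model picks the tool, then calls it

print(run_agent("Make a QR code for my Wi-Fi password"))
```

The interesting design choice is that the registry is extensible — which is presumably what lets developers ship custom skills, as the article notes below.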

Google’s been building toward this with its AI Edge platform for months. But Gemma 4 is the first time the pieces come together into something that feels like a real assistant rather than a tech demo.

The model is built on the same research behind Google’s proprietary Gemini 3 but ships under the Apache 2.0 license. Developers can create and share custom skills via GitHub. Google says the Gemma family has racked up over 400 million downloads since the first generation launched.

The bigger 26B and 31B variants target servers and high-performance hardware. The 26B version uses a mixture-of-experts architecture with 128 experts — only 3.8 billion parameters are active at any given time. The dense 31B model offers a context window of up to 256,000 tokens and scores 89.2% on AIME 2026.
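That 128-experts-but-3.8B-active design is the defining trick of mixture-of-experts: a router scores the experts per token and only the top few actually run. A toy version with NumPy (dimensions and expert counts shrunk for illustration; this is the general technique, not Gemma’s actual configuration):

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Toy mixture-of-experts layer: route the token to its top_k
    experts and combine their outputs by softmaxed gate weight."""
    logits = x @ gate_w                    # router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over selected experts only
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16                       # tiny stand-ins for Gemma's 128
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = moe_forward(x, gate_w, expert_ws)
print(y.shape)  # (8,) — only 2 of the 16 experts did any work
```

Compute scales with the active experts, not the total, which is how a 26B-parameter model can run with the cost profile of a much smaller one.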

None of these features individually break new ground compared to what cloud providers already offer. What stands out is that a free app running a purely local model on a phone can now use these tools on its own, with no data leaving the device. In a week dominated by Anthropic’s Mythos cybersecurity drama and closed-door Treasury meetings about AI risk, Google quietly shipped the privacy-first alternative nobody asked for but everybody might need.

The Google AI Edge Gallery app is free on both Android and iOS.
