Google dropped Gemma 4 on April 2, 2026, and this time it’s not messing around. The search giant unveiled its most capable family of open AI models to date: four sizes built from the same research underlying Gemini 3, packaged for everyone from phone developers to enterprise shops running single-GPU workstations. The release also marks a clean licensing break: Gemma 4 ships under Apache 2.0, abandoning Google’s old custom license entirely.
Developers have downloaded Gemma models over 400 million times since the family launched in February 2024, spawning more than 100,000 community variants — a “Gemmaverse,” Google calls it. Gemma 4 is built to deepen that moat. The 31-billion-parameter dense model debuted at number three on Arena AI’s open model leaderboard, making it the highest-ranked Western open model on the chart. The two models ahead of it are Chinese: GLM-5 from Zhipu AI and Kimi 2.5 from Moonshot AI. That’s the competition Gemma 4 is walking into.
The Four Sizes, Explained
Gemma 4 comes in four configurations targeting wildly different use cases. At the lightweight end are the Effective 2B (E2B) and Effective 4B (E4B) models, designed to run locally on smartphones, Raspberry Pis, and NVIDIA Jetson Orin Nano devices. “Effective” means they run with the memory and compute footprint of a 2B- or 4B-parameter model during inference, preserving RAM and battery on mobile hardware. Both pack a 128K-token context window and, notably, native speech recognition, making them suitable for voice-driven mobile apps without a server round-trip. Google worked with Qualcomm and MediaTek on optimization, and the Pixel team was involved in development. The E4B variant also ships as a day-one integration inside Android Studio’s Agent Mode.
The two heavy hitters are the 26-billion-parameter Mixture of Experts (MoE) model and the 31B dense model. The 26B MoE activates only 3.8 billion parameters per token during inference, so it generates tokens significantly faster than a dense model of the same total size, which makes it ideal for developer tooling where latency matters. The 31B dense model is tuned for raw quality above all else; its unquantized bfloat16 weights fit on a single 80GB NVIDIA H100 GPU, making it accessible for fine-tuning without a cluster. Both large models support a 256K-token context window, double the 128K ceiling on the edge variants.
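The single-GPU claim is easy to sanity-check with back-of-the-envelope arithmetic: bfloat16 stores each weight in 2 bytes, so a 31B dense model needs roughly 62 GB for weights alone, leaving headroom on an 80 GB H100 for activations and KV cache. A rough sketch (the figures are estimates for illustration, not official specs):

```python
# Back-of-the-envelope memory math for Gemma 4 weights in bfloat16.
# Figures are rough estimates for illustration, not official specs.

BYTES_PER_PARAM_BF16 = 2  # bfloat16 = 16 bits per weight

def weight_memory_gb(params_billions: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM_BF16 / 1e9

dense_31b = weight_memory_gb(31)     # ~62 GB: fits on one 80 GB H100
moe_total = weight_memory_gb(26)     # ~52 GB: all experts stay resident
moe_active = weight_memory_gb(3.8)   # ~7.6 GB of weights touched per token

print(f"31B dense weights:  {dense_31b:.1f} GB")
print(f"26B MoE weights:    {moe_total:.1f} GB (all experts resident)")
print(f"  active per token: {moe_active:.1f} GB")
```

Note that the MoE’s speed advantage comes from reading only ~3.8B parameters per generated token; the full 26B still has to sit in memory, so the saving is in compute and bandwidth, not capacity.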
All four models handle video and images natively, and the edge variants add built-in audio input. Every model includes native function calling, structured JSON output, and native system instructions: the plumbing that makes agentic workflows possible from day one. Google says the gains on reasoning, math, and instruction-following benchmarks are significant, and that Gemma 4 models outperform systems 20 times their size on Arena Elo ratings.
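Native function calling generally works like this: the app declares a tool schema, and the model replies with structured JSON naming the tool and its arguments instead of free-form text. A hypothetical sketch of that loop; the field names follow common JSON-Schema-style conventions used across function-calling APIs and are not taken from Gemma 4’s documentation:

```python
import json

# Hypothetical tool declaration in the JSON-Schema shape most
# function-calling APIs use; names are illustrative, not Gemma 4's API.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# A function-calling model emits a structured call rather than prose.
# Simulated model reply for this sketch:
model_reply = '{"tool": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}'

call = json.loads(model_reply)
if call["tool"] == get_weather_tool["name"]:
    # The app, not the model, executes the real function with these args.
    print(f"dispatching get_weather({call['arguments']})")
```

Structured output is the same idea applied to the whole response: the app supplies a schema and the model is constrained to emit JSON that validates against it, which is what makes the result machine-parseable by default.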
Who Is This For?
The edge models answer a concrete need: developers building mobile apps, IoT products, or offline-first experiences that can’t afford a round-trip to a cloud API. The E4B ships with Qualcomm and MediaTek optimization baked in, and Google AI Edge Gallery lets you run it directly on a supported Android device. The 26B MoE is the developer workstation choice — fast enough for IDE integrations and code generation inside a local setup. The 31B dense model is the fine-tuning foundation, and the community is already using it for that. One team built BgGPT, a Bulgarian-first large language model, on top of a Gemma 4 base. Another group at Yale used it in cancer therapy research.
For enterprises, the Apache 2.0 license is the headline. Google’s previous Gemma licenses carried restrictions: a custom “prohibited use” policy, unilateral update rights, and clauses that arguably captured synthetic-data derivatives. The Apache 2.0 switch removes all of that. You can run Gemma 4 in air-gapped environments, fine-tune it into a commercial product, and never send a byte of data to Google’s infrastructure. Data sovereignty is built into the license, not bolted on as a cloud upsell. That’s a meaningful shift for a company whose AI business is built largely on hosted services.
The US-China Open-Source Race
Gemma 4 lands in the middle of a tense moment in the global AI race. Chinese labs have been consistently outperforming Western open models on leaderboards, even as US export controls restrict Chinese access to the NVIDIA chips used to train frontier models. Zhipu AI trained GLM-5 entirely on Huawei Ascend silicon — no NVIDIA required — and it still sits at the top of the Arena open model chart. We covered that dynamic in depth when the US moved to license every AI chip sale on Earth.
The dynamic cuts both ways. Export controls designed to slow China’s AI development have arguably pushed Chinese labs to build more efficient training pipelines. Meanwhile, American companies face no equivalent constraint — but they do face pressure to produce open models that developers actually want to use. Gemma 4’s Apache license and broad hardware support suggest Google is trying to win back developers who might otherwise reach for Meta’s Llama or Alibaba’s Qwen series. The 31B model being the highest-ranked Western open model on Arena is a point Google will emphasize — but it’s also an admission that Chinese labs are still ahead on the metric that matters most to researchers.
The smaller model sizes carry their own strategic weight. A model that runs well on a smartphone or a $300 Raspberry Pi setup is inherently more accessible than anything requiring an H100 cluster. If Google can dominate the edge and mobile segment of the open-source market — where power constraints matter more than raw benchmark scores — that could be a more durable advantage than a leaderboard position. Gemma Nano 4, the derivative of the E2B/E4B tier destined for Pixel devices, is the clearest signal of where this strategy is heading.
Gemma 4 is available now through Google AI Studio, Hugging Face, Kaggle, and Ollama. The edge models (E2B and E4B) are accessible via Google AI Edge Gallery. Cloud deployments run on Vertex AI, Cloud Run, GKE, and Google Sovereign Cloud. Google also launched a Gemma 4 Good Challenge on Kaggle, inviting developers to build socially useful applications with the models.