Gemma 4: Google's Most Capable Open Model
On April 2, 2026, Google DeepMind released Gemma 4 — a family of four open models built from the same research as Gemini 3, now available under Apache 2.0 for the first time. This license change alone may matter more than any benchmark number.
Previous Gemma versions shipped under Google's custom license with usage restrictions. Apache 2.0 means fully unrestricted commercial use, modification, and redistribution — putting Gemma 4 on equal licensing footing with GLM-5.1 (MIT) and DeepSeek V3.2 (MIT).
The Four Model Sizes
| Model | Type | Active Params | Target Hardware |
|---|---|---|---|
| E2B | Edge | ~2.3B effective | Phones, Raspberry Pi, Jetson Nano |
| E4B | Edge | ~4.5B effective | Phones, edge devices |
| 26B MoE (A4B) | Mixture of Experts | ~4B active per token | Consumer GPUs |
| 31B Dense | Dense | 31B | Workstation GPUs, cloud |
The naming is worth noting: E2B and E4B are edge-optimized models that run completely offline on devices like smartphones and Raspberry Pi, with no network round-trips adding latency. The 26B MoE activates only ~4B parameters per token (hence the "A4B" designation), delivering roughly 97% of the 31B model's quality at a fraction of the compute.
Benchmark Scores: A Generational Leap
The improvement from Gemma 3 (27B) to Gemma 4 (31B) is not incremental — it's a generational jump:
| Benchmark | Gemma 4 31B | Gemma 3 27B | Improvement |
|---|---|---|---|
| AIME 2026 (Math) | 89.2% | 20.8% | +68.4 pts |
| LiveCodeBench v6 | 80.0% | 29.1% | +50.9 pts |
| GPQA Diamond | 84.3% | 42.4% | +41.9 pts |
| τ2-bench (Agentic) | 86.4% | 6.6% | +79.8 pts |
| MMLU Pro | 85.2% | — | — |
| MMMU Pro (Multimodal) | 76.9% | — | — |
| Codeforces ELO | 2,150 | 110 | +2,040 |
The Codeforces ELO jump from 110 to 2,150 is particularly striking — that's the difference between a beginner and an expert-level competitive programmer.
26B MoE: 97% Quality at 4B Cost
The 26B MoE variant is arguably the most interesting model in the family. With only ~4B active parameters per token (designated A4B), it delivers:
AIME 2026: 88.3% (vs 31B's 89.2%)
LiveCodeBench: 77.1% (vs 31B's 80.0%)
GPQA Diamond: 82.3% (vs 31B's 84.3%)
You get 31B-class reasoning at 4B-class inference speed. For cost-sensitive deployments, this is the model to watch.
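The economics behind that claim can be sketched with the standard back-of-envelope rule that a transformer forward pass costs roughly 2 FLOPs per active parameter per token. The parameter counts below come from the table above; everything else is an approximation, not a measured benchmark:

```python
# Rough per-token compute comparison: dense 31B vs. the 26B MoE (A4B).
# Assumes the common heuristic FLOPs_per_token ≈ 2 × active_params;
# real throughput also depends on memory bandwidth, batching, and kernels.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * active_params

dense_31b = forward_flops_per_token(31e9)  # all 31B weights touched per token
moe_a4b = forward_flops_per_token(4e9)     # only ~4B expert params active per token

print(f"dense 31B: {dense_31b:.1e} FLOPs/token")
print(f"26B MoE:   {moe_a4b:.1e} FLOPs/token")
print(f"the MoE is ~{dense_31b / moe_a4b:.1f}x cheaper per token")
```

By this estimate the MoE does roughly an eighth of the dense model's per-token arithmetic while, per the scores above, giving up only one to three benchmark points.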
How It Compares to Other Open Models
Among open models in the ~30B parameter range, Gemma 4 31B leads on most benchmarks according to BenchLM comparisons:
| Model | Arena ELO | Coding | Reasoning | License |
|---|---|---|---|---|
| Gemma 4 31B | ~1452 (#3 open) | 80.0 | 66.4 | Apache 2.0 |
| Qwen 3.5 27B | ~1403 | 77.6 | 60.6 | Apache 2.0 |
| DeepSeek V3.2 | ~1425 | — | — | MIT |
Gemma 4 31B edges out Qwen 3.5 27B on coding (80 vs 77.6) and reasoning (66.4 vs 60.6), while Qwen leads on knowledge tasks (80.6 vs 61.3).
Important context: Larger open models like GLM-5.1 (744B MoE), Qwen 3.5 397B, and DeepSeek V3.2 Speciale still outperform Gemma 4 on the most demanding benchmarks. Gemma 4's advantage is that it fits on a single consumer GPU — something no 400B+ model can claim.
Key Capabilities
Multimodal (Text + Image + Audio)
All four models accept text and image inputs. The edge models (E2B, E4B) additionally support audio input, enabling on-device voice assistants without any cloud dependency.
140+ Languages
Gemma 4 supports over 140 languages, making it one of the most multilingual open models available.
Native Agentic Support
The 86.4% τ2-bench score reflects native function calling and tool-use capabilities. Gemma 4 can autonomously plan multi-step workflows, make API calls, navigate apps, and complete tasks — without external scaffolding.
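Whatever model sits at the center, a tool-use loop needs the same scaffolding around it: the model emits a structured call, the host parses it, dispatches to a registered function, and feeds the result back. The sketch below illustrates only that host-side plumbing; the JSON shape and the `get_weather` tool are hypothetical stand-ins, since the article does not specify Gemma 4's actual tool-call wire format:

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool the model is permitted to call."""
    return f"18°C and clear in {city}"

# Registry of callable tools, keyed by the name the model uses.
TOOLS = {"get_weather": get_weather}

# Hard-coded stand-in for a model turn that decided to call a tool;
# in a real loop this string would come from the model's output.
model_output = '{"tool": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # the tool result would be appended as the next model turn
```

In a real agentic deployment this parse-dispatch-return cycle repeats until the model emits a final answer instead of another tool call.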
Context Window
Up to 256K tokens for the 31B and 26B models, and 128K tokens for the edge models (E2B, E4B). This is shorter than frontier models like GPT-5.4 (1M tokens) but sufficient for most practical use cases and impressive for models this size.
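When sizing prompts against those windows, a quick pre-check avoids truncation surprises. The snippet below uses the common rough heuristic of ~4 characters per token for English text; both that ratio and the window sizes (taken from the paragraph above) are approximations, not tokenizer-accurate counts:

```python
# Rough fit check for a prompt against Gemma 4's context windows.
# Window sizes are from the article; the 4-chars-per-token ratio is a
# coarse English-text heuristic, not a real tokenizer.

CONTEXT_WINDOWS = {"31B": 256_000, "26B-A4B": 256_000, "E4B": 128_000, "E2B": 128_000}

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    return int(len(text) / chars_per_token)

def fits(text: str, model: str, reserve_for_output: int = 4_000) -> bool:
    """True if the prompt plus an output budget fits the model's window."""
    return rough_token_count(text) + reserve_for_output <= CONTEXT_WINDOWS[model]

doc = "x" * 600_000        # ~150K tokens by the heuristic
print(fits(doc, "31B"))    # True: within the 256K window
print(fits(doc, "E2B"))    # False: exceeds the 128K edge window
```

For production use, replace the heuristic with the model's actual tokenizer before trusting the count.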
Where to Get It
Gemma 4 is available immediately from:
Hugging Face — All model weights
Ollama — For local deployment
Kaggle — Notebooks and model cards
LM Studio — GUI-based local inference
Google AI Studio — Cloud-hosted inference
NVIDIA NIM — Optimized inference containers
Since the original Gemma launch in 2024, developers have downloaded Gemma models over 400 million times and published 100,000+ community variants. Gemma 4's first-week download velocity has reportedly exceeded that of any previous release.
Who Should Use Gemma 4
Edge/mobile developers: E2B and E4B are purpose-built for on-device AI with zero cloud dependency
Cost-sensitive teams: The 26B MoE delivers frontier-class results with minimal compute
Enterprise with compliance needs: Apache 2.0 license + self-hosting = full data control
Multilingual projects: 140+ language support out of the box
Who Should Look Elsewhere
Maximum coding performance: GLM-5.1 (58.4% SWE-Bench Pro) or Claude Opus 4.6 (57.3%) still lead on real-world software engineering benchmarks that Gemma 4 isn't evaluated on
Million-token context: GPT-5.4 and Qwen 3.6 Plus offer 1M+ token context windows
Absolute best reasoning: Qwen 3.5 397B and DeepSeek V3.2 Speciale still outperform at much larger parameter counts
Bottom Line
Gemma 4 is the best open model you can run on a single GPU in April 2026. The Apache 2.0 license removes all commercial restrictions. The 26B MoE variant's 97%-of-31B performance at ~4B inference cost is genuinely remarkable. And the edge models finally make on-device AI practical for production applications.
It won't replace frontier models like Claude Opus or GPT-5.4 for the hardest tasks. But for the vast majority of development workflows — especially where cost, latency, or data privacy matter — Gemma 4 is now the model to beat in its weight class.