Gemma 4: Google's Most Capable Open Model
On April 2, 2026, Google DeepMind released Gemma 4 — a family of four open models built from the same research as Gemini 3, now available under Apache 2.0 for the first time. This license change alone may matter more than any benchmark number.
Previous Gemma versions shipped under Google's custom license with usage restrictions. Apache 2.0 means fully unrestricted commercial use, modification, and redistribution — putting Gemma 4 on equal licensing footing with GLM-5.1 (MIT) and DeepSeek V3.2 (MIT).
The Four Model Sizes
| Model | Type | Active Params | Target Hardware |
|---|---|---|---|
| E2B | Edge | ~2.3B effective | Phones, Raspberry Pi, Jetson Nano |
| E4B | Edge | ~4.5B effective | Phones, edge devices |
| 26B MoE (A4B) | Mixture of Experts | ~4B active per token | Consumer GPUs |
| 31B Dense | Dense | 31B | Workstation GPUs, cloud |
The naming is worth noting: E2B and E4B are edge-optimized models that run completely offline on devices like smartphones and Raspberry Pi, with no network round-trips adding latency. The 26B MoE activates only ~4B parameters per token (hence the "A4B" designation), delivering roughly 97% of the 31B model's quality at a fraction of the compute.
Benchmark Scores: A Generational Leap
The improvement from Gemma 3 (27B) to Gemma 4 (31B) is not incremental — it's a generational jump:
| Benchmark | Gemma 4 31B | Gemma 3 27B | Improvement |
|---|---|---|---|
| AIME 2026 (Math) | 89.2% | 20.8% | +68.4 pts |
| LiveCodeBench v6 | 80.0% | 29.1% | +50.9 pts |
| GPQA Diamond | 84.3% | 42.4% | +41.9 pts |
| τ2-bench (Agentic) | 86.4% | 6.6% | +79.8 pts |
| MMLU Pro | 85.2% | — | — |
| MMMU Pro (Multimodal) | 76.9% | — | — |
| Codeforces ELO | 2,150 | 110 | +2,040 |
The Codeforces ELO jump from 110 to 2,150 is particularly striking — that's the difference between a beginner and an expert-level competitive programmer.
26B MoE: 97% Quality at 4B Cost
The 26B MoE variant is arguably the most interesting model in the family. With only ~4B active parameters per token (designated A4B), it delivers:
AIME 2026: 88.3% (vs 31B's 89.2%)
LiveCodeBench: 77.1% (vs 31B's 80.0%)
GPQA Diamond: 82.3% (vs 31B's 84.3%)
You get 31B-class reasoning at 4B-class inference speed. For cost-sensitive deployments, this is the model to watch.
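The economics behind that claim can be sketched with the standard back-of-envelope rule that a transformer forward pass costs roughly 2 FLOPs per active parameter per token. The parameter counts below come from the table above; everything else is an approximation, not a measured benchmark:

```python
# Rough per-token compute comparison: dense 31B vs. the 26B MoE (A4B).
# Assumes the common heuristic FLOPs_per_token ≈ 2 × active_params;
# real throughput also depends on memory bandwidth, batching, and kernels.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * active_params

dense_31b = forward_flops_per_token(31e9)  # all 31B weights touched per token
moe_a4b = forward_flops_per_token(4e9)     # only ~4B expert params active per token

print(f"dense 31B: {dense_31b:.1e} FLOPs/token")
print(f"26B MoE:   {moe_a4b:.1e} FLOPs/token")
print(f"the MoE is ~{dense_31b / moe_a4b:.1f}x cheaper per token")
```

By this estimate the MoE does roughly an eighth of the dense model's per-token arithmetic while, per the scores above, giving up only one to three benchmark points.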
How It Compares to Other Open Models
Among open models in the ~30B parameter range, Gemma 4 31B leads on most benchmarks according to BenchLM comparisons:
| Model | Arena ELO | Coding | Reasoning | License |
|---|---|---|---|---|
| Gemma 4 31B | ~1452 (#3 open) | 80.0 | 66.4 | Apache 2.0 |
| Qwen 3.5 27B | ~1403 | 77.6 | 60.6 | Apache 2.0 |
| DeepSeek V3.2 | ~1425 | — | — | MIT |
Gemma 4 31B edges out Qwen 3.5 27B on coding (80 vs 77.6) and reasoning (66.4 vs 60.6), while Qwen leads on knowledge tasks (80.6 vs 61.3).
Important context: Larger open models like GLM-5.1 (744B MoE), Qwen 3.5 397B, and DeepSeek V3.2 Speciale still outperform Gemma 4 on the most demanding benchmarks. Gemma 4's advantage is that it fits on a single consumer GPU — something no 400B+ model can claim.
Key Capabilities
Multimodal (Text + Image + Audio)
All four models accept text and image inputs. The edge models (E2B, E4B) additionally support audio input, enabling on-device voice assistants without any cloud dependency.
140+ Languages
Gemma 4 supports over 140 languages, making it one of the most multilingual open models available.
Native Agentic Support
The 86.4% τ2-bench score reflects native function calling and tool-use capabilities. Gemma 4 can autonomously plan multi-step workflows, make API calls, navigate apps, and complete tasks — without external scaffolding.
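Whatever model sits at the center, a tool-use loop needs the same scaffolding around it: the model emits a structured call, the host parses it, dispatches to a registered function, and feeds the result back. The sketch below illustrates only that host-side plumbing; the JSON shape and the `get_weather` tool are hypothetical stand-ins, since the article does not specify Gemma 4's actual tool-call wire format:

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool the model is permitted to call."""
    return f"18°C and clear in {city}"

# Registry of callable tools, keyed by the name the model uses.
TOOLS = {"get_weather": get_weather}

# Hard-coded stand-in for a model turn that decided to call a tool;
# in a real loop this string would come from the model's output.
model_output = '{"tool": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # the tool result would be appended as the next model turn
```

In a real agentic deployment this parse-dispatch-return cycle repeats until the model emits a final answer instead of another tool call.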
Context Window
Up to 256K tokens for the 31B and 26B models, and 128K tokens for the edge models (E2B, E4B). This is shorter than frontier models like GPT-5.4 (1M tokens) but sufficient for most practical use cases and impressive for models this size.
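When sizing prompts against those windows, a quick pre-check avoids truncation surprises. The snippet below uses the common rough heuristic of ~4 characters per token for English text; both that ratio and the window sizes (taken from the paragraph above) are approximations, not tokenizer-accurate counts:

```python
# Rough fit check for a prompt against Gemma 4's context windows.
# Window sizes are from the article; the 4-chars-per-token ratio is a
# coarse English-text heuristic, not a real tokenizer.

CONTEXT_WINDOWS = {"31B": 256_000, "26B-A4B": 256_000, "E4B": 128_000, "E2B": 128_000}

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    return int(len(text) / chars_per_token)

def fits(text: str, model: str, reserve_for_output: int = 4_000) -> bool:
    """True if the prompt plus an output budget fits the model's window."""
    return rough_token_count(text) + reserve_for_output <= CONTEXT_WINDOWS[model]

doc = "x" * 600_000        # ~150K tokens by the heuristic
print(fits(doc, "31B"))    # True: within the 256K window
print(fits(doc, "E2B"))    # False: exceeds the 128K edge window
```

For production use, replace the heuristic with the model's actual tokenizer before trusting the count.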
Where to Get It
Gemma 4 is available immediately from:
Hugging Face — All model weights
Ollama — For local deployment
Kaggle — Notebooks and model cards
LM Studio — GUI-based local inference
Google AI Studio — Cloud-hosted inference
NVIDIA NIM — Optimized inference containers
Since the original Gemma launch in 2024, developers have downloaded Gemma models over 400 million times and published 100,000+ community variants. Gemma 4's first-week download velocity has reportedly exceeded that of any previous release.
Who Should Use Gemma 4
Edge/mobile developers: E2B and E4B are purpose-built for on-device AI with zero cloud dependency
Cost-sensitive teams: The 26B MoE delivers frontier-class results with minimal compute
Enterprise with compliance needs: Apache 2.0 license + self-hosting = full data control
Multilingual projects: 140+ language support out of the box
Who Should Look Elsewhere
Maximum coding performance: GLM-5.1 (58.4% SWE-Bench Pro) or Claude Opus 4.6 (57.3%) still lead on real-world software engineering benchmarks that Gemma 4 isn't evaluated on
Million-token context: GPT-5.4 and Qwen 3.6 Plus offer 1M+ token context windows
Absolute best reasoning: Qwen 3.5 397B and DeepSeek V3.2 Speciale still outperform at much larger parameter counts
Bottom Line
Gemma 4 is the best open model you can run on a single GPU in April 2026. The Apache 2.0 license removes all commercial restrictions. The 26B MoE variant's 97%-of-31B performance at ~4B inference cost is genuinely remarkable. And the edge models finally make on-device AI practical for production applications.
It won't replace frontier models like Claude Opus or GPT-5.4 for the hardest tasks. But for the vast majority of development workflows — especially where cost, latency, or data privacy matter — Gemma 4 is now the model to beat in its weight class.