The "Hunter Alpha" Reveal
In mid-March 2026, a mystery model appeared on OpenRouter under the name "Hunter Alpha." It quickly topped the daily usage charts, surpassing 1 trillion tokens in total usage, and much of the AI community was convinced it was DeepSeek V4 being tested in stealth.
It wasn't. On March 18, 2026, Xiaomi revealed that Hunter Alpha was actually MiMo-V2-Pro — the smartphone giant's flagship large language model, built by a team led by former DeepSeek researcher Luo Fuli. Reuters subsequently debunked the DeepSeek V4 speculation.
Two Models: Flash and Pro
Xiaomi's MiMo-V2 comes in two variants:
| Spec | MiMo-V2-Flash | MiMo-V2-Pro |
|---|---|---|
| Total Parameters | 309B | 1T |
| Active Parameters | 15B | 42B |
| Architecture | MoE, Hybrid Attention (5:1) | MoE, Hybrid Attention (7:1) |
| Context Window | 256K tokens | 1M tokens |
| Input Price | $0.10/M tokens | $1.00/M tokens |
| Output Price | $0.30/M tokens | $3.00/M tokens |
| License | MIT (open weights on HuggingFace) | Proprietary (API-only) |
MiMo-V2-Pro Benchmarks
The Pro model's performance has been independently evaluated by Artificial Analysis:
| Benchmark | MiMo-V2-Pro | Comparison |
|---|---|---|
| Intelligence Index | 49 | GLM-5: 50, GPT-5.2 Codex: ~49 |
| GDPval-AA (Agentic) | 1426 Elo | GLM-5: 1406, Claude Sonnet 4.6: 1633 |
| ClawEval (Agent Scaffold) | 61.5 | Claude Opus 4.6: 66.3, GPT-5.2: 50.0 |
| Hallucination Rate | 30% | Flash: 48% |
The Pro model is the highest-scoring Chinese-origin model on GDPval-AA (agentic tasks), beating GLM-5 (1406) and Kimi K2.5 (1283). On ClawEval, it scores 61.5 — approaching Claude Opus 4.6's 66.3 and significantly beating GPT-5.2's 50.0.
Token Efficiency: A Key Advantage
MiMo-V2-Pro used 77M output tokens to run the Intelligence Index, significantly fewer than GLM-5 (109M) and Kimi K2.5 (89M). This matters because many Chinese models are notoriously verbose, driving up effective costs even when per-token pricing looks cheap.
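The effect is easy to quantify: effective cost is tokens consumed times price per token. A quick sketch using the token counts above and the output prices quoted elsewhere in this article (Kimi K2.5's pricing isn't given here, so it is left out):

```python
# Effective output cost of running the Intelligence Index:
# tokens used (millions) x output price ($ per million tokens).
runs = {
    "MiMo-V2-Pro": (77, 3.00),   # 77M tokens at $3.00/M
    "GLM-5": (109, 3.20),        # 109M tokens at $3.20/M (GLM-5.1 list price)
}

effective_cost = {name: tokens * price for name, (tokens, price) in runs.items()}
print({name: round(cost, 2) for name, cost in effective_cost.items()})
# -> {'MiMo-V2-Pro': 231.0, 'GLM-5': 348.8}
```

Despite nearly identical per-token pricing, the full evaluation costs roughly a third less on MiMo-V2-Pro because it simply emits fewer tokens.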
MiMo-V2-Flash Benchmarks
The Flash variant punches well above its weight for a 15B-active-parameter model:
| Benchmark | MiMo-V2-Flash | Notes |
|---|---|---|
| SWE-Bench Verified | 73.4% | Leading open-source model for SWE |
| SWE-Bench Multilingual | 71.7% | Strong cross-language coding |
| AIME 2025 (Math) | 94.1% | Near-frontier math reasoning |
| Intelligence Index | 41 | Average: 26 |
Flash generates output at 141.9 tokens per second (median across providers, per Artificial Analysis Feb 2026 data; Xiaomi claims up to 150 tok/s) — nearly 2.5x the average for open-weight models of similar size. This speed comes from Xiaomi's Multi-Token Prediction (MTP) architecture, which predicts multiple future tokens in a single forward pass.
Key Technical Innovations
Rollout Routing Replay (R3)
Xiaomi developed R3 to solve a common MoE problem: routing drift between training and inference. R3 enforces a deterministic constraint where experts activated during rollout are strictly reused during backpropagation, eliminating performance degradation that plagues other sparse models in production.
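Xiaomi hasn't published implementation details for R3 here; as a rough illustration of the idea (all names hypothetical, NumPy standing in for a real MoE router), the trick is to cache each token's top-k expert choices at rollout time and replay them during the training pass, rather than recomputing them with router weights that have since drifted:

```python
import numpy as np

def route_topk(router_logits, k=2):
    """Return the indices of the top-k experts for each token."""
    return np.argsort(router_logits, axis=-1)[:, -k:]

rng = np.random.default_rng(0)
n_tokens, n_experts = 4, 8

# Rollout (inference): record which experts each token actually used.
rollout_logits = rng.normal(size=(n_tokens, n_experts))
replay_cache = route_topk(rollout_logits)

# Training step: router weights have drifted, so recomputing routes
# can activate different experts than the rollout did...
drifted_logits = rollout_logits + rng.normal(scale=0.5, size=(n_tokens, n_experts))
recomputed = route_topk(drifted_logits)

# ...but R3-style replay feeds the cached routes to the MoE layer, so
# gradients flow through exactly the experts that produced the rollout.
replayed = replay_cache
print("routing drift without replay:", bool((recomputed != replayed).any()))
```

The deterministic constraint is the whole point: whatever the drifted router would now prefer, the training pass sees the rollout's expert assignments.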
Multi-Teacher On-Policy Distillation (MOPD)
MiMo-V2-Flash uses a novel training technique where domain-specialized teacher models provide dense, token-level rewards. This lets the smaller model absorb expertise from multiple larger teachers without the quality loss typical of standard distillation.
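The mechanics of MOPD aren't spelled out here; a minimal sketch of the general shape, with hypothetical names and random logits standing in for real models, is a per-token KL penalty against whichever specialized teacher matches the prompt's domain, computed on the student's own (on-policy) samples:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def token_kl(teacher_logits, student_logits):
    """Per-token KL(teacher || student): a dense signal with one value
    per generated token, rather than one scalar per sequence."""
    p, q = softmax(teacher_logits), softmax(student_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1)

rng = np.random.default_rng(1)
seq_len, vocab = 6, 32
student = rng.normal(size=(seq_len, vocab))  # logits on the student's own sample

# Hypothetical domain-specialized teachers; one is chosen per prompt domain.
teachers = {"code": rng.normal(size=(seq_len, vocab)),
            "math": rng.normal(size=(seq_len, vocab))}
teacher = teachers["code"]                   # this prompt routed to the code teacher

dense_reward = -token_kl(teacher, student)   # higher where student matches teacher
print(dense_reward.shape)                    # one reward per token: (6,)
```

The dense, token-level signal is what distinguishes this from sequence-level distillation, where a single scalar has to explain an entire completion.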
Speculative Decoding via MTP
The Multi-Token Prediction layers serve double duty: during training they improve learning, and during inference they act as draft models for speculative decoding, achieving an acceptance length of up to 3.6 tokens and a 2.6× decoding speedup.
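Those two figures are mutually consistent under simple assumptions. The sketch below is illustrative only: the draft length k and the relative draft cost are guesses, not Xiaomi's published numbers; only the 3.6 acceptance length comes from the article.

```python
# Back-of-the-envelope speculative-decoding speedup:
#   speedup ~ tokens accepted per verification step / relative cost of that step
acceptance_length = 3.6  # tokens kept per target forward pass (reported)
k = 4                    # assumed number of MTP draft tokens per step
draft_cost = 0.10        # assumed cost of one draft token vs. one full target pass

step_cost = 1 + k * draft_cost   # one full target pass plus k cheap draft tokens
speedup = acceptance_length / step_cost
print(round(speedup, 2))         # ~2.57, in line with the reported ~2.6x
```

Under these toy assumptions, cheap drafts that are usually accepted convert one expensive forward pass into several emitted tokens.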
Pricing in Context
| Model | Input/M | Output/M | Intelligence Index |
|---|---|---|---|
| MiMo-V2-Flash | $0.10 | $0.30 | 41 |
| MiMo-V2-Pro | $1.00 | $3.00 | 49 |
| GLM-5.1 | $1.00 | $3.20 | 50 |
| Claude Opus 4.6 | $5.00 | $25.00 | 53 |
| GPT-5.4 | $2.50 | $15.00 | 57 |
MiMo-V2-Pro delivers 98% of GLM-5.1's intelligence at nearly identical pricing. Flash delivers 82% of that intelligence at roughly a tenth of the cost.
The MiMo-V2 Ecosystem
Xiaomi isn't just shipping models — they're building an agent ecosystem. According to A2A Protocol, the MiMo-V2 series includes:
- MiMo-V2-Pro: Flagship reasoning and coding model
- MiMo-V2-Flash: Fast, efficient model for high-volume tasks
- MiMo-V2-Omni: Multimodal variant (text + image + video)
- MiMo-V2-TTS: Text-to-speech model
MiMo-V2-Pro has partnerships with five major agent frameworks: OpenClaw, OpenCode, KiloCode, Blackbox, and Cline — offering one week of free API access for developers worldwide.
Considerations
- Data sovereignty: MiMo-V2 is operated by Xiaomi, a Chinese company. Enterprises with strict data handling requirements should evaluate compliance needs before production deployment.
- Flash verbosity: While Pro is token-efficient (77M tokens for the Intelligence Index), Artificial Analysis noted Flash generated 97M tokens for the same evaluation, so its effective costs run higher than its low per-token pricing suggests.
- Creative writing: Both models excel at reasoning and coding but trail denser models like Claude Opus on creative and nuanced text generation.
Bottom Line
MiMo-V2-Pro is the strongest model from a company nobody expected to compete in frontier AI. Its stealth launch as Hunter Alpha proved the model can compete on merit without brand recognition. At $1/$3 per million tokens, it's a legitimate alternative to GLM-5.1 for agentic and coding workloads — and the Flash variant at $0.10/$0.30 is the cheapest frontier-adjacent model available.
Xiaomi's $8.7 billion AI investment is producing results. With DeepSeek researchers on the team and a smartphone ecosystem of hundreds of millions of devices waiting for on-device AI, MiMo-V2 is just the beginning.
Sources
Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance — VentureBeat
MiMo-V2-Pro Intelligence, Performance & Price Analysis — Artificial Analysis
MiMo-V2-Flash (Feb 2026) Intelligence & Price Analysis — Artificial Analysis
MiMo-V2-Pro: Everything you need to know — Artificial Analysis