The "Hunter Alpha" Reveal
In mid-March 2026, a mystery model appeared on OpenRouter under the name "Hunter Alpha." It quickly topped the daily usage charts, surpassing 1 trillion tokens in total usage, and much of the AI community was convinced it was DeepSeek V4 being tested in stealth.
It wasn't. On March 18, 2026, Xiaomi revealed that Hunter Alpha was actually MiMo-V2-Pro — the smartphone giant's flagship large language model, built by a team led by former DeepSeek researcher Luo Fuli. Reuters subsequently debunked the DeepSeek V4 speculation.
Two Models: Flash and Pro
Xiaomi's MiMo-V2 comes in two variants:
| Spec | MiMo-V2-Flash | MiMo-V2-Pro |
|---|---|---|
| Total Parameters | 309B | 1T |
| Active Parameters | 15B | 42B |
| Architecture | MoE, Hybrid Attention (5:1) | MoE, Hybrid Attention (7:1) |
| Context Window | 256K tokens | 1M tokens |
| Input Price | $0.10/M tokens | $1.00/M tokens |
| Output Price | $0.30/M tokens | $3.00/M tokens |
| License | MIT (open weights on HuggingFace) | Proprietary (API-only) |
MiMo-V2-Pro Benchmarks
The Pro model's performance has been independently evaluated by Artificial Analysis:
| Benchmark | MiMo-V2-Pro | Comparison |
|---|---|---|
| Intelligence Index | 49 | GLM-5: 50, GPT-5.2 Codex: ~49 |
| GDPval-AA (Agentic) | 1426 Elo | GLM-5: 1406, Claude Sonnet 4.6: 1633 |
| ClawEval (Agent Scaffold) | 61.5 | Claude Opus 4.6: 66.3, GPT-5.2: 50.0 |
| Hallucination Rate | 30% | Flash: 48% |
The Pro model is the highest-scoring Chinese-origin model on GDPval-AA (agentic tasks), beating GLM-5 (1406) and Kimi K2.5 (1283). On ClawEval, it scores 61.5 — approaching Claude Opus 4.6's 66.3 and significantly beating GPT-5.2's 50.0.
Token Efficiency: A Key Advantage
MiMo-V2-Pro used 77M output tokens to run the Intelligence Index, significantly fewer than GLM-5 (109M) and Kimi K2.5 (89M). This matters because many Chinese models are notoriously verbose, driving up effective costs even when per-token pricing looks cheap.
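The effect is easy to quantify: effective cost is tokens consumed times price per token. A quick sketch using the token counts above and the output prices quoted elsewhere in this article (Kimi K2.5's pricing isn't given here, so it is left out):

```python
# Effective output cost of running the Intelligence Index:
# tokens used (millions) x output price ($ per million tokens).
runs = {
    "MiMo-V2-Pro": (77, 3.00),   # 77M tokens at $3.00/M
    "GLM-5": (109, 3.20),        # 109M tokens at $3.20/M (GLM-5.1 list price)
}

effective_cost = {name: tokens * price for name, (tokens, price) in runs.items()}
print({name: round(cost, 2) for name, cost in effective_cost.items()})
# -> {'MiMo-V2-Pro': 231.0, 'GLM-5': 348.8}
```

Despite nearly identical per-token pricing, the full evaluation costs roughly a third less on MiMo-V2-Pro because it simply emits fewer tokens.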
MiMo-V2-Flash Benchmarks
The Flash variant punches well above its weight for a 15B-active-parameter model:
| Benchmark | MiMo-V2-Flash | Notes |
|---|---|---|
| SWE-Bench Verified | 73.4% | Leading open-source model for SWE |
| SWE-Bench Multilingual | 71.7% | Strong cross-language coding |
| AIME 2025 (Math) | 94.1% | Near-frontier math reasoning |
| Intelligence Index | 41 | Average: 26 |
Flash generates output at 141.9 tokens per second (median across providers, per Artificial Analysis Feb 2026 data; Xiaomi claims up to 150 tok/s) — nearly 2.5x the average for open-weight models of similar size. This speed comes from Xiaomi's Multi-Token Prediction (MTP) architecture, which predicts multiple future tokens in a single forward pass.
Key Technical Innovations
Rollout Routing Replay (R3)
Xiaomi developed R3 to solve a common MoE problem: routing drift between training and inference. R3 enforces a deterministic constraint where experts activated during rollout are strictly reused during backpropagation, eliminating performance degradation that plagues other sparse models in production.
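Xiaomi hasn't published implementation details for R3 here; as a rough illustration of the idea (all names hypothetical, NumPy standing in for a real MoE router), the trick is to cache each token's top-k expert choices at rollout time and replay them during the training pass, rather than recomputing them with router weights that have since drifted:

```python
import numpy as np

def route_topk(router_logits, k=2):
    """Return the indices of the top-k experts for each token."""
    return np.argsort(router_logits, axis=-1)[:, -k:]

rng = np.random.default_rng(0)
n_tokens, n_experts = 4, 8

# Rollout (inference): record which experts each token actually used.
rollout_logits = rng.normal(size=(n_tokens, n_experts))
replay_cache = route_topk(rollout_logits)

# Training step: router weights have drifted, so recomputing routes
# can activate different experts than the rollout did...
drifted_logits = rollout_logits + rng.normal(scale=0.5, size=(n_tokens, n_experts))
recomputed = route_topk(drifted_logits)

# ...but R3-style replay feeds the cached routes to the MoE layer, so
# gradients flow through exactly the experts that produced the rollout.
replayed = replay_cache
print("routing drift without replay:", bool((recomputed != replayed).any()))
```

The deterministic constraint is the whole point: whatever the drifted router would now prefer, the training pass sees the rollout's expert assignments.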
Multi-Teacher On-Policy Distillation (MOPD)
MiMo-V2-Flash uses a novel training technique where domain-specialized teacher models provide dense, token-level rewards. This lets the smaller model absorb expertise from multiple larger teachers without the quality loss typical of standard distillation.
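The mechanics of MOPD aren't spelled out here; a minimal sketch of the general shape, with hypothetical names and random logits standing in for real models, is a per-token KL penalty against whichever specialized teacher matches the prompt's domain, computed on the student's own (on-policy) samples:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def token_kl(teacher_logits, student_logits):
    """Per-token KL(teacher || student): a dense signal with one value
    per generated token, rather than one scalar per sequence."""
    p, q = softmax(teacher_logits), softmax(student_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1)

rng = np.random.default_rng(1)
seq_len, vocab = 6, 32
student = rng.normal(size=(seq_len, vocab))  # logits on the student's own sample

# Hypothetical domain-specialized teachers; one is chosen per prompt domain.
teachers = {"code": rng.normal(size=(seq_len, vocab)),
            "math": rng.normal(size=(seq_len, vocab))}
teacher = teachers["code"]                   # this prompt routed to the code teacher

dense_reward = -token_kl(teacher, student)   # higher where student matches teacher
print(dense_reward.shape)                    # one reward per token: (6,)
```

The dense, token-level signal is what distinguishes this from sequence-level distillation, where a single scalar has to explain an entire completion.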
Speculative Decoding via MTP
The Multi-Token Prediction layers serve double duty: during training they improve learning, and during inference they act as draft models for speculative decoding, achieving an acceptance length of up to 3.6 tokens and a 2.6× decoding speedup.
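Those two figures are mutually consistent under simple assumptions. The sketch below is illustrative only: the draft length k and the relative draft cost are guesses, not Xiaomi's published numbers; only the 3.6 acceptance length comes from the article.

```python
# Back-of-the-envelope speculative-decoding speedup:
#   speedup ~ tokens accepted per verification step / relative cost of that step
acceptance_length = 3.6  # tokens kept per target forward pass (reported)
k = 4                    # assumed number of MTP draft tokens per step
draft_cost = 0.10        # assumed cost of one draft token vs. one full target pass

step_cost = 1 + k * draft_cost   # one full target pass plus k cheap draft tokens
speedup = acceptance_length / step_cost
print(round(speedup, 2))         # ~2.57, in line with the reported ~2.6x
```

Under these toy assumptions, cheap drafts that are usually accepted convert one expensive forward pass into several emitted tokens.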
Pricing in Context
| Model | Input/M | Output/M | Intelligence Index |
|---|---|---|---|
| MiMo-V2-Flash | $0.10 | $0.30 | 41 |
| MiMo-V2-Pro | $1.00 | $3.00 | 49 |
| GLM-5.1 | $1.00 | $3.20 | 50 |
| Claude Opus 4.6 | $5.00 | $25.00 | 53 |
| GPT-5.4 | $2.50 | $15.00 | 57 |
MiMo-V2-Pro delivers 98% of GLM-5.1's intelligence at nearly identical pricing. Flash delivers 82% of that intelligence at roughly a tenth of the cost.
The MiMo-V2 Ecosystem
Xiaomi isn't just shipping models — they're building an agent ecosystem. According to A2A Protocol, the MiMo-V2 series includes:
- MiMo-V2-Pro: Flagship reasoning and coding model
- MiMo-V2-Flash: Fast, efficient model for high-volume tasks
- MiMo-V2-Omni: Multimodal variant (text + image + video)
- MiMo-V2-TTS: Text-to-speech model
MiMo-V2-Pro has partnerships with five major agent frameworks: OpenClaw, OpenCode, KiloCode, Blackbox, and Cline — offering one week of free API access for developers worldwide.
Considerations
- Data sovereignty: MiMo-V2 is operated by Xiaomi, a Chinese company. Enterprises with strict data handling requirements should evaluate compliance needs before production deployment.
- Flash verbosity: While Pro is token-efficient (77M tokens for the Intelligence Index), Artificial Analysis noted Flash generated 97M tokens for the same evaluation, so its effective costs run higher than its low per-token pricing suggests.
- Creative writing: Both models excel at reasoning and coding but trail denser models like Claude Opus on creative and nuanced text generation.
Bottom Line
MiMo-V2-Pro is the strongest model from a company nobody expected to compete in frontier AI. Its stealth launch as Hunter Alpha proved the model can compete on merit without brand recognition. At $1/$3 per million tokens, it's a legitimate alternative to GLM-5.1 for agentic and coding workloads — and the Flash variant at $0.10/$0.30 is the cheapest frontier-adjacent model available.
Xiaomi's $8.7 billion AI investment is producing results. With DeepSeek researchers on the team and a smartphone ecosystem of hundreds of millions of devices waiting for on-device AI, MiMo-V2 is just the beginning.
Sources
Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance — VentureBeat
MiMo-V2-Pro Intelligence, Performance & Price Analysis — Artificial Analysis
MiMo-V2-Flash (Feb 2026) Intelligence & Price Analysis — Artificial Analysis
MiMo-V2-Pro: Everything you need to know — Artificial Analysis