
Kimi K2.5 Deep Review: 100 AI Agents, 76.8% SWE-Bench, and $0.60/M Tokens — The Real Story

Moonshot AI's Kimi K2.5 deploys up to 100 parallel sub-agents, scores 76.8% on SWE-Bench Verified, and costs $0.60/$3.00 per million tokens. We break down the benchmarks, the PARL training technique, and what actually works.

Nishant Lamichhane · Updated · 12 min read

What Is Kimi K2.5?

Kimi K2.5 is Moonshot AI's flagship model, released January 27, 2026. It's a native multimodal Mixture-of-Experts (MoE) model with 1.04 trillion total parameters and 32 billion active parameters per forward pass. It processes text, images, and video through a 256,000-token context window.

The headline feature: Agent Swarm — an orchestration system that deploys up to 100 parallel sub-agents coordinating up to 1,500 tool calls on a single task. No other open-weight model ships anything like it.

Moonshot AI is a Beijing-based startup founded in 2023. As of March 2026, the company is targeting an $18 billion valuation — up from $4.3 billion just three months prior — making it the fastest Chinese AI company to reach decacorn status.

Architecture: 1T Parameters, 32B Active

Kimi K2.5 uses a Mixture-of-Experts architecture where only 32 billion of its 1.04 trillion parameters activate per forward pass. This keeps inference costs manageable while maintaining frontier-level reasoning capability.

Key architecture details:

  • Total parameters: 1.04 trillion

  • Active parameters: 32 billion (MoE routing)

  • Context window: 256,000 tokens

  • Vision encoder: MoonViT-3D (400M parameters) — enables native image and video understanding

  • Training data: ~15 trillion mixed visual and text tokens (continued pretraining from Kimi K2)

  • Input modalities: Text, image, video

  • Output modality: Text only

  • License: Modified MIT (commercial use permitted)

The model is available on Hugging Face and GitHub as open weights.
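The cost advantage of sparse activation can be sketched with back-of-envelope arithmetic. The parameter counts come from the article; the "~2 × params × tokens" FLOPs rule is a standard approximation for decoder inference, not a Moonshot-published figure.

```python
# Why MoE routing keeps inference cheap: only the activated experts
# contribute to per-token compute.

TOTAL_PARAMS = 1.04e12   # 1.04 trillion total parameters
ACTIVE_PARAMS = 32e9     # 32 billion activated per forward pass

def forward_flops(params, tokens):
    """Approximate FLOPs for a decoder forward pass (~2 * params * tokens)."""
    return 2 * params * tokens

tokens = 1000
dense = forward_flops(TOTAL_PARAMS, tokens)
moe = forward_flops(ACTIVE_PARAMS, tokens)

print(f"Dense 1T model:   {dense:.2e} FLOPs")
print(f"MoE (32B active): {moe:.2e} FLOPs")
print(f"Compute ratio: {dense / moe:.1f}x cheaper per token")  # ~32.5x
```

In other words, the model reasons with a trillion-parameter capacity while paying roughly the compute bill of a 32B dense model per token.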

Agent Swarm: 100 Parallel Sub-Agents

Agent Swarm is what separates Kimi K2.5 from every other open-weight model on the market. According to InfoQ's technical analysis, it works like this:

  • A trainable orchestrator agent decomposes complex tasks into parallelizable subtasks

  • Up to 100 concurrent sub-agents execute those subtasks simultaneously

  • Sub-agents can coordinate up to 1,500 tool calls per task

  • Wall-clock time is reduced by 4.5x compared to single-agent execution
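The orchestrator pattern described above can be sketched in a few lines. The real Agent Swarm API is not public, so `decompose`, `run_sub_agent`, and the task strings here are purely illustrative placeholders; only the 100-agent ceiling comes from the source.

```python
# Minimal sketch of an orchestrator fanning subtasks out to parallel
# sub-agents. In practice each sub-agent would call the model plus tools.
from concurrent.futures import ThreadPoolExecutor

MAX_SUB_AGENTS = 100  # ceiling reported for Agent Swarm

def decompose(task):
    """Placeholder: split a task into independent subtasks."""
    return [f"{task}::part-{i}" for i in range(8)]

def run_sub_agent(subtask):
    """Placeholder sub-agent: returns a stub result for its subtask."""
    return f"result({subtask})"

def orchestrate(task):
    subtasks = decompose(task)
    workers = min(len(subtasks), MAX_SUB_AGENTS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_sub_agent, subtasks))

results = orchestrate("survey pricing pages")
print(len(results))  # 8
```

The wall-clock win comes from the same principle as any fan-out: total latency approaches the slowest subtask rather than the sum of all subtasks.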

The PARL Training Technique

Moonshot developed Parallel-Agent Reinforcement Learning (PARL) specifically to train the orchestrator. The technique solves three hard problems:

  1. Training instability during multi-agent coordination

  2. Ambiguous credit assignment across agents (which agent gets reward for what?)

  3. Serial collapse — where the orchestrator defaults to using just one agent sequentially, ignoring the parallel capability entirely

PARL addresses serial collapse through staged reward shaping that encourages parallelism early in training. The approach freezes sub-agents and trains only the orchestrator, with rewards incentivizing both sub-agent creation and successful task completion.
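A reward of that shape might look like the sketch below. The exact PARL reward function is not published; the weights, cap, and annealing schedule here are invented for demonstration of the staged-shaping idea.

```python
# Illustrative staged reward shaping against serial collapse: a bonus for
# spawning sub-agents that decays over training, so early training learns
# to parallelize and late training optimizes pure task success.

def parl_reward(task_success: bool, agents_used: int, step: int,
                warmup_steps: int = 10_000) -> float:
    base = 1.0 if task_success else 0.0
    anneal = max(0.0, 1.0 - step / warmup_steps)   # linear decay to zero
    parallel_bonus = 0.1 * min(agents_used, 10) * anneal
    return base + parallel_bonus

# Early in training, using more agents pays off even on failure:
print(parl_reward(False, agents_used=8, step=0))      # 0.8
# After warmup, only task success matters:
print(parl_reward(True, agents_used=1, step=20_000))  # 1.0
```

With the sub-agents frozen, gradients flow only into the orchestrator, which matches the training setup described above.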

Moonshot also introduced a "Critical Steps" metric — a latency-oriented measurement inspired by the critical path in parallel computation — to evaluate how efficiently the swarm parallelizes work.
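A critical-path style metric of this kind is easy to make concrete: the latency of a parallel plan is the cost of the longest chain of dependent steps, not the total number of steps. The DAG and step costs below are illustrative, not Moonshot's actual benchmark data.

```python
# Longest-path cost through a step dependency DAG, in the spirit of the
# "Critical Steps" metric described above.

def critical_steps(deps, cost):
    """Return the max total cost along any dependency chain."""
    memo = {}
    def longest(step):
        if step not in memo:
            memo[step] = cost[step] + max(
                (longest(d) for d in deps.get(step, [])), default=0)
        return memo[step]
    return max(longest(s) for s in cost)

# Five steps, total serial cost 8, but the longest chain is A -> B -> D.
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": []}
cost = {"A": 1, "B": 3, "C": 1, "D": 1, "E": 2}
print(critical_steps(deps, cost))  # 5, vs 8 if executed serially
```

A swarm that parallelizes well drives this number toward the critical path; a swarm suffering serial collapse drives it toward the serial total.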

Agent Swarm Benchmark Impact

| Benchmark | Standard Mode | Agent Swarm | Improvement |
|---|---|---|---|
| BrowseComp | 60.6% | 78.4% | +17.8 points |
| WideSearch | 72.7% | 79.0% | +6.3 points |

With Agent Swarm enabled, Kimi K2.5 outperformed GPT-5.2 Pro on BrowseComp and surpassed Claude Opus 4.5 on WideSearch. The system also provides "proactive context control," reducing context overflow risks during long-running tasks without requiring summarization.

Full Benchmark Breakdown

All numbers below are from Moonshot's official tech blog and cross-referenced with Artificial Analysis.

Coding

| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| SWE-Bench Verified | 76.8% | Open-weight SOTA at launch |
| SWE-Bench Multilingual | 73.0% | Cross-language code fixing |
| Terminal-Bench 2.0 | 50.8% | Terminal-based coding tasks |
| LiveCodeBench | 85.0% | Real-time competitive coding |

Math & Reasoning

| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| AIME 2025 | 96.1% | Best-in-class open-weight |
| HMMT 2025 | 95.4% | Beats most proprietary models |
| GPQA-Diamond | 87.6% | Graduate-level science QA |
| HLE (Humanity's Last Exam) | 50.2% | Beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) |

Vision & Multimodal

| Benchmark | Kimi K2.5 |
|---|---|
| MMMU Pro | 78.5% |
| MathVision | 84.2% |
| OmniDocBench 1.5 | 88.8% |
| VideoMMMU | 86.6% |
| LongVideoBench | 79.8% |

Agentic Search

| Benchmark | Standard | With Agent Swarm |
|---|---|---|
| BrowseComp | 60.6% | 78.4% |
| WideSearch | 72.7% | 79.0% |
| DeepSearchQA | 77.1% | — |

Pricing: 5x Cheaper Than Claude

Pricing figures are from Artificial Analysis and NxCode's pricing guide. Note: the standard Kimi K2 rate is $0.60/$2.50 per million tokens, while the K2.5 Reasoning variant is $0.60/$3.00. Prices below reflect the K2.5 Reasoning model:

| Metric | Kimi K2.5 | Claude Sonnet 4.6 | GPT-5.4 |
|---|---|---|---|
| Input (per 1M tokens) | $0.60 | $3.00 | $2.50 |
| Output (per 1M tokens) | $3.00 | $15.00 | $15.00 |
| Cache discount | 75% | — | — |

That's 5x cheaper than Claude Sonnet and GPT-5.4 on output tokens. The 75% cache discount on repeated prompts makes it even more cost-effective for agentic workloads where the same system prompt is reused across sub-agents.
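The cache economics are easy to model. The rates and 75% discount come from the table above; the token counts and the assumption that the shared system prompt is cached after the first sub-agent reads it are illustrative workload guesses, not measured numbers.

```python
# Rough cost model for a swarm run that reuses one cached system prompt
# across many sub-agents, at K2.5 Reasoning rates.

INPUT_RATE = 0.60 / 1e6    # $ per input token
OUTPUT_RATE = 3.00 / 1e6   # $ per output token
CACHE_DISCOUNT = 0.75      # cached input tokens cost 25% of the full rate

def run_cost(system_tokens, user_tokens, output_tokens, sub_agents):
    # First agent pays full price for the system prompt; the rest hit cache.
    cached_input = (system_tokens * (sub_agents - 1)
                    * INPUT_RATE * (1 - CACHE_DISCOUNT))
    fresh_input = (system_tokens + user_tokens * sub_agents) * INPUT_RATE
    output = output_tokens * sub_agents * OUTPUT_RATE
    return cached_input + fresh_input + output

cost = run_cost(system_tokens=5_000, user_tokens=2_000,
                output_tokens=4_000, sub_agents=100)
print(f"${cost:.2f} for a 100-agent run")  # ≈ $1.40
```

Note that under these assumptions output tokens dominate the bill, which is exactly where the model's verbosity (covered below) hurts.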

Third-party API providers offer even lower rates. Artificial Analysis lists 15 providers, with DeepInfra at $0.90 blended per 1M tokens being the cheapest.

Speed and Latency: The Weak Spot

Per Artificial Analysis benchmarking:

  • Output speed: 36.4 tokens/second (ranked #39 of 68 models)

  • Time to first token: 1.74 seconds

  • Verbosity: 89 million output tokens generated during evaluation — notably verbose

This is a genuine weakness. Kimi K2.5 is slow and verbose. Artificial Analysis's evaluation cost was $370.66 — inflated by the model's tendency to produce far more output tokens than necessary. If you're building latency-sensitive applications, this matters.

For comparison, MiMo-V2 Flash runs at 141.9 tokens/second — nearly 4x faster.
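The practical impact is straightforward to estimate from the figures above: time-to-first-token plus generation time at the measured output speed. The 1,000-token response size is an arbitrary example, and the comparison model's TTFT is assumed equal to Kimi's for simplicity.

```python
# End-to-end latency estimate from throughput and time-to-first-token.

def response_seconds(output_tokens, tok_per_sec, ttft):
    return ttft + output_tokens / tok_per_sec

k25 = response_seconds(1_000, tok_per_sec=36.4, ttft=1.74)
fast = response_seconds(1_000, tok_per_sec=141.9, ttft=1.74)  # MiMo-V2 Flash speed

print(f"Kimi K2.5:        {k25:.1f}s for 1,000 tokens")  # ~29.2s
print(f"141.9 tok/s model: {fast:.1f}s")                 # ~8.8s
```

A 20-second gap per 1,000-token response is invisible in a batch agent pipeline but disqualifying for interactive chat.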

What's Actually Good

  • Agent Swarm is genuinely novel. No other open-weight model ships a built-in multi-agent orchestration system. The 4.5x speedup on complex tasks is real.

  • Math reasoning is best-in-class open-weight. 96.1% AIME and 95.4% HMMT beat every open-weight competitor and most proprietary models.

  • HLE score of 50.2% is exceptional. It beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) on one of the hardest reasoning benchmarks.

  • Pricing is extremely competitive. At $0.60/M input, you can run serious agentic workloads without burning through budget.

  • Open weights with commercial license. Modified MIT means you can self-host and customize.

  • Native multimodal. MoonViT-3D vision encoder handles images, documents, and video natively — not bolted on after the fact.

What's Not Good

  • Verbosity is a real problem. The model generates far too many tokens. This eats into the cost savings and slows down responses. Artificial Analysis flagged this explicitly.

  • Speed is mediocre. 36.4 tok/s puts it in the bottom half of available models. Not suitable for real-time chat applications.

  • Vision is strong but text-only output. Despite multimodal input, it can only output text — no image generation or editing.

  • Agent Swarm requires specific API integration. You can't just drop it into existing LLM toolchains and get the swarm behavior automatically.

  • Chinese-company origin raises deployment concerns. Some enterprises have compliance restrictions on Chinese AI models, regardless of the open-weight license.

Moonshot AI: The Company Behind Kimi

Moonshot AI's fundraising trajectory tells a story of explosive growth:

| Date | Round | Amount | Valuation |
|---|---|---|---|
| 2023 | Seed | $60M | $300M |
| Feb 2024 | Series B (Alibaba-led) | $1B | $2.5B |
| Jan 2026 | Series C (IDG Capital-led) | $500M | $4.3B |
| Feb 2026 | Extension (Alibaba + Tencent) | $700M+ | ~$12B |
| Mar 2026 | Discussions | Up to $1B | $18B target |

Moonshot became the fastest Chinese company to reach decacorn status ($10B+ valuation), achieving it in roughly two years. As of March 2026, the company is also reportedly considering an IPO on the Hong Kong Stock Exchange.

Who Should Use Kimi K2.5?

Use it if:

  • You need agentic capabilities (research, multi-step coding, complex analysis) at low cost

  • You want open weights you can self-host and fine-tune

  • Math or science reasoning is your primary use case

  • You're building multi-agent systems and want a model designed for parallel execution

Skip it if:

  • You need fast, concise responses for real-time chat

  • Latency is critical for your application

  • You need image/audio generation (text-only output)

  • Enterprise compliance restricts use of Chinese AI models

Bottom Line

Kimi K2.5 is the most interesting open-weight model of early 2026 — not because of raw benchmark numbers alone, but because Agent Swarm introduces a genuinely new capability. Deploying 100 parallel sub-agents with PARL-trained orchestration at $0.60/$3.00 per million tokens is a combination nobody else offers.

The weaknesses are real: it's slow, verbose, and the Agent Swarm integration isn't plug-and-play. But for developers building agentic applications who can tolerate higher latency, the price-to-performance ratio is unmatched.

With Moonshot AI's valuation rocketing from $4.3B to a potential $18B in three months, the market clearly agrees.
