What Is Kimi K2.5?
Kimi K2.5 is Moonshot AI's flagship model, released January 27, 2026. It's a native multimodal Mixture-of-Experts (MoE) model with 1.04 trillion total parameters and 32 billion active parameters per forward pass. It processes text, images, and video through a 256,000-token context window.
The headline feature: Agent Swarm — an orchestration system that deploys up to 100 parallel sub-agents coordinating up to 1,500 tool calls on a single task. No other open-weight model ships anything like it.
Moonshot AI is a Beijing-based startup founded in 2023. As of March 2026, the company is targeting an $18 billion valuation — up from $4.3 billion just three months prior — making it the fastest Chinese AI company to reach decacorn status.
Architecture: 1T Parameters, 32B Active
Kimi K2.5 uses a Mixture-of-Experts architecture where only 32 billion of its 1.04 trillion parameters activate per forward pass. This keeps inference costs manageable while maintaining frontier-level reasoning capability.
Key architecture details:
Total parameters: 1.04 trillion
Active parameters: 32 billion (MoE routing)
Context window: 256,000 tokens
Vision encoder: MoonViT-3D (400M parameters) — enables native image and video understanding
Training data: ~15 trillion mixed visual and text tokens (continued pretraining from Kimi K2)
Input modalities: Text, image, video
Output modality: Text only
License: Modified MIT (commercial use permitted)
The model is available on Hugging Face and GitHub as open weights.
Agent Swarm: 100 Parallel Sub-Agents
Agent Swarm is what separates Kimi K2.5 from every other open-weight model on the market. According to InfoQ's technical analysis, it works like this:
A trainable orchestrator agent decomposes complex tasks into parallelizable subtasks
Up to 100 concurrent sub-agents execute those subtasks simultaneously
Sub-agents can coordinate up to 1,500 tool calls per task
Wall-clock time drops by a factor of 4.5 compared to single-agent execution
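The orchestrator/sub-agent pattern described above can be sketched with ordinary async fan-out. This is an illustrative toy, not Moonshot's API: `orchestrate`, `run_subtask`, and the trivial task split are all invented for the example; only the 100-agent cap comes from the text.

```python
import asyncio

async def run_subtask(name: str) -> str:
    # Stand-in for a sub-agent executing its share of tool calls.
    await asyncio.sleep(0.01)
    return f"{name}: done"

async def orchestrate(task: str, max_agents: int = 100) -> list[str]:
    # 1) Decompose the task into parallelizable subtasks (a trivial split here).
    subtasks = [f"{task}/part-{i}" for i in range(4)]
    # 2) Fan out to concurrent sub-agents, capped at max_agents.
    sem = asyncio.Semaphore(max_agents)
    async def bounded(st: str) -> str:
        async with sem:
            return await run_subtask(st)
    # 3) Gather results for the orchestrator to merge.
    return await asyncio.gather(*(bounded(st) for st in subtasks))

results = asyncio.run(orchestrate("survey-topic"))
print(results)
```

The semaphore is the moving part worth noting: it is what turns "up to 100 concurrent sub-agents" from a claim into a bound.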
The PARL Training Technique
Moonshot developed Parallel-Agent Reinforcement Learning (PARL) specifically to train the orchestrator. The technique solves three hard problems:
Training instability during multi-agent coordination
Ambiguous credit assignment across agents (which agent gets reward for what?)
Serial collapse — where the orchestrator defaults to using just one agent sequentially, ignoring the parallel capability entirely
PARL addresses serial collapse through staged reward shaping that encourages parallelism early in training. The approach freezes sub-agents and trains only the orchestrator, with rewards incentivizing both sub-agent creation and successful task completion.
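A minimal sketch of what staged reward shaping could look like, assuming a linear decay schedule: the parallelism bonus dominates early so spawning sub-agents is rewarded, then fades so task success takes over. The function name, weights, and warmup length are all assumptions; Moonshot has not published these details.

```python
def parl_reward(task_success: float, n_subagents: int, step: int,
                warmup_steps: int = 1000) -> float:
    # Early in training, weight a parallelism bonus heavily so the
    # orchestrator learns to spawn sub-agents instead of collapsing to
    # serial, single-agent behavior.
    parallel_bonus = min(n_subagents, 100) / 100.0  # hypothetical bonus term
    w = max(0.0, 1.0 - step / warmup_steps)         # bonus weight decays to 0
    return (1.0 - w) * task_success + w * parallel_bonus

print(parl_reward(task_success=1.0, n_subagents=1, step=0))     # bonus dominates: 0.01
print(parl_reward(task_success=1.0, n_subagents=1, step=2000))  # success dominates: 1.0
```

The point of the decay is exactly the serial-collapse fix described above: a serial policy scores poorly at step 0 even when it completes the task.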
Moonshot also introduced a "Critical Steps" metric — a latency-oriented measurement inspired by the critical path in parallel computation — to evaluate how efficiently the swarm parallelizes work.
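A critical-path-style metric can be toy-modeled as the longest chain of dependent steps in the swarm's task DAG. The graph and function below are made up for illustration; Moonshot's actual "Critical Steps" definition may weight steps differently.

```python
def critical_steps(deps: dict[str, list[str]]) -> int:
    # deps maps each step to the steps it depends on.
    memo: dict[str, int] = {}
    def depth(step: str) -> int:
        if step not in memo:
            memo[step] = 1 + max((depth(d) for d in deps.get(step, [])), default=0)
        return memo[step]
    return max(depth(s) for s in deps)

# Four subtasks fan out from "plan" and merge into "report":
dag = {"plan": [], "a": ["plan"], "b": ["plan"], "c": ["plan"], "d": ["plan"],
       "report": ["a", "b", "c", "d"]}
print(critical_steps(dag))  # 3 — versus 6 steps if executed serially
```

A swarm that parallelizes well keeps this number close to the DAG's depth; a serially collapsed one pushes it toward the total step count.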
Agent Swarm Benchmark Impact
| Benchmark | Standard Mode | Agent Swarm | Improvement |
|---|---|---|---|
| BrowseComp | 60.6% | 78.4% | +17.8 points |
| WideSearch | 72.7% | 79.0% | +6.3 points |
With Agent Swarm enabled, Kimi K2.5 outperformed GPT-5.2 Pro on BrowseComp and surpassed Claude Opus 4.5 on WideSearch. The system also provides "proactive context control," reducing context overflow risks during long-running tasks without requiring summarization.
Full Benchmark Breakdown
All numbers below are from Moonshot's official tech blog and cross-referenced with Artificial Analysis.
Coding
| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| SWE-Bench Verified | 76.8% | Open-weight SOTA at launch |
| SWE-Bench Multilingual | 73.0% | Cross-language code fixing |
| Terminal-Bench 2.0 | 50.8% | Terminal-based coding tasks |
| LiveCodeBench | 85.0% | Real-time competitive coding |
Math & Reasoning
| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| AIME 2025 | 96.1% | Best-in-class open-weight |
| HMMT 2025 | 95.4% | Beats most proprietary models |
| GPQA-Diamond | 87.6% | Graduate-level science QA |
| HLE (Humanity's Last Exam) | 50.2% | Beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) |
Vision & Multimodal
| Benchmark | Kimi K2.5 |
|---|---|
| MMMU Pro | 78.5% |
| MathVision | 84.2% |
| OmniDocBench 1.5 | 88.8% |
| VideoMMMU | 86.6% |
| LongVideoBench | 79.8% |
Agentic Search
| Benchmark | Standard | With Agent Swarm |
|---|---|---|
| BrowseComp | 60.6% | 78.4% |
| WideSearch | 72.7% | 79.0% |
| DeepSearchQA | 77.1% | — |
Pricing: 5x Cheaper Than Claude Sonnet
Pricing below is sourced from Artificial Analysis and NxCode's pricing guide. Note: the standard Kimi K2 rate is $0.60/$2.50 per million input/output tokens, while the K2.5 Reasoning variant is $0.60/$3.00. The table reflects the K2.5 Reasoning model:
| Metric | Kimi K2.5 | Claude Sonnet 4.6 | GPT-5.4 |
|---|---|---|---|
| Input (per 1M tokens) | $0.60 | $3.00 | $2.50 |
| Output (per 1M tokens) | $3.00 | $15.00 | $15.00 |
| Cache discount | 75% | — | — |
That's 5x cheaper than Claude Sonnet and GPT-5.4 on output tokens. The 75% cache discount on repeated prompts makes it even more cost-effective for agentic workloads where the same system prompt is reused across sub-agents.
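The cache-discount math can be made concrete with a back-of-envelope calculator at the K2.5 Reasoning rates quoted above ($0.60/M input, $3.00/M output, 75% off cached input). The workload shape in the example (a shared system prompt fanned out to 100 sub-agents) is illustrative.

```python
def run_cost(cached_in: int, fresh_in: int, out: int) -> float:
    # K2.5 Reasoning list prices per million tokens, from the table above.
    INPUT, OUTPUT, CACHE_DISCOUNT = 0.60, 3.00, 0.75
    cached = cached_in / 1e6 * INPUT * (1 - CACHE_DISCOUNT)  # 75% off cache hits
    fresh = fresh_in / 1e6 * INPUT
    return cached + fresh + out / 1e6 * OUTPUT

# 100 sub-agents sharing a cached 5k-token system prompt, each reading
# 2k fresh tokens and writing 1k tokens:
cost = run_cost(cached_in=100 * 5_000, fresh_in=100 * 2_000, out=100 * 1_000)
print(f"${cost:.3f}")  # → $0.495
```

Without the cache discount the same run would bill the full $0.30 for the repeated system prompts instead of $0.075, so the saving compounds with swarm size.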
Third-party API providers offer even lower rates. Artificial Analysis lists 15 providers, with DeepInfra at $0.90 blended per 1M tokens being the cheapest.
Speed and Latency: The Weak Spot
Per Artificial Analysis benchmarking:
Output speed: 36.4 tokens/second (ranked #39 of 68 models)
Time to first token: 1.74 seconds
Verbosity: 89 million output tokens generated during evaluation — notably verbose
This is a genuine weakness. Kimi K2.5 is slow and verbose. Artificial Analysis's evaluation cost was $370.66 — inflated by the model's tendency to produce far more output tokens than necessary. If you're building latency-sensitive applications, this matters.
For comparison, MiMo-V2 Flash runs at 141.9 tokens/second — nearly 4x faster.
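The latency figures above translate into end-to-end response time as TTFT plus decode time. A quick estimate, reusing K2.5's 1.74 s TTFT for both models since MiMo's wasn't quoted:

```python
def response_seconds(n_tokens: int, tok_per_s: float, ttft: float = 1.74) -> float:
    # Time to first token, plus steady-state decoding of the reply.
    return ttft + n_tokens / tok_per_s

for name, speed in [("Kimi K2.5", 36.4), ("MiMo-V2 Flash", 141.9)]:
    print(f"{name}: {response_seconds(1000, speed):.1f} s for a 1,000-token reply")
```

At 36.4 tok/s a 1,000-token reply takes roughly 29 seconds wall-clock, which is why the verbosity problem and the speed problem compound each other.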
What's Actually Good
Agent Swarm is genuinely novel. No other open-weight model ships a built-in multi-agent orchestration system. The 4.5x speedup on complex tasks is real.
Math reasoning is best-in-class open-weight. 96.1% AIME and 95.4% HMMT beat every open-weight competitor and most proprietary models.
HLE score of 50.2% is exceptional. It beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) on one of the hardest reasoning benchmarks.
Pricing is extremely competitive. At $0.60/M input, you can run serious agentic workloads without burning through budget.
Open weights with commercial license. Modified MIT means you can self-host and customize.
Native multimodal. MoonViT-3D vision encoder handles images, documents, and video natively — not bolted on after the fact.
What's Not Good
Verbosity is a real problem. The model generates far too many tokens. This eats into the cost savings and slows down responses. Artificial Analysis flagged this explicitly.
Speed is mediocre. 36.4 tok/s puts it in the bottom half of available models. Not suitable for real-time chat applications.
Vision is strong but text-only output. Despite multimodal input, it can only output text — no image generation or editing.
Agent Swarm requires specific API integration. You can't just drop it into existing LLM toolchains and get the swarm behavior automatically.
Chinese-company origin raises deployment concerns. Some enterprises have compliance restrictions on Chinese AI models, regardless of the open-weight license.
Moonshot AI: The Company Behind Kimi
Moonshot AI's fundraising trajectory tells a story of explosive growth:
| Date | Round | Amount | Valuation |
|---|---|---|---|
| 2023 | Seed | $60M | $300M |
| Feb 2024 | Series B (Alibaba-led) | $1B | $2.5B |
| Jan 2026 | Series C (IDG Capital-led) | $500M | $4.3B |
| Feb 2026 | — | $700M+ | ~$12B |
| Mar 2026 | — | Up to $1B | $18B target |
Moonshot became the fastest Chinese company to reach decacorn status ($10B+ valuation), achieving it in roughly two years. As of March 2026, the company is also reportedly considering an IPO on the Hong Kong Stock Exchange.
Who Should Use Kimi K2.5?
Use it if:
You need agentic capabilities (research, multi-step coding, complex analysis) at low cost
You want open weights you can self-host and fine-tune
Math or science reasoning is your primary use case
You're building multi-agent systems and want a model designed for parallel execution
Skip it if:
You need fast, concise responses for real-time chat
Latency is critical for your application
You need image/audio generation (text-only output)
Enterprise compliance restricts use of Chinese AI models
Bottom Line
Kimi K2.5 is the most interesting open-weight model of early 2026 — not because of raw benchmark numbers alone, but because Agent Swarm introduces a genuinely new capability. Deploying 100 parallel sub-agents with PARL-trained orchestration at $0.60/$3.00 per million tokens is a combination nobody else offers.
The weaknesses are real: it's slow, verbose, and the Agent Swarm integration isn't plug-and-play. But for developers building agentic applications who can tolerate higher latency, the price-to-performance ratio is unmatched.
With Moonshot AI's valuation rocketing from $4.3B to a potential $18B in three months, the market clearly agrees.
Sources
Kimi K2.5 Intelligence, Performance & Price Analysis — Artificial Analysis
Moonshot AI Releases Open-Weight Kimi K2.5 Model with Agent Swarm — InfoQ
Moonshot AI targets US$12 billion valuation — South China Morning Post
Moonshot AI sees revenue surge, secures over $700 million — TechNode
China AI Startup Moonshot Seeks $10 Billion Value — Bloomberg
Kimi K2.5 Pricing 2026: Plans, API Costs & Free Tier — NxCode