
Kimi K2.5 Deep Review: 100 AI Agents, 76.8% SWE-Bench, and $0.60/M Tokens — The Real Story

Moonshot AI's Kimi K2.5 deploys up to 100 parallel sub-agents, scores 76.8% on SWE-Bench Verified, and costs $0.60/$3.00 per million tokens. We break down the benchmarks, the PARL training technique, and what actually works.

Nishant Lamichhane · Updated · 12 min read

What Is Kimi K2.5?

Kimi K2.5 is Moonshot AI's flagship model, released January 27, 2026. It's a native multimodal Mixture-of-Experts (MoE) model with 1.04 trillion total parameters and 32 billion active parameters per forward pass. It processes text, images, and video through a 256,000-token context window.

The headline feature: Agent Swarm — an orchestration system that deploys up to 100 parallel sub-agents coordinating up to 1,500 tool calls on a single task. No other open-weight model ships anything like it.

Moonshot AI is a Beijing-based startup founded in 2023. As of March 2026, the company is targeting an $18 billion valuation — up from $4.3 billion just three months prior — making it the fastest Chinese AI company to reach decacorn status.

Architecture: 1T Parameters, 32B Active

Kimi K2.5 uses a Mixture-of-Experts architecture where only 32 billion of its 1.04 trillion parameters activate per forward pass. This keeps inference costs manageable while maintaining frontier-level reasoning capability.

Key architecture details:

  • Total parameters: 1.04 trillion

  • Active parameters: 32 billion (MoE routing)

  • Context window: 256,000 tokens

  • Vision encoder: MoonViT-3D (400M parameters) — enables native image and video understanding

  • Training data: ~15 trillion mixed visual and text tokens (continued pretraining from Kimi K2)

  • Input modalities: Text, image, video

  • Output modality: Text only

  • License: Modified MIT (commercial use permitted)

The model is available on Hugging Face and GitHub as open weights.
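The cost advantage of sparse activation can be sketched with back-of-envelope arithmetic. The parameter counts come from the article; the "~2 × params × tokens" FLOPs rule is a standard approximation for decoder inference, not a Moonshot-published figure.

```python
# Why MoE routing keeps inference cheap: only the activated experts
# contribute to per-token compute.

TOTAL_PARAMS = 1.04e12   # 1.04 trillion total parameters
ACTIVE_PARAMS = 32e9     # 32 billion activated per forward pass

def forward_flops(params, tokens):
    """Approximate FLOPs for a decoder forward pass (~2 * params * tokens)."""
    return 2 * params * tokens

tokens = 1000
dense = forward_flops(TOTAL_PARAMS, tokens)
moe = forward_flops(ACTIVE_PARAMS, tokens)

print(f"Dense 1T model:   {dense:.2e} FLOPs")
print(f"MoE (32B active): {moe:.2e} FLOPs")
print(f"Compute ratio: {dense / moe:.1f}x cheaper per token")  # ~32.5x
```

In other words, the model reasons with a trillion-parameter capacity while paying roughly the compute bill of a 32B dense model per token.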

Agent Swarm: 100 Parallel Sub-Agents

Agent Swarm is what separates Kimi K2.5 from every other open-weight model on the market. According to InfoQ's technical analysis, it works like this:

  • A trainable orchestrator agent decomposes complex tasks into parallelizable subtasks

  • Up to 100 concurrent sub-agents execute those subtasks simultaneously

  • Sub-agents can coordinate up to 1,500 tool calls per task

  • Wall-clock time is reduced by 4.5x compared to single-agent execution
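The orchestrator pattern described above can be sketched in a few lines. The real Agent Swarm API is not public, so `decompose`, `run_sub_agent`, and the task strings here are purely illustrative placeholders; only the 100-agent ceiling comes from the source.

```python
# Minimal sketch of an orchestrator fanning subtasks out to parallel
# sub-agents. In practice each sub-agent would call the model plus tools.
from concurrent.futures import ThreadPoolExecutor

MAX_SUB_AGENTS = 100  # ceiling reported for Agent Swarm

def decompose(task):
    """Placeholder: split a task into independent subtasks."""
    return [f"{task}::part-{i}" for i in range(8)]

def run_sub_agent(subtask):
    """Placeholder sub-agent: returns a stub result for its subtask."""
    return f"result({subtask})"

def orchestrate(task):
    subtasks = decompose(task)
    workers = min(len(subtasks), MAX_SUB_AGENTS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_sub_agent, subtasks))

results = orchestrate("survey pricing pages")
print(len(results))  # 8
```

The wall-clock win comes from the same principle as any fan-out: total latency approaches the slowest subtask rather than the sum of all subtasks.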

The PARL Training Technique

Moonshot developed Parallel-Agent Reinforcement Learning (PARL) specifically to train the orchestrator. The technique solves three hard problems:

  1. Training instability during multi-agent coordination

  2. Ambiguous credit assignment across agents (which agent gets reward for what?)

  3. Serial collapse — where the orchestrator defaults to using just one agent sequentially, ignoring the parallel capability entirely

PARL addresses serial collapse through staged reward shaping that encourages parallelism early in training. The approach freezes sub-agents and trains only the orchestrator, with rewards incentivizing both sub-agent creation and successful task completion.
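A reward of that shape might look like the sketch below. The exact PARL reward function is not published; the weights, cap, and annealing schedule here are invented for demonstration of the staged-shaping idea.

```python
# Illustrative staged reward shaping against serial collapse: a bonus for
# spawning sub-agents that decays over training, so early training learns
# to parallelize and late training optimizes pure task success.

def parl_reward(task_success: bool, agents_used: int, step: int,
                warmup_steps: int = 10_000) -> float:
    base = 1.0 if task_success else 0.0
    anneal = max(0.0, 1.0 - step / warmup_steps)   # linear decay to zero
    parallel_bonus = 0.1 * min(agents_used, 10) * anneal
    return base + parallel_bonus

# Early in training, using more agents pays off even on failure:
print(parl_reward(False, agents_used=8, step=0))      # 0.8
# After warmup, only task success matters:
print(parl_reward(True, agents_used=1, step=20_000))  # 1.0
```

With the sub-agents frozen, gradients flow only into the orchestrator, which matches the training setup described above.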

Moonshot also introduced a "Critical Steps" metric — a latency-oriented measurement inspired by the critical path in parallel computation — to evaluate how efficiently the swarm parallelizes work.
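A critical-path style metric of this kind is easy to make concrete: the latency of a parallel plan is the cost of the longest chain of dependent steps, not the total number of steps. The DAG and step costs below are illustrative, not Moonshot's actual benchmark data.

```python
# Longest-path cost through a step dependency DAG, in the spirit of the
# "Critical Steps" metric described above.

def critical_steps(deps, cost):
    """Return the max total cost along any dependency chain."""
    memo = {}
    def longest(step):
        if step not in memo:
            memo[step] = cost[step] + max(
                (longest(d) for d in deps.get(step, [])), default=0)
        return memo[step]
    return max(longest(s) for s in cost)

# Five steps, total serial cost 8, but the longest chain is A -> B -> D.
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": []}
cost = {"A": 1, "B": 3, "C": 1, "D": 1, "E": 2}
print(critical_steps(deps, cost))  # 5, vs 8 if executed serially
```

A swarm that parallelizes well drives this number toward the critical path; a swarm suffering serial collapse drives it toward the serial total.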

Agent Swarm Benchmark Impact

| Benchmark | Standard Mode | Agent Swarm | Improvement |
|---|---|---|---|
| BrowseComp | 60.6% | 78.4% | +17.8 points |
| WideSearch | 72.7% | 79.0% | +6.3 points |

With Agent Swarm enabled, Kimi K2.5 outperformed GPT-5.2 Pro on BrowseComp and surpassed Claude Opus 4.5 on WideSearch. The system also provides "proactive context control," reducing context overflow risks during long-running tasks without requiring summarization.

Full Benchmark Breakdown

All numbers below are from Moonshot's official tech blog and cross-referenced with Artificial Analysis.

Coding

| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| SWE-Bench Verified | 76.8% | Open-weight SOTA at launch |
| SWE-Bench Multilingual | 73.0% | Cross-language code fixing |
| Terminal-Bench 2.0 | 50.8% | Terminal-based coding tasks |
| LiveCodeBench | 85.0% | Real-time competitive coding |

Math & Reasoning

| Benchmark | Kimi K2.5 | Context |
|---|---|---|
| AIME 2025 | 96.1% | Best-in-class open-weight |
| HMMT 2025 | 95.4% | Beats most proprietary models |
| GPQA-Diamond | 87.6% | Graduate-level science QA |
| HLE (Humanity's Last Exam) | 50.2% | Beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) |

Vision & Multimodal

| Benchmark | Kimi K2.5 |
|---|---|
| MMMU Pro | 78.5% |
| MathVision | 84.2% |
| OmniDocBench 1.5 | 88.8% |
| VideoMMMU | 86.6% |
| LongVideoBench | 79.8% |

Agentic Search

| Benchmark | Standard | With Agent Swarm |
|---|---|---|
| BrowseComp | 60.6% | 78.4% |
| WideSearch | 72.7% | 79.0% |
| DeepSearchQA | 77.1% | — |

Pricing: 5x Cheaper Than Claude

Pricing figures are from Artificial Analysis and NxCode's pricing guide. Note: the standard Kimi K2 rate is $0.60/$2.50 per million tokens, while the K2.5 Reasoning variant is $0.60/$3.00. Prices below reflect the K2.5 Reasoning model:

| Metric | Kimi K2.5 | Claude Sonnet 4.6 | GPT-5.4 |
|---|---|---|---|
| Input (per 1M tokens) | $0.60 | $3.00 | $2.50 |
| Output (per 1M tokens) | $3.00 | $15.00 | $15.00 |
| Cache discount | 75% | — | — |

That's 5x cheaper than Claude Sonnet and GPT-5.4 on output tokens. The 75% cache discount on repeated prompts makes it even more cost-effective for agentic workloads where the same system prompt is reused across sub-agents.
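The cache economics are easy to model. The rates and 75% discount come from the table above; the token counts and the assumption that the shared system prompt is cached after the first sub-agent reads it are illustrative workload guesses, not measured numbers.

```python
# Rough cost model for a swarm run that reuses one cached system prompt
# across many sub-agents, at K2.5 Reasoning rates.

INPUT_RATE = 0.60 / 1e6    # $ per input token
OUTPUT_RATE = 3.00 / 1e6   # $ per output token
CACHE_DISCOUNT = 0.75      # cached input tokens cost 25% of the full rate

def run_cost(system_tokens, user_tokens, output_tokens, sub_agents):
    # First agent pays full price for the system prompt; the rest hit cache.
    cached_input = (system_tokens * (sub_agents - 1)
                    * INPUT_RATE * (1 - CACHE_DISCOUNT))
    fresh_input = (system_tokens + user_tokens * sub_agents) * INPUT_RATE
    output = output_tokens * sub_agents * OUTPUT_RATE
    return cached_input + fresh_input + output

cost = run_cost(system_tokens=5_000, user_tokens=2_000,
                output_tokens=4_000, sub_agents=100)
print(f"${cost:.2f} for a 100-agent run")  # ≈ $1.40
```

Note that under these assumptions output tokens dominate the bill, which is exactly where the model's verbosity (covered below) hurts.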

Third-party API providers offer even lower rates. Artificial Analysis lists 15 providers, with DeepInfra at $0.90 blended per 1M tokens being the cheapest.

Speed and Latency: The Weak Spot

Per Artificial Analysis benchmarking:

  • Output speed: 36.4 tokens/second (ranked #39 of 68 models)

  • Time to first token: 1.74 seconds

  • Verbosity: 89 million output tokens generated during evaluation — notably verbose

This is a genuine weakness. Kimi K2.5 is slow and verbose. Artificial Analysis's evaluation cost was $370.66 — inflated by the model's tendency to produce far more output tokens than necessary. If you're building latency-sensitive applications, this matters.

For comparison, MiMo-V2 Flash runs at 141.9 tokens/second — nearly 4x faster.
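The practical impact is straightforward to estimate from the figures above: time-to-first-token plus generation time at the measured output speed. The 1,000-token response size is an arbitrary example, and the comparison model's TTFT is assumed equal to Kimi's for simplicity.

```python
# End-to-end latency estimate from throughput and time-to-first-token.

def response_seconds(output_tokens, tok_per_sec, ttft):
    return ttft + output_tokens / tok_per_sec

k25 = response_seconds(1_000, tok_per_sec=36.4, ttft=1.74)
fast = response_seconds(1_000, tok_per_sec=141.9, ttft=1.74)  # MiMo-V2 Flash speed

print(f"Kimi K2.5:        {k25:.1f}s for 1,000 tokens")  # ~29.2s
print(f"141.9 tok/s model: {fast:.1f}s")                 # ~8.8s
```

A 20-second gap per 1,000-token response is invisible in a batch agent pipeline but disqualifying for interactive chat.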

What's Actually Good

  • Agent Swarm is genuinely novel. No other open-weight model ships a built-in multi-agent orchestration system. The 4.5x speedup on complex tasks is real.

  • Math reasoning is best-in-class open-weight. 96.1% AIME and 95.4% HMMT beat every open-weight competitor and most proprietary models.

  • HLE score of 50.2% is exceptional. It beats Claude Opus 4.5 (32.0%) and GPT-5.2 High (41.7%) on one of the hardest reasoning benchmarks.

  • Pricing is extremely competitive. At $0.60/M input, you can run serious agentic workloads without burning through budget.

  • Open weights with commercial license. Modified MIT means you can self-host and customize.

  • Native multimodal. MoonViT-3D vision encoder handles images, documents, and video natively — not bolted on after the fact.

What's Not Good

  • Verbosity is a real problem. The model generates far too many tokens. This eats into the cost savings and slows down responses. Artificial Analysis flagged this explicitly.

  • Speed is mediocre. 36.4 tok/s puts it in the bottom half of available models. Not suitable for real-time chat applications.

  • Vision is strong but text-only output. Despite multimodal input, it can only output text — no image generation or editing.

  • Agent Swarm requires specific API integration. You can't just drop it into existing LLM toolchains and get the swarm behavior automatically.

  • Chinese-company origin raises deployment concerns. Some enterprises have compliance restrictions on Chinese AI models, regardless of the open-weight license.

Moonshot AI: The Company Behind Kimi

Moonshot AI's fundraising trajectory tells a story of explosive growth:

| Date | Round | Amount | Valuation |
|---|---|---|---|
| 2023 | Seed | $60M | $300M |
| Feb 2024 | Series B (Alibaba-led) | $1B | $2.5B |
| Jan 2026 | Series C (IDG Capital-led) | $500M | $4.3B |
| Feb 2026 | Extension (Alibaba + Tencent) | $700M+ | ~$12B |
| Mar 2026 | Discussions | Up to $1B | $18B target |

Moonshot became the fastest Chinese company to reach decacorn status ($10B+ valuation), achieving it in roughly two years. As of March 2026, the company is also reportedly considering an IPO on the Hong Kong Stock Exchange.

Who Should Use Kimi K2.5?

Use it if:

  • You need agentic capabilities (research, multi-step coding, complex analysis) at low cost

  • You want open weights you can self-host and fine-tune

  • Math or science reasoning is your primary use case

  • You're building multi-agent systems and want a model designed for parallel execution

Skip it if:

  • You need fast, concise responses for real-time chat

  • Latency is critical for your application

  • You need image/audio generation (text-only output)

  • Enterprise compliance restricts use of Chinese AI models

Bottom Line

Kimi K2.5 is the most interesting open-weight model of early 2026 — not because of raw benchmark numbers alone, but because Agent Swarm introduces a genuinely new capability. Deploying 100 parallel sub-agents with PARL-trained orchestration at $0.60/$3.00 per million tokens is a combination nobody else offers.

The weaknesses are real: it's slow, verbose, and the Agent Swarm integration isn't plug-and-play. But for developers building agentic applications who can tolerate higher latency, the price-to-performance ratio is unmatched.

With Moonshot AI's valuation rocketing from $4.3B to a potential $18B in three months, the market clearly agrees.
