An AI That Helped Build Itself

On March 18, 2026, Chinese AI lab MiniMax released M2.7, and on April 12 it open-sourced the weights. What makes M2.7 different from every other model release is that it is the first model to actively participate in its own development cycle.

During training, MiniMax let M2.7 autonomously optimize its own programming scaffold over 100+ rounds — analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert changes. The result: a 30% performance improvement with no human intervention. MiniMax calls this "self-evolution."

Architecture

| Spec | Detail |
| --- | --- |
| Parameters | 230 billion total |
| Active per token | 10 billion (MoE) |
| Experts | 256 |
| Context window | 200K tokens |
| Reasoning | Native chain-of-thought (extended thinking) |
| License | Non-commercial (commercial use requires separate agreement) |
| Release | March 18, 2026 (weights open-sourced April 12) |

Note on licensing: Unlike MiniMax's earlier M2 (MIT) and M2.5 (Modified-MIT), M2.7 uses a MiniMax proprietary non-commercial license. You can download and experiment, but commercial deployment requires a separate agreement.

Benchmarks

M2.7 scores within striking distance of Claude Opus 4.6 and GPT-5.4 on the hardest coding benchmarks — at a fraction of the cost.

Coding & Software Engineering

| Benchmark | M2.7 | Claude Opus 4.6 | GPT-5.4 |
| --- | --- | --- | --- |
| SWE-Pro | 56.22% | 57.3% | 57.7% |
| Terminal Bench 2 | 57.0% | 65.4% | 75.1% |
| VIBE-Pro | 55.6% | ~56% | — |
| Multi SWE Bench | 52.7% | — | — |
| NL2Repo | 39.8% | — | — |
| SWE Multilingual | 76.5 | — | — |

On SWE-Pro — the industry's hardest coding benchmark — M2.7 is just 1.08 points behind Claude Opus 4.6 and 1.48 points behind GPT-5.4. For a 230B open-weight model with only 10B active parameters, this is remarkable.

Agent & Tool Use

| Benchmark | M2.7 | Notes |
| --- | --- | --- |
| GDPval-AA | 1495 Elo | Highest among open-source models |
| MM Claw | 62.7% | Real-world work/life tasks |
| MLE Bench Lite | 66.6% avg | ML competitions |
| Toolathon | 46.3% | Tool interaction accuracy |

The 1495 Elo on GDPval-AA — which evaluates professional domain expertise across 45 models — is the highest score among all open-source models. Only Claude Opus 4.6, Claude Sonnet 4.6, and GPT-5.4 score higher.

Overall Intelligence

From Artificial Analysis:

| Metric | M2.7 | Context |
| --- | --- | --- |
| Intelligence Index | 50 | #3 of 71 models (average: 27) |
| Output Speed | 46.6 t/s | Below average (median: 55.8 t/s) |
| TTFT | 2.67s | Near average (median: 2.28s) |
| Token Usage | 87M tokens | Very verbose (avg: 41M) |

The Self-Evolution Story

This is what sets M2.7 apart from every other model. During its development:

  1. MiniMax gave M2.7 access to its own training infrastructure

  2. The model autonomously updated its own memory and built 40+ complex skills for RL experiments

  3. It ran 100+ optimization rounds — analyzing failures, modifying code, running evaluations, and deciding what to keep

  4. It achieved a 30% performance improvement on internal evaluations with no human intervention

  5. It maintains a 97% skill adherence rate across 40+ complex skills (each exceeding 2,000 tokens)

Within MiniMax's RL team, M2.7 now handles 30–50% of daily workflows end-to-end. Researchers interact only for critical decisions while the model manages literature review, experiment tracking, data pipelines, debugging, and merge requests.
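
The optimization loop described in steps 3–4 is essentially hill-climbing with a keep-or-revert rule. Below is a minimal sketch under that reading; the function name, the toy "scaffold" (a single number), and the toy evaluation are illustrative stand-ins, not MiniMax's actual pipeline, which edits code and runs real evaluations.

```python
import random

def self_evolution_loop(scaffold, evaluate, propose_change, rounds=100):
    """Keep-or-revert optimization: propose a change to the scaffold,
    re-evaluate, keep the change only if the score improves, else revert."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)
        score = evaluate(candidate)
        if score > best_score:
            scaffold, best_score = candidate, score  # keep the change
        # otherwise the candidate is simply discarded (revert)
    return scaffold, best_score

# Toy stand-in: the "scaffold" is one number, and "evaluation" rewards
# closeness to 10. A real loop would patch code and run a test suite.
random.seed(0)
result, score = self_evolution_loop(
    scaffold=0.0,
    evaluate=lambda s: -abs(s - 10),
    propose_change=lambda s: s + random.uniform(-1, 1),
    rounds=200,
)
```

The key property is that failed experiments cost nothing but compute: a change that does not improve the evaluation never survives into the next round.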

Agent Teams

M2.7 supports native multi-agent collaboration through what MiniMax calls Agent Teams. Multiple model instances maintain distinct role identities and work together on tasks — with stable role boundaries, adversarial reasoning, and behavioral differentiation between agents.
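
MiniMax has not published an Agent Teams API in the material covered here, but the pattern it describes (multiple instances of one model holding stable, distinct roles) can be sketched as fixed role prompts over per-agent conversation histories. Everything below, including the `Agent` class and its stubbed `respond` method, is a hypothetical illustration, not MiniMax's interface:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One team member: a fixed role identity plus its own history.
    `respond` is a stub standing in for a real model call."""
    name: str
    role_prompt: str
    history: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        self.history.append(("user", message))
        # A real implementation would call the model with self.role_prompt
        # prepended to self.history; here we just echo the role identity.
        reply = f"[{self.name}] acting as: {self.role_prompt}"
        self.history.append(("assistant", reply))
        return reply

# Two instances of the same model keep distinct, stable roles.
coder = Agent("coder", "Write the patch.")
reviewer = Agent("reviewer", "Adversarially review the patch.")

draft = coder.respond("Fix the off-by-one bug in pagination.")
review = reviewer.respond(draft)
```

The "stable role boundaries" claim amounts to each agent's identity living outside the shared conversation, so one agent's output cannot overwrite another's role.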

MiniMax also open-sourced OpenRoom, an interactive demo that places agent interactions inside a web GUI with real-time visual feedback.

Pricing

| Model | Input/M | Output/M |
| --- | --- | --- |
| MiniMax M2.7 | $0.30 | $1.20 |
| GLM-5.1 | $1.00 | $3.20 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

M2.7 is 17x cheaper than Claude Opus 4.6 on input and 21x cheaper on output. It's even 3.3x cheaper than GLM-5.1 on input. For a model scoring 56.22% on SWE-Pro (vs Opus's 57.3%), the cost-per-quality ratio is extraordinary.
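
The claimed multiples follow directly from the table above; a quick check (prices copied from the table, ratios rounded as the article does):

```python
# Per-million-token prices from the pricing table above.
prices = {
    "MiniMax M2.7":    {"input": 0.30, "output": 1.20},
    "GLM-5.1":         {"input": 1.00, "output": 3.20},
    "GPT-5.4":         {"input": 2.50, "output": 15.00},
    "Claude Opus 4.6": {"input": 5.00, "output": 25.00},
}

def cost_ratio(model, baseline="MiniMax M2.7", kind="input"):
    """How many times more expensive `model` is than `baseline`."""
    return prices[model][kind] / prices[baseline][kind]

# Opus works out to ~16.7x on input and ~20.8x on output,
# which the article rounds to 17x and 21x.
```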

Two API variants are available: M2.7 (standard) and M2.7-highspeed (same results, faster inference).

Availability

| Platform | Status |
| --- | --- |
| Hugging Face | Open weights available |
| GitHub | MiniMax-AI/MiniMax-M2.7 |
| Ollama | Available |
| NVIDIA NIM | Supported |
| SGLang | v0.5.10+ |
| vLLM | v0.19.0+ |
| OpenRouter | Available |
| MiniMax API | Available (+ Coding Plan subscription) |

The Elephant in the Room: MiniMax and AI Theft

MiniMax is one of the three Chinese AI firms named in the OpenAI/Anthropic/Google coalition against adversarial distillation. Anthropic documented approximately 13 million unauthorized exchanges from MiniMax — roughly 81% of the total 16 million stolen exchanges across all three named firms.

This creates an uncomfortable context for M2.7's release. The model's impressive performance raises questions about what role, if any, distilled outputs from Claude and other US models played in its training. MiniMax has not publicly addressed the allegations.

For developers evaluating M2.7: the technical capabilities are real and independently measurable. But the ethical and legal landscape around the model is unsettled.

Strengths and Weaknesses

Strengths

  • Price-performance: 56.22% SWE-Pro at $0.30/$1.20 is unmatched

  • Self-evolution: Genuinely novel capability — no other open model does this

  • Agent Teams: Native multi-agent with role stability

  • Professional expertise: 1495 Elo GDPval-AA — highest open-source score

  • Efficiency: Only 10B active parameters from 230B total

Weaknesses

  • Non-commercial license: Unlike GLM-5.1 (MIT) or Gemma 4 (Apache 2.0), commercial use requires a separate deal

  • Speed: 46.6 t/s is below average for its class

  • Verbosity: 87M tokens on evaluation vs 41M average — it over-explains

  • Distillation controversy: The theft allegations create reputational and legal risk

  • Terminal Bench gap: 57.0% vs GPT-5.4's 75.1% — significant distance on terminal tasks

Bottom Line

MiniMax M2.7 is a technically impressive model at an absurdly low price point. At $0.30/$1.20 per million tokens, it delivers SWE-Pro performance within 1 point of Claude Opus 4.6 — a model that costs 17x more. The self-evolution capability is genuinely novel, and the agent teams feature fills a real gap for multi-agent workflows.

But the non-commercial license limits its appeal compared to truly open alternatives like GLM-5.1 (MIT) and Gemma 4 (Apache 2.0). And the ongoing distillation controversy — with MiniMax named as the primary offender in the Frontier Model Forum coalition — casts a shadow over the entire release.

For research and experimentation: M2.7 is a no-brainer at this price. For production: the licensing restrictions and reputational risk make GLM-5.1 or Gemma 4 safer bets.
