# An AI That Helped Build Itself
On March 18, 2026, Chinese AI lab MiniMax released M2.7, and on April 12 it open-sourced the weights. What sets M2.7 apart from every other model release is that it is the first model to actively participate in its own development cycle.
During training, MiniMax let M2.7 autonomously optimize its own programming scaffold over 100+ rounds — analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert changes. The result: a 30% performance improvement with no human intervention. MiniMax calls this "self-evolution."
## Architecture

| Spec | Detail |
|---|---|
| Parameters | 230 billion total |
| Active per token | 10 billion (MoE) |
| Experts | 256 |
| Context window | 200K tokens |
| Reasoning | Native chain-of-thought (extended thinking) |
| License | Non-commercial (commercial use requires separate agreement) |
| Release | March 18, 2026 (weights open-sourced April 12) |
Note on licensing: Unlike MiniMax's earlier M2 (MIT) and M2.5 (Modified-MIT), M2.7 uses a MiniMax proprietary non-commercial license. You can download and experiment, but commercial deployment requires a separate agreement.
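The 230B-total / 10B-active split comes from Mixture-of-Experts routing: a learned router scores all 256 experts for each token but activates only a few, so just a small slice of the network runs per token. A minimal sketch of top-k gating in plain Python (the value of k and the logits are illustrative assumptions; the per-token expert count is not stated in this post):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token and renormalize
    their gate weights -- why only ~10B of 230B parameters execute."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 256 experts as in M2.7; the logits here are made-up illustrative values.
logits = [0.0] * 256
logits[7], logits[42] = 3.0, 2.0
print(top_k_route(logits, k=2))  # experts 7 and 42, with renormalized weights
```

Only the selected experts' weights are loaded into the forward pass for that token, which is why active-parameter count, not total size, drives inference cost.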
## Benchmarks
M2.7 scores within striking distance of Claude Opus 4.6 and GPT-5.4 on the hardest coding benchmarks — at a fraction of the cost.
### Coding & Software Engineering

| Benchmark | M2.7 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-Pro | 56.22% | 57.3% | 57.7% |
| Terminal Bench 2 | 57.0% | 65.4% | 75.1% |
| VIBE-Pro | 55.6% | ~56% | — |
| Multi SWE Bench | 52.7% | — | — |
| NL2Repo | 39.8% | — | — |
| SWE Multilingual | 76.5 | — | — |
On SWE-Pro — the industry's hardest coding benchmark — M2.7 is just 1.08 points behind Claude Opus 4.6 and 1.48 points behind GPT-5.4. For a 230B open-weight model with only 10B active parameters, this is remarkable.
### Agent & Tool Use

| Benchmark | M2.7 | Notes |
|---|---|---|
| GDPval-AA | 1495 Elo | Highest among open-source models |
| MM Claw | 62.7% | Real-world work/life tasks |
| MLE Bench Lite | 66.6% avg | ML competitions |
| Toolathon | 46.3% | Tool interaction accuracy |
The 1495 Elo on GDPval-AA — which evaluates professional domain expertise across 45 models — is the highest score among all open-source models. Only Claude Opus 4.6, Claude Sonnet 4.6, and GPT-5.4 score higher.
### Overall Intelligence

From Artificial Analysis:

| Metric | M2.7 | Context |
|---|---|---|
| Intelligence Index | 50 | #3 of 71 models (average: 27) |
| Output Speed | 46.6 t/s | Below average (median: 55.8 t/s) |
| TTFT | 2.67s | Near average (median: 2.28s) |
| Token Usage | 87M tokens | Very verbose (average: 41M) |
## The Self-Evolution Story
This is what sets M2.7 apart from every other model. During its development:
- MiniMax gave M2.7 access to its own training infrastructure
- The model autonomously updated its own memory and built 40+ complex skills for RL experiments
- It ran 100+ optimization rounds, analyzing failures, modifying code, running evaluations, and deciding what to keep
- It achieved a 30% performance improvement on internal evaluations with no human intervention
- It maintains a 97% skill adherence rate across 40+ complex skills (each exceeding 2,000 tokens)
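The keep-or-revert loop described above can be sketched as a simple hill-climbing procedure. This is a toy simulation, not MiniMax's actual pipeline: `evaluate` and `propose_change` are hypothetical stand-ins for the internal evaluation suite and for the model editing its own scaffold code.

```python
import random

def evaluate(scaffold):
    """Toy stand-in for an evaluation suite: scores a scaffold
    configuration; higher is better."""
    return sum(scaffold.values())

def propose_change(scaffold):
    """Toy stand-in for the model analyzing failure trajectories
    and editing one knob of its own scaffold."""
    key = random.choice(list(scaffold))
    candidate = dict(scaffold)
    candidate[key] += random.uniform(-1.0, 1.0)
    return candidate

def self_evolve(scaffold, rounds=100):
    """Keep-or-revert loop: a change survives only if evaluation improves."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)
        score = evaluate(candidate)
        if score > best_score:        # keep the change
            scaffold, best_score = candidate, score
        # otherwise revert, i.e. keep the previous scaffold
    return scaffold, best_score

random.seed(0)
final, score = self_evolve({"retry_limit": 3.0, "context_pruning": 1.0})
assert score >= 4.0  # never worse than the starting scaffold
```

Because failed changes are reverted, the score is monotonically non-decreasing; the hard part in practice is an evaluation signal reliable enough that "score went up" actually means "scaffold got better."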
Within MiniMax's RL team, M2.7 now handles 30–50% of daily workflows end-to-end. Researchers interact only for critical decisions while the model manages literature review, experiment tracking, data pipelines, debugging, and merge requests.
## Agent Teams
M2.7 supports native multi-agent collaboration through what MiniMax calls Agent Teams. Multiple model instances maintain distinct role identities and work together on tasks — with stable role boundaries, adversarial reasoning, and behavioral differentiation between agents.
MiniMax also open-sourced OpenRoom, an interactive demo that places agent interactions inside a web GUI with real-time visual feedback.
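The core idea behind stable role boundaries can be illustrated with a minimal sketch: each agent instance carries a pinned role identity that persists across turns. This is not MiniMax's Agent Teams API (which is not documented in this post); the `Agent` class and its `respond` method are hypothetical, with the real model call stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One model instance with a fixed role identity (toy stand-in)."""
    role: str
    system_prompt: str
    transcript: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        # A real implementation would call the model with self.system_prompt
        # pinned on every turn, so the role boundary stays stable.
        self.transcript.append(message)
        return f"[{self.role}] reviewed: {message}"

planner = Agent("planner", "Decompose the task; never write code.")
critic = Agent("critic", "Adversarially review the planner's output.")

plan = planner.respond("Implement retry logic for the API client")
review = critic.respond(plan)
assert review.startswith("[critic]")
```

Keeping each agent's system prompt and transcript separate is what lets the critic stay adversarial instead of collapsing into agreement with the planner.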
## Pricing

| Model | Input/M | Output/M |
|---|---|---|
| MiniMax M2.7 | $0.30 | $1.20 |
| GLM-5.1 | $1.00 | $3.20 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |
M2.7 is 17x cheaper than Claude Opus 4.6 on input and 21x cheaper on output. It's even 3.3x cheaper than GLM-5.1 on input. For a model scoring 56.22% on SWE-Pro (vs Opus's 57.3%), the cost-per-quality ratio is extraordinary.
Two API variants are available: M2.7 (standard) and M2.7-highspeed (same results, faster inference).
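To see what the per-million-token rates mean per request, here is a small cost calculator using the prices from the table above. The 30K-in / 5K-out request shape is an illustrative assumption, not a published workload profile.

```python
# Price table from the post, USD per million tokens (input, output).
PRICES = {
    "MiniMax M2.7": (0.30, 1.20),
    "GLM-5.1": (1.00, 3.20),
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# An agentic coding request: 30K tokens in, 5K out (illustrative numbers).
m27 = request_cost("MiniMax M2.7", 30_000, 5_000)
opus = request_cost("Claude Opus 4.6", 30_000, 5_000)
print(f"M2.7: ${m27:.4f}  Opus 4.6: ${opus:.4f}  ratio: {opus / m27:.1f}x")
# → M2.7: $0.0150  Opus 4.6: $0.2750  ratio: 18.3x
```

At this request shape the effective gap lands between the 17x input and 21x output ratios; output-heavy workloads push it toward 21x.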
## Availability

| Platform | Status |
|---|---|
| Hugging Face | |
| GitHub | |
| Ollama | |
| NVIDIA NIM | Supported |
| SGLang | v0.5.10+ |
| vLLM | v0.19.0+ |
| OpenRouter | |
| MiniMax API | Available (+ Coding Plan subscription) |
## The Elephant in the Room: MiniMax and AI Theft
MiniMax is one of the three Chinese AI firms named in the OpenAI/Anthropic/Google coalition against adversarial distillation. Anthropic documented approximately 13 million unauthorized exchanges from MiniMax — roughly 81% of the total 16 million stolen exchanges across all three named firms.
This creates an uncomfortable context for M2.7's release. The model's impressive performance raises questions about what role, if any, distilled outputs from Claude and other US models played in its training. MiniMax has not publicly addressed the allegations.
For developers evaluating M2.7: the technical capabilities are real and independently measurable. But the ethical and legal landscape around the model is unsettled.
## Strengths and Weaknesses

### Strengths

- Price-performance: 56.22% SWE-Pro at $0.30/$1.20 is unmatched
- Self-evolution: genuinely novel capability; no other open model does this
- Agent Teams: native multi-agent collaboration with role stability
- Professional expertise: 1495 Elo on GDPval-AA, the highest open-source score
- Efficiency: only 10B active parameters from 230B total

### Weaknesses

- Non-commercial license: unlike GLM-5.1 (MIT) or Gemma 4 (Apache 2.0), commercial use requires a separate deal
- Speed: 46.6 t/s is below average for its class
- Verbosity: 87M tokens on evaluation vs. a 41M average; it over-explains
- Distillation controversy: the theft allegations create reputational and legal risk
- Terminal Bench gap: 57.0% vs GPT-5.4's 75.1%, a significant distance on terminal tasks
## Bottom Line
MiniMax M2.7 is a technically impressive model at an absurdly low price point. At $0.30/$1.20 per million tokens, it delivers SWE-Pro performance within 1 point of Claude Opus 4.6 — a model that costs 17x more. The self-evolution capability is genuinely novel, and the agent teams feature fills a real gap for multi-agent workflows.
But the non-commercial license limits its appeal compared to truly open alternatives like GLM-5.1 (MIT) and Gemma 4 (Apache 2.0). And the ongoing distillation controversy — with MiniMax named as the primary offender in the Frontier Model Forum coalition — casts a shadow over the entire release.
For research and experimentation: M2.7 is a no-brainer at this price. For production: the licensing restrictions and reputational risk make GLM-5.1 or Gemma 4 safer bets.
## Sources

- MiniMax M2.7: Early Echoes of Self-Evolution — MiniMax (Official Blog)
- MiniMax Just Open Sourced M2.7: A Self-Evolving Agent Model — MarkTechPost
- MiniMax-M2.7 — Intelligence, Performance & Price Analysis — Artificial Analysis
- MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms — NVIDIA Technical Blog
- MiniMax Open Sources M2.7, a Self-Evolving Agent Model — Unite.AI