# An AI That Helped Build Itself
On March 18, 2026, Chinese AI lab MiniMax released M2.7, and on April 12 it open-sourced the weights. What sets M2.7 apart from every other model release is that it is the first model to actively participate in its own development cycle.
During training, MiniMax let M2.7 autonomously optimize its own programming scaffold over 100+ rounds — analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert changes. The result: a 30% performance improvement with no human intervention. MiniMax calls this "self-evolution."
## Architecture

| Spec | Detail |
|---|---|
| Parameters | 230 billion total |
| Active per token | 10 billion (MoE) |
| Experts | 256 |
| Context window | 200K tokens |
| Reasoning | Native chain-of-thought (extended thinking) |
| License | Non-commercial (commercial use requires separate agreement) |
| Release | March 18, 2026 (weights open-sourced April 12) |
Note on licensing: Unlike MiniMax's earlier M2 (MIT) and M2.5 (Modified-MIT), M2.7 uses a MiniMax proprietary non-commercial license. You can download and experiment, but commercial deployment requires a separate agreement.
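The 230B-total / 10B-active split comes from Mixture-of-Experts routing: a learned router scores all 256 experts for each token but activates only a few, so just a small slice of the network runs per token. A minimal sketch of top-k gating in plain Python (the value of k and the logits are illustrative assumptions; the per-token expert count is not stated in this post):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token and renormalize
    their gate weights -- why only ~10B of 230B parameters execute."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 256 experts as in M2.7; the logits here are made-up illustrative values.
logits = [0.0] * 256
logits[7], logits[42] = 3.0, 2.0
print(top_k_route(logits, k=2))  # experts 7 and 42, with renormalized weights
```

Only the selected experts' weights are loaded into the forward pass for that token, which is why active-parameter count, not total size, drives inference cost.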
## Benchmarks
M2.7 scores within striking distance of Claude Opus 4.6 and GPT-5.4 on the hardest coding benchmarks — at a fraction of the cost.
### Coding & Software Engineering

| Benchmark | M2.7 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-Pro | 56.22% | 57.3% | 57.7% |
| Terminal Bench 2 | 57.0% | 65.4% | 75.1% |
| VIBE-Pro | 55.6% | ~56% | — |
| Multi SWE Bench | 52.7% | — | — |
| NL2Repo | 39.8% | — | — |
| SWE Multilingual | 76.5 | — | — |
On SWE-Pro — the industry's hardest coding benchmark — M2.7 is just 1.08 points behind Claude Opus 4.6 and 1.48 points behind GPT-5.4. For a 230B open-weight model with only 10B active parameters, this is remarkable.
### Agent & Tool Use

| Benchmark | M2.7 | Notes |
|---|---|---|
| GDPval-AA | 1495 Elo | Highest among open-source models |
| MM Claw | 62.7% | Real-world work/life tasks |
| MLE Bench Lite | 66.6% avg | ML competitions |
| Toolathon | 46.3% | Tool interaction accuracy |
The 1495 Elo on GDPval-AA — which evaluates professional domain expertise across 45 models — is the highest score among all open-source models. Only Claude Opus 4.6, Claude Sonnet 4.6, and GPT-5.4 score higher.
### Overall Intelligence

From Artificial Analysis:

| Metric | M2.7 | Context |
|---|---|---|
| Intelligence Index | 50 | #3 of 71 models (average: 27) |
| Output Speed | 46.6 t/s | Below average (median: 55.8 t/s) |
| TTFT | 2.67s | Near average (median: 2.28s) |
| Token Usage | 87M tokens | Very verbose (average: 41M) |
## The Self-Evolution Story
This is what sets M2.7 apart from every other model. During its development:
- MiniMax gave M2.7 access to its own training infrastructure
- The model autonomously updated its own memory and built 40+ complex skills for RL experiments
- It ran 100+ optimization rounds, analyzing failures, modifying code, running evaluations, and deciding what to keep
- It achieved a 30% performance improvement on internal evaluations with no human intervention
- It maintains a 97% skill adherence rate across 40+ complex skills (each exceeding 2,000 tokens)
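The keep-or-revert loop described above can be sketched as a simple hill-climbing procedure. This is a toy simulation, not MiniMax's actual pipeline: `evaluate` and `propose_change` are hypothetical stand-ins for the internal evaluation suite and for the model editing its own scaffold code.

```python
import random

def evaluate(scaffold):
    """Toy stand-in for an evaluation suite: scores a scaffold
    configuration; higher is better."""
    return sum(scaffold.values())

def propose_change(scaffold):
    """Toy stand-in for the model analyzing failure trajectories
    and editing one knob of its own scaffold."""
    key = random.choice(list(scaffold))
    candidate = dict(scaffold)
    candidate[key] += random.uniform(-1.0, 1.0)
    return candidate

def self_evolve(scaffold, rounds=100):
    """Keep-or-revert loop: a change survives only if evaluation improves."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)
        score = evaluate(candidate)
        if score > best_score:        # keep the change
            scaffold, best_score = candidate, score
        # otherwise revert, i.e. keep the previous scaffold
    return scaffold, best_score

random.seed(0)
final, score = self_evolve({"retry_limit": 3.0, "context_pruning": 1.0})
assert score >= 4.0  # never worse than the starting scaffold
```

Because failed changes are reverted, the score is monotonically non-decreasing; the hard part in practice is an evaluation signal reliable enough that "score went up" actually means "scaffold got better."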
Within MiniMax's RL team, M2.7 now handles 30–50% of daily workflows end-to-end. Researchers interact only for critical decisions while the model manages literature review, experiment tracking, data pipelines, debugging, and merge requests.
## Agent Teams
M2.7 supports native multi-agent collaboration through what MiniMax calls Agent Teams. Multiple model instances maintain distinct role identities and work together on tasks — with stable role boundaries, adversarial reasoning, and behavioral differentiation between agents.
MiniMax also open-sourced OpenRoom, an interactive demo that places agent interactions inside a web GUI with real-time visual feedback.
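The core idea behind stable role boundaries can be illustrated with a minimal sketch: each agent instance carries a pinned role identity that persists across turns. This is not MiniMax's Agent Teams API (which is not documented in this post); the `Agent` class and its `respond` method are hypothetical, with the real model call stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One model instance with a fixed role identity (toy stand-in)."""
    role: str
    system_prompt: str
    transcript: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        # A real implementation would call the model with self.system_prompt
        # pinned on every turn, so the role boundary stays stable.
        self.transcript.append(message)
        return f"[{self.role}] reviewed: {message}"

planner = Agent("planner", "Decompose the task; never write code.")
critic = Agent("critic", "Adversarially review the planner's output.")

plan = planner.respond("Implement retry logic for the API client")
review = critic.respond(plan)
assert review.startswith("[critic]")
```

Keeping each agent's system prompt and transcript separate is what lets the critic stay adversarial instead of collapsing into agreement with the planner.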
## Pricing

| Model | Input/M | Output/M |
|---|---|---|
| MiniMax M2.7 | $0.30 | $1.20 |
| GLM-5.1 | $1.00 | $3.20 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |
M2.7 is 17x cheaper than Claude Opus 4.6 on input and 21x cheaper on output. It's even 3.3x cheaper than GLM-5.1 on input. For a model scoring 56.22% on SWE-Pro (vs Opus's 57.3%), the cost-per-quality ratio is extraordinary.
Two API variants are available: M2.7 (standard) and M2.7-highspeed (same results, faster inference).
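To see what the per-million-token rates mean per request, here is a small cost calculator using the prices from the table above. The 30K-in / 5K-out request shape is an illustrative assumption, not a published workload profile.

```python
# Price table from the post, USD per million tokens (input, output).
PRICES = {
    "MiniMax M2.7": (0.30, 1.20),
    "GLM-5.1": (1.00, 3.20),
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# An agentic coding request: 30K tokens in, 5K out (illustrative numbers).
m27 = request_cost("MiniMax M2.7", 30_000, 5_000)
opus = request_cost("Claude Opus 4.6", 30_000, 5_000)
print(f"M2.7: ${m27:.4f}  Opus 4.6: ${opus:.4f}  ratio: {opus / m27:.1f}x")
# → M2.7: $0.0150  Opus 4.6: $0.2750  ratio: 18.3x
```

At this request shape the effective gap lands between the 17x input and 21x output ratios; output-heavy workloads push it toward 21x.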
## Availability

| Platform | Status |
|---|---|
| Hugging Face | |
| GitHub | |
| Ollama | |
| NVIDIA NIM | Supported |
| SGLang | v0.5.10+ |
| vLLM | v0.19.0+ |
| OpenRouter | |
| MiniMax API | Available (+ Coding Plan subscription) |
## The Elephant in the Room: MiniMax and AI Theft
MiniMax is one of the three Chinese AI firms named in the OpenAI/Anthropic/Google coalition against adversarial distillation. Anthropic documented approximately 13 million unauthorized exchanges from MiniMax — roughly 81% of the total 16 million stolen exchanges across all three named firms.
This creates an uncomfortable context for M2.7's release. The model's impressive performance raises questions about what role, if any, distilled outputs from Claude and other US models played in its training. MiniMax has not publicly addressed the allegations.
For developers evaluating M2.7: the technical capabilities are real and independently measurable. But the ethical and legal landscape around the model is unsettled.
## Strengths and Weaknesses

### Strengths

- Price-performance: 56.22% SWE-Pro at $0.30/$1.20 is unmatched
- Self-evolution: genuinely novel capability; no other open model does this
- Agent Teams: native multi-agent collaboration with role stability
- Professional expertise: 1495 Elo on GDPval-AA, the highest open-source score
- Efficiency: only 10B active parameters from 230B total

### Weaknesses

- Non-commercial license: unlike GLM-5.1 (MIT) or Gemma 4 (Apache 2.0), commercial use requires a separate deal
- Speed: 46.6 t/s is below average for its class
- Verbosity: 87M tokens on evaluation vs. a 41M average; it over-explains
- Distillation controversy: the theft allegations create reputational and legal risk
- Terminal Bench gap: 57.0% vs GPT-5.4's 75.1%, a significant distance on terminal tasks
## Bottom Line
MiniMax M2.7 is a technically impressive model at an absurdly low price point. At $0.30/$1.20 per million tokens, it delivers SWE-Pro performance within 1 point of Claude Opus 4.6 — a model that costs 17x more. The self-evolution capability is genuinely novel, and the agent teams feature fills a real gap for multi-agent workflows.
But the non-commercial license limits its appeal compared to truly open alternatives like GLM-5.1 (MIT) and Gemma 4 (Apache 2.0). And the ongoing distillation controversy — with MiniMax named as the primary offender in the Frontier Model Forum coalition — casts a shadow over the entire release.
For research and experimentation: M2.7 is a no-brainer at this price. For production: the licensing restrictions and reputational risk make GLM-5.1 or Gemma 4 safer bets.
## Sources

- MiniMax M2.7: Early Echoes of Self-Evolution — MiniMax (Official Blog)
- MiniMax Just Open Sourced M2.7: A Self-Evolving Agent Model — MarkTechPost
- MiniMax-M2.7 — Intelligence, Performance & Price Analysis — Artificial Analysis
- MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms — NVIDIA Technical Blog
- MiniMax Open Sources M2.7, a Self-Evolving Agent Model — Unite.AI