
GLM-5.1: Zhipu's Open-Source Model Scores 94.6% of Claude Opus 4.6 in Coding

Z.ai (formerly Zhipu AI) releases GLM-5.1, scoring 45.3 on coding benchmarks — just 2.6 points behind Claude Opus 4.6. Trained entirely on Huawei chips, open-source under MIT license, and starting at $3/month. Here's what the benchmarks actually show and what's still unverified.

Serenities AI · 10 min read

What Just Happened

On March 27, 2026, Z.ai (formerly Zhipu AI) released GLM-5.1 — an incremental upgrade to its flagship GLM-5 model that narrows the gap with Claude Opus 4.6 to just 2.6 points on coding benchmarks. If the numbers hold up under independent scrutiny, this is the closest any open-source model has come to matching the top proprietary coding model.

The timing is significant. Z.ai became the world's first publicly traded foundation model company after its Hong Kong IPO in January 2026, raising $558 million at a $6.6 billion IPO valuation (reaching $7.1 billion by first-day close). GLM-5.1 is the company's bid to prove that open-source models trained entirely on non-American hardware can compete at the frontier.

The Benchmark Numbers

Using Claude Code as the evaluation framework, Z.ai reports the following coding scores:

| Model | Coding Score | vs. Opus 4.6 |
|---|---|---|
| Claude Opus 4.6 | 47.9 | Baseline |
| GLM-5.1 | 45.3 | 94.6% |
| GLM-5 | 35.4 | 73.9% |

The jump from GLM-5 (35.4) to GLM-5.1 (45.3) is a 28% relative improvement in a single point release, a leap large enough to suggest significant post-training optimization rather than minor tuning.
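
The headline percentages are straightforward arithmetic and easy to check directly (a quick sanity check of the reported figures, not part of Z.ai's methodology):

```python
# Coding scores reported by Z.ai (Claude Code evaluation harness).
opus_46 = 47.9
glm_51 = 45.3
glm_5 = 35.4

# GLM-5.1 as a fraction of Opus 4.6's score.
ratio = glm_51 / opus_46 * 100                # ~94.6%

# Relative improvement from GLM-5 to GLM-5.1.
improvement = (glm_51 - glm_5) / glm_5 * 100  # ~28%

print(f"{ratio:.1f}% of Opus 4.6, +{improvement:.0f}% over GLM-5")
```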

For broader context, here's how the GLM-5 base model (which GLM-5.1 builds on) performs on established third-party benchmarks:

| Benchmark | GLM-5 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | 81.4% | ~80% |
| AIME 2026 | 92.7% | — | — |
| GPQA-Diamond | 86.0% | — | — |
| Artificial Analysis Index | 50 | 53 | 57 |

On Artificial Analysis, GLM-5 ranks as the highest-scoring open-weight model on the Intelligence Index. The SWE-bench Verified gap between Opus 4.6 (~81.4%) and GLM-5 (77.8%) is about 3.6 percentage points — a gap that would have been unthinkable from an open-source model six months ago.

Critical Caveat: Benchmarks Are Self-Reported

This needs to be stated clearly: the GLM-5.1 coding benchmark (45.3 points, 94.6% of Opus) is entirely self-reported by Z.ai. As of March 29, 2026, no independent third-party evaluation lab has published corroborating results for GLM-5.1 specifically.

There are additional concerns with the methodology:

  • The evaluation uses Claude Code as the test harness, which is an unconventional choice that makes cross-benchmark comparison difficult

  • GLM-5.1 launched just two days ago — there has been no time for the broader research community to replicate results

  • The specific scoring methodology has not been publicly detailed beyond what Z.ai has shared

That said, Z.ai has a track record of backing up internal numbers. The GLM-5 base model's 77.8% on SWE-bench Verified was externally validated — the highest score among all open-source models on that benchmark. So there is reason to take the GLM-5.1 claims seriously while awaiting confirmation.

Bottom line: Treat the 94.6% figure as a promising preliminary claim, not an established fact. Wait for independent evaluations before making workflow decisions based on it.

Architecture and Training

GLM-5.1 inherits the GLM-5 architecture, which is substantial:

  • Total parameters: 744 billion

  • Architecture: Mixture of Experts (MoE) with 256 experts, 8 active per token

  • Active parameters per inference: ~40–44 billion (roughly 5.4–5.9% of total parameters active per token)

  • Context window: 200K tokens

  • Max output tokens: 131,072

  • Attention mechanism: DeepSeek Sparse Attention (DSA) for efficient long-context processing

  • Pre-training data: 28.5 trillion tokens
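
The active-parameter fraction quoted above follows directly from the reported numbers; a quick back-of-the-envelope check (not from Z.ai's documentation):

```python
total_params = 744e9                    # total parameters
active_low, active_high = 40e9, 44e9    # active parameters per token

# Fraction of the network that participates in each forward pass.
frac_low = active_low / total_params * 100    # ~5.4%
frac_high = active_high / total_params * 100  # ~5.9%

print(f"~{frac_low:.1f}%-{frac_high:.1f}% of parameters active per token")
```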

The most notable aspect of GLM-5's training is the hardware. The entire model family was trained on 100,000 Huawei Ascend 910B chips using the MindSpore framework — with zero NVIDIA GPU involvement. This is particularly significant given that Z.ai was placed on the US Entity List in January 2025, restricting its access to American chips.

The fact that a model trained entirely on non-NVIDIA hardware can reach within 3.6 points of Claude Opus 4.6 on SWE-bench Verified is one of the most significant developments in the AI hardware landscape this year.

Pricing: Where GLM-5.1 Changes the Math

This is where GLM-5.1 gets genuinely disruptive. The cost difference compared to proprietary alternatives is not incremental — it is an order of magnitude.

API Pricing (GLM-5 Base)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GLM-5 | $1.00 | $3.20 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

GLM-5 is 5x cheaper on input and nearly 8x cheaper on output compared to Claude Opus 4.6. For high-volume coding workflows, this difference compounds fast.
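
To see how the per-token gap compounds, here is a rough monthly-cost comparison. The prices are the published per-1M-token rates from the table above; the workload volume is an illustrative assumption:

```python
# Published API prices in USD per 1M tokens.
prices = {
    "GLM-5":           {"input": 1.00, "output": 3.20},
    "GPT-5.4":         {"input": 2.50, "output": 15.00},
    "Claude Opus 4.6": {"input": 5.00, "output": 25.00},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    p = prices[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# Illustrative heavy coding workload: 200M input tokens, 40M output tokens/month.
for model in prices:
    print(f"{model}: ${monthly_cost(model, 200, 40):,.2f}/month")
```

At that volume the spread is roughly $328 versus $2,000 per month, about a 6x difference end to end.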

GLM Coding Plan (Subscription)

Z.ai also offers a subscription model specifically designed for coding workflows:

| Plan | Price | Promo Price | Requests (per 5 hours) |
|---|---|---|---|
| Lite | $10/month | $3 first month | 120 |
| Pro | $30/month | $15 first month | 600 |
| Max | Higher tier | — | Expanded limits |

The Coding Plan includes access to GLM-5.1, GLM-5, GLM-5-Turbo, and GLM-4.7, along with features like vision understanding, web search, and web reader — all compatible with Claude Code, Cline, and other popular coding tools.

Compare this to Claude Max at $100–$200/month or Claude Pro at $20/month with usage limits. For developers doing high-volume daily coding, the economics are hard to ignore.

Open Source Under MIT License

GLM-5 is already available on Hugging Face under the MIT license — the most permissive open-source license available. This means unrestricted commercial use, modification, and redistribution with no strings attached.

Z.ai's global head Zixuan Li has confirmed that GLM-5.1 will also be open-sourced under MIT, following the same precedent. The standalone GLM-5.1 API and open weights are expected within weeks, though no specific date has been announced.

For local deployment, the GLM-5 family is already supported by:

  • vLLM and SGLang for inference

  • KTransformers and xLLM for local deployment

  • NVIDIA NVFP4 quantized version for optimized inference

  • GGUF format for llama.cpp compatibility

  • MLX format for Apple Silicon

At 744 billion parameters (1.51TB on Hugging Face), this is not a model you run on a laptop. But for teams with GPU infrastructure or cloud deployments, the MIT license means zero per-token costs beyond your own compute.
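
The 1.51TB figure is roughly what 744B parameters implies at 16-bit precision; a back-of-the-envelope footprint estimate (bytes-per-parameter values are the standard sizes for each format, and the result ignores non-weight files):

```python
total_params = 744e9

# Approximate bytes per parameter for common weight formats.
formats = {"BF16": 2.0, "FP8": 1.0, "NVFP4 (4-bit)": 0.5}

for name, bytes_per_param in formats.items():
    terabytes = total_params * bytes_per_param / 1e12
    print(f"{name}: ~{terabytes:.2f} TB of weights")
```

The BF16 estimate (~1.49TB) lines up with the 1.51TB Hugging Face listing, and a 4-bit quantized variant would still weigh in at roughly 370GB, well beyond laptop territory.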

Who Is Z.ai?

For readers unfamiliar with the company, here is the quick background:

  • Founded: 2019, spun out of Tsinghua University by professors Tang Jie and Li Juanzi

  • Headquarters: Beijing, China (international brand: Z.ai)

  • IPO: January 8, 2026 on the Hong Kong Stock Exchange — the world's first publicly traded foundation model company

  • Market cap: ~$31 billion (as of March 2026)

  • Total funding raised: $1.4 billion+ across 12 rounds, plus $558 million IPO

  • Investors: Alibaba, Tencent, Ant Group, Meituan, Xiaomi, Saudi Aramco's Prosperity7 Ventures

  • Revenue: ~$53 million trailing twelve months (as of December 2025), with 325% year-over-year growth

  • Entity List: Added to the US export control Entity List in January 2025

Z.ai is considered one of China's "AI Tigers" alongside MiniMax and Moonshot AI. The company's stock surged nearly 30% after the GLM-5 release in February 2026, though it later fell 23% amid compute resource shortages that led to user complaints and temporarily restricted new signups.

The Rapid Release Cadence

Z.ai has been shipping at an aggressive pace:

  • July 2025: GLM-4.5

  • September 2025: GLM-4.6

  • December 2025: GLM-4.7

  • February 11, 2026: GLM-5 (the flagship release)

  • March 15, 2026: GLM-5-Turbo

  • March 27, 2026: GLM-5.1

Six model releases in nine months. The February-to-March window alone saw three releases. This cadence reflects both competitive pressure in the Chinese AI market and Z.ai's ambition to establish GLM as a serious alternative to Claude and GPT for coding workflows.

What GLM-5.1 Means for Developers

The Cost Arbitrage Play

Several early adopters and commentators are converging on a practical strategy: use GLM for daily coding tasks, reserve Claude Opus for complex or high-stakes work. At $3–$30/month for the GLM Coding Plan versus $100–$200/month for Claude Max, the cost savings are substantial if GLM-5.1 delivers 90%+ of Opus-level quality for routine work.
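
In practice this strategy amounts to a routing policy in front of your tooling. A minimal sketch; the keyword rule and model names are illustrative placeholders, not any vendor's actual API:

```python
# Tasks matching these keywords get escalated to the more expensive model.
ESCALATE_KEYWORDS = {"architecture", "security", "migration", "production"}

def pick_model(task: str) -> str:
    """Route routine coding work to the cheap model, escalate high-stakes tasks.

    Model names here are placeholders, not real API identifiers.
    """
    if any(keyword in task.lower() for keyword in ESCALATE_KEYWORDS):
        return "claude-opus"   # reserve the premium model for high-stakes work
    return "glm-coding"        # default: the cheap model handles routine tasks

print(pick_model("fix the typo in README"))          # routine
print(pick_model("security review of auth module"))  # escalated
```

A real version would classify tasks with something smarter than keywords, but the economics are the same: the default path is cheap, and escalation is the exception.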

The Open-Source Milestone

If GLM-5.1's coding performance is independently confirmed at or near the claimed levels, it would be the first time an open-source model has reached roughly 95% of the top proprietary coding model's score. For organizations that need to self-host AI models due to data sovereignty, compliance, or cost requirements, this changes the calculus significantly.

The Hardware Story

For the broader AI industry, the fact that a frontier-competitive model was trained entirely on Huawei Ascend chips — with zero NVIDIA involvement — challenges the assumption that NVIDIA hardware is required for cutting-edge AI training. This has implications for the global chip supply chain, export controls, and the competitive dynamics of the AI industry beyond just model capabilities.

What We Still Don't Know

Several important questions remain unanswered:

  • Independent benchmark verification — Will third-party evaluations confirm the 94.6% claim? The model is two days old. This is the biggest unknown.

  • Performance on non-coding tasks — The 45.3 score is coding-specific. How does GLM-5.1 perform on general reasoning, creative writing, analysis, and other tasks compared to GLM-5?

  • Open-source timeline — Z.ai has confirmed MIT licensing but not a specific release date for GLM-5.1 weights. "Within weeks" is vague.

  • Compute availability — Z.ai experienced capacity issues after the GLM-5 launch, restricting new signups. Can they handle the demand for GLM-5.1?

  • Real-world coding quality — Benchmark scores test specific patterns. How does GLM-5.1 perform on real-world codebases, debugging sessions, and multi-file refactoring?

How to Use GLM-5.1 With Your Workflow

GLM-5.1 is compatible with the tools most developers already use:

  • Claude Code — Supported as a model provider

  • Cline — Compatible via API

  • Standard OpenAI-compatible API format — Works with any tool that supports the OpenAI chat completions API

For developers using platforms like Serenities AI with its BYOAI (Bring Your Own AI) model, GLM-5.1 can be connected via API key alongside Claude, GPT, Gemini, DeepSeek, MiniMax, and other providers. This lets you switch between models freely — using GLM for high-volume daily work and Claude or GPT for tasks that require the absolute best reasoning performance — all within the same project.
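
Because the endpoint follows the OpenAI chat-completions format, any compatible client can target it by swapping the base URL. A minimal standard-library sketch; the base URL and model identifier below are placeholders, so check Z.ai's API documentation for the real values:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder; use the provider's real URL
API_KEY = "YOUR_API_KEY"

# Standard OpenAI-style chat-completions payload.
payload = {
    "model": "glm-5.1",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a function that reverses a string."},
    ],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# With a real endpoint and key:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```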

The Bottom Line

GLM-5.1 is either a landmark moment for open-source AI or an overpromised point release — and we won't know which until independent benchmarks arrive. What we do know:

  • The base GLM-5 model has externally validated performance that puts it within 3.6 points of Claude Opus 4.6 on SWE-bench Verified

  • The pricing is 5–8x cheaper than Claude Opus 4.6 on a per-token basis

  • The MIT license means no vendor lock-in and zero per-token costs for self-hosted deployments

  • The Huawei-only training pipeline is a geopolitically significant demonstration that frontier AI does not require American silicon

  • The coding-specific claims (94.6% of Opus) are unverified by third parties and should be treated as preliminary

For developers: watch for independent evaluations over the coming weeks. If the numbers hold, GLM-5.1 at $3/month could become the most cost-effective coding model available. If they don't, the GLM-5 base model is still a formidable open-source option at a fraction of proprietary pricing.

Either way, the gap between open-source and proprietary AI models is closing faster than anyone predicted.
