Every Frontier Model's API Price, Side by Side
API pricing changes fast. This guide covers every major coding-capable model's pricing as of March 30, 2026, verified directly from official documentation and pricing pages.
The Master Pricing Table
All prices are per 1 million tokens (standard context, no caching or batch discounts):
| Model | Input (per MTok) | Output (per MTok) | Context Window | License |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.28 | $0.42 | 164K | MIT |
| Qwen 3.5 Flash | $0.065 | $0.26 | 1M | Apache 2.0 |
| Qwen 3.5 Plus | ~$0.26 | ~$1.56 | 1M | Apache 2.0 |
| Mistral Large 3 | $0.50 | $1.50 | 262K | Apache 2.0 |
| GLM-5 | $1.00 | $3.20 | 200K | MIT |
| GLM-5-Turbo | $1.20 | $4.00 | 200K | Proprietary (API) |
| GLM-5-Code | $1.20 | $5.00 | 200K | Proprietary (API) |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Proprietary |
| GPT-5.4 | $2.50 | $15.00 | 1M | Proprietary |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Proprietary |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | Proprietary |
Sources: docs.z.ai (GLM), platform.claude.com (Claude), developers.openai.com (GPT), ai.google.dev (Gemini), api-docs.deepseek.com (DeepSeek), openrouter.ai (Qwen, Mistral).
What These Numbers Actually Mean
Raw per-token prices don't tell the full story. What matters is cost per task. A typical coding task involves:
- Small task (fix a bug, write a function): ~2K input + ~1K output = 3K tokens
- Medium task (build a component, refactor a file): ~10K input + ~5K output = 15K tokens
- Large task (architect a feature, multi-file changes): ~50K input + ~20K output = 70K tokens
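Per-task cost under these assumptions is simply input tokens × input rate plus output tokens × output rate (rates per million tokens). A minimal sketch, using a few rates from the table above:

```python
# Per-MTok rates (input, output) taken from the pricing table above.
RATES = {
    "DeepSeek V3.2": (0.28, 0.42),
    "GLM-5": (1.00, 3.20),
    "Claude Opus 4.6": (5.00, 25.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at standard (uncached, non-batch) rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A "large" task: ~50K input + ~20K output tokens.
print(round(task_cost("DeepSeek V3.2", 50_000, 20_000), 3))    # 0.022
print(round(task_cost("Claude Opus 4.6", 50_000, 20_000), 3))  # 0.75
```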
Cost Per Task Comparison
| Model | Small Task (~3K tok) | Medium Task (~15K tok) | Large Task (~70K tok) |
|---|---|---|---|
| DeepSeek V3.2 | $0.001 | $0.005 | $0.022 |
| Qwen 3.5 Flash | $0.0004 | $0.002 | $0.008 |
| Mistral Large 3 | $0.003 | $0.013 | $0.055 |
| GLM-5 | $0.005 | $0.026 | $0.114 |
| Gemini 3.1 Pro | $0.016 | $0.080 | $0.340 |
| GPT-5.4 | $0.020 | $0.100 | $0.425 |
| Claude Opus 4.6 | $0.035 | $0.175 | $0.750 |
Claude Opus 4.6 costs 34× more per large task than DeepSeek V3.2 and about 6.6× more than GLM-5. That doesn't mean DeepSeek is better; it means the premium models charge for quality. The question is how much quality each task actually needs.
Savings With Caching and Batch APIs
Every major provider offers ways to cut costs for high-volume use:
| Provider | Cache Discount | Batch API Discount | Best Combined Price (Input) |
|---|---|---|---|
| Claude Opus 4.6 | 90% on cache reads | 50% | $0.25/MTok (batch + cache) |
| GPT-5.4 | 50% on cached input | 50% | $0.625/MTok |
| Gemini 3.1 Pro | 90% on cache reads | N/A | $0.20/MTok (cache only) |
| DeepSeek V3.2 | 90% on cache hits | N/A | $0.028/MTok |
| GLM-5 | 80% on cached input | N/A | $0.20/MTok |
With aggressive caching, Claude Opus drops from $5.00 to $0.25 per million input tokens — a 95% reduction. Gemini 3.1 Pro drops from $2.00 to $0.20. If you're building a product with repeated context (system prompts, codebase context), caching changes the economics dramatically.
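The blended input price depends on your cache hit rate. A sketch of that arithmetic, assuming cached tokens bill at the discounted rate and misses at the standard rate (actual billing mechanics, such as cache-write fees, vary by provider):

```python
def effective_input_rate(base_rate: float, cache_discount: float,
                         hit_rate: float) -> float:
    """Blended per-MTok input rate for a given cache discount and hit rate.

    cache_discount: fraction off for cached tokens (0.90 = 90% off).
    hit_rate: fraction of input tokens served from cache.
    """
    cached_rate = base_rate * (1 - cache_discount)
    return hit_rate * cached_rate + (1 - hit_rate) * base_rate

# Claude Opus 4.6: $5.00 base input, 90% cache-read discount.
print(round(effective_input_rate(5.00, 0.90, 1.0), 2))  # 0.5 (fully cached)
print(round(effective_input_rate(5.00, 0.90, 0.8), 2))  # 1.4
```

Stacking the 50% batch discount on top of a fully cached read reproduces the $0.25/MTok figure in the table (0.50 × 0.5).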
Quality vs Cost: The SWE-bench Reality Check
Cheaper isn't always better. Here's how these models rank on SWE-bench Verified, the standard benchmark for real-world code editing:
| Model | SWE-bench Verified | Output Cost/MTok | Cost-Efficiency Ratio |
|---|---|---|---|
| Claude Opus 4.6 | ~80.8% | $25.00 | 3.2% per dollar |
| Gemini 3.1 Pro | 78.8% | $12.00 | 6.6% per dollar |
| GPT-5.4 | 78.2% | $15.00 | 5.2% per dollar |
| GLM-5 | 77.8% | $3.20 | 24.3% per dollar |
| Qwen 3.5 | 76.4% | $2.34 | 32.6% per dollar |
| DeepSeek V3.2 | 72–74% | $0.42 | 173% per dollar |
GLM-5 offers the best balance of quality and cost among frontier models: 77.8% on SWE-bench at $3.20/MTok output is about 7.5× more cost-efficient than Claude Opus. DeepSeek V3.2 is the absolute cheapest but gives up roughly 7–9 points of SWE-bench accuracy relative to Opus.
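The cost-efficiency column above is just the SWE-bench score divided by output price. A quick sketch reproducing it (DeepSeek's score is taken as the midpoint of its reported range):

```python
# (SWE-bench Verified %, output $/MTok) from the tables above.
MODELS = {
    "Claude Opus 4.6": (80.8, 25.00),
    "GLM-5": (77.8, 3.20),
    "DeepSeek V3.2": (73.0, 0.42),  # midpoint of the 72-74% range
}

def efficiency(score_pct: float, output_rate: float) -> float:
    """SWE-bench percentage points per output dollar."""
    return score_pct / output_rate

# Rank models by benchmark points per dollar, best value first.
for name, (score, rate) in sorted(MODELS.items(),
                                  key=lambda kv: -efficiency(*kv[1])):
    print(f"{name}: {efficiency(score, rate):.1f}% per dollar")
```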
Subscription Plans vs Pay-as-You-Go
For developers who code daily, subscription plans often beat API pricing:
| Plan | Monthly Cost | Best Model | Break-Even vs API |
|---|---|---|---|
| Claude Pro | $20 | Opus 4.6 | ~$0.67 of API usage/day |
| Claude Max 5× | $100 | Opus 4.6 | ~$3.30 of API usage/day |
| ChatGPT Plus | $20 | GPT-5.4 | ~$0.67 of API usage/day |
| GLM Coding Lite | ~$10 | GLM-5.1 | ~$0.33 of API usage/day |
| GLM Coding Pro | ~$30 | GLM-5.1 + GLM-5 | ~$1.00 of API usage/day |
If you use your coding AI for more than 30 minutes a day, a subscription almost always beats pay-as-you-go pricing.
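The break-even math is just the monthly price divided by days of use, compared against your average daily API spend. A sketch, assuming the 30-day month the table's figures imply:

```python
def breakeven_daily_spend(monthly_price: float, days: int = 30) -> float:
    """API dollars per day at which a flat subscription pays for itself."""
    return monthly_price / days

def subscription_wins(monthly_price: float, daily_api_spend: float,
                      days: int = 30) -> bool:
    """True if your API usage would cost more than the subscription."""
    return daily_api_spend > breakeven_daily_spend(monthly_price, days)

print(round(breakeven_daily_spend(100), 2))  # 3.33 (Claude Max 5x)
print(subscription_wins(20, 1.50))           # True: $1.50/day beats a $20 plan
```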
The BYOAI Strategy
If you're building apps on a platform that supports BYOAI (Bring Your Own AI), you can route different tasks to different models based on cost and complexity:
- Boilerplate and simple edits: Qwen 3.5 Flash ($0.065/$0.26) or DeepSeek V3.2 ($0.28/$0.42)
- Complex logic and architecture: Claude Opus 4.6 ($5/$25) or GPT-5.4 ($2.50/$15)
- High-volume agentic tasks: GLM-5 ($1/$3.20) — best frontier-quality-per-dollar
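The tiered routing above can be sketched as a simple dispatcher. The model names and rates come from this guide's tables; the routing signals and thresholds are purely illustrative, not any platform's real API:

```python
# Illustrative tiers: (model, input $/MTok, output $/MTok) from the tables above.
TIERS = {
    "simple":  ("Qwen 3.5 Flash",  0.065, 0.26),   # boilerplate, small edits
    "agentic": ("GLM-5",           1.00,  3.20),   # high-volume agent loops
    "complex": ("Claude Opus 4.6", 5.00,  25.00),  # architecture, hard logic
}

def route(estimated_files: int, needs_reasoning: bool) -> str:
    """Pick a tier from rough task signals (thresholds are arbitrary)."""
    if needs_reasoning or estimated_files > 5:
        return "complex"
    if estimated_files > 1:
        return "agentic"
    return "simple"

model, in_rate, out_rate = TIERS[route(estimated_files=1, needs_reasoning=False)]
print(model)  # Qwen 3.5 Flash
```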
Platforms like Serenities AI support BYOAI with no AI markup — you connect your own API key from any provider and pay only the provider's rates. Combined with batteries-included features (database, auth, storage, automation at $9–$24/month), you can run a full development stack without accumulating separate service subscriptions.
Hidden Costs to Watch
- Long-context surcharges: GPT-5.4 doubles input pricing beyond 272K tokens and adds 1.5× on output. Gemini 3.1 Pro doubles input beyond 200K tokens. Claude removed its long-context premium on March 13, 2026 — the full 1M window is now at standard rates. Budget accordingly for large codebase ingestion.
- Thinking tokens: Gemini 3.1 Pro's chain-of-thought reasoning generates internal tokens billed at output rates. A simple prompt can consume 3–5× more tokens than expected.
- Rate limits: Cheap models may throttle under heavy load. DeepSeek V3.2 gives 5M free tokens on signup with no hard rate limit, but may slow responses during high-traffic periods.
- Speed costs money: Claude Opus 4.6 Fast Mode costs 6× standard ($30/$150 per MTok). GPT-5.4 Pro costs 12× standard ($30/$180). Factor speed requirements into your budget.
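To make the long-context surcharge concrete, here is a sketch using the GPT-5.4 figures above. It assumes the 2× input / 1.5× output multipliers apply to the whole request once input exceeds the threshold; whether a provider surcharges the whole request or only tokens past the threshold varies, so check the billing docs:

```python
def gpt54_request_cost(input_tokens: int, output_tokens: int,
                       threshold: int = 272_000) -> float:
    """Sketch of GPT-5.4 pricing with the long-context surcharge applied.

    Assumption: past the input threshold, the 2x input and 1.5x output
    multipliers apply to the entire request.
    """
    in_rate, out_rate = 2.50, 15.00  # $/MTok at standard context
    if input_tokens > threshold:
        in_rate *= 2.0
        out_rate *= 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(gpt54_request_cost(50_000, 20_000), 3))   # 0.425 (standard rates)
print(round(gpt54_request_cost(300_000, 20_000), 3))  # 1.95  (surcharged)
```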
Bottom Line
In March 2026, the pricing landscape for AI coding models spans a roughly 460× range, from Qwen 3.5 Flash at $0.065 per million input tokens to Claude Opus 4.6 Fast Mode at $30. The right choice depends on your task complexity, volume, and speed requirements.
For most developers, the sweet spot is a tiered approach: use a cheap model for routine tasks and escalate to a frontier model for hard problems. The GLM-5 family at $1/$3.20 occupies a unique position, frontier-level SWE-bench scores at mid-tier pricing, making it the strongest value proposition for developers who need quality without the Claude/GPT price tag.