Introduction

On April 16, 2026, Anthropic released Claude Opus 4.7 — its most capable generally available AI model to date. Arriving just two months after Opus 4.6 (which itself arrived two months after Opus 4.5), the release maintains Anthropic's consistent upgrade cadence. While the company positions it as second only to the restricted Claude Mythos Preview, Opus 4.7 delivers significant gains in coding, vision, agentic reasoning, and tool use over its predecessor.

In this comprehensive guide, we break down everything you need to know: benchmarks, new features, breaking API changes, pricing, competitive positioning, and migration guidance.

Claude Opus 4.7 at a Glance

  • Model ID: claude-opus-4-7 (API), anthropic.claude-opus-4-7 (Bedrock Mantle), anthropic.claude-opus-4-7-v1:0 (Bedrock Runtime)
  • Context Window: 1M input tokens / 128K max output tokens
  • Pricing: $5 / 1M input tokens, $25 / 1M output tokens (same as Opus 4.6)
  • Batch API: 50% discount on input and output tokens
  • Availability: Claude.ai, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, GitHub Copilot
  • Consumer Access: Claude Pro ($20/month) and Max plans via claude.ai and Claude Code
  • Release Date: April 16, 2026

Benchmark Performance: The Numbers

Claude Opus 4.7 posts strong improvements across virtually every major benchmark. Here's how it stacks up.

Coding Benchmarks

Coding is where Opus 4.7 shines brightest. The gains over Opus 4.6 are substantial:

  • SWE-bench Verified: 87.6% (up from 80.8% on Opus 4.6) — a 6.8 percentage point jump
  • SWE-bench Pro: 64.3% (up from 53.4% on Opus 4.6), ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%
  • CursorBench: 70% (up from 58% on Opus 4.6), measuring autonomous coding performance in the Cursor AI code editor
  • On Hex's 93-task coding benchmark, Opus 4.7 improved its task-resolution rate by 13% over Opus 4.6, including solving four tasks that neither Opus 4.6 nor Sonnet 4.6 could crack

Reasoning and Knowledge

  • GPQA Diamond (graduate-level reasoning): 94.2% — effectively tied with GPT-5.4 Pro (94.4%) and Gemini 3.1 Pro (94.3%). The frontier models have saturated this benchmark, with differences within noise.
  • Finance Agent v1.1: 64.4% (up from 60.7% on Opus 4.6), state-of-the-art at the time of release

Agentic and Tool Use

  • Terminal-Bench 2.0: 69.4% (up from 65.4% on Opus 4.6, though trailing GPT-5.4's 75.1%)
  • MCP-Atlas (scaled tool use): 77.3%
  • OSWorld-Verified (computer use): 78.0%
  • Notion AI observed a 14% improvement over Opus 4.6 on complex multi-step workflows, while using fewer tokens and making one-third as many tool errors — and called Opus 4.7 "the first model to pass our implicit-need tests"

Vision

  • CharXiv (visual reasoning): 82.1% without tools, 91.0% with tools

Where Opus 4.7 Trails

No model wins everywhere. Here's where competitors still lead:

  • BrowseComp (agentic search): Dropped from 83.7% to 79.3%, trailing Gemini 3.1 Pro (85.9%) and GPT-5.4 Pro (89.3%)
  • Terminal-Bench 2.0: 69.4% versus GPT-5.4's 75.1%
  • Claude Mythos Preview still leads by a wide margin at 77.8% on SWE-bench Pro versus Opus 4.7's 64.3%

Key New Features

1. High-Resolution Image Support (3x Vision Upgrade)

Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution jumped from approximately 1.15 megapixels (1568px) to approximately 3.75 megapixels (2576px) — more than 3x the visual capacity of previous Claude models (3.3x the pixel area).

Beyond raw resolution, the model also improves on:

  • Low-level perception: pointing, measuring, and counting
  • Image localization: natural-image bounding-box localization and detection

This makes Opus 4.7 significantly more capable for document analysis, diagram interpretation, UI screenshot understanding, and any workflow that depends on visual detail.

2. Adaptive Thinking (Replaces Extended Thinking)

Extended thinking budgets have been completely removed in Opus 4.7. Adaptive thinking is now the only thinking-on mode.

In Anthropic's internal evaluations, adaptive thinking consistently outperformed the fixed-budget approach because the model allocates reasoning tokens dynamically based on task difficulty — spending more tokens on hard problems and fewer on easy ones.

Key details:

  • Adaptive thinking is off by default — set thinking: {"type": "adaptive"} to enable it
  • Thinking content is omitted from the response by default. Thinking blocks still appear in the stream, but their content will be empty unless the caller explicitly opts in
  • If your product streams reasoning to users, set "display": "summarized" to restore visible progress during thinking
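Putting those settings together, a request that enables adaptive thinking and opts back in to visible reasoning might be assembled like the following sketch. The field names ("adaptive", "display": "summarized") are taken from the description above, not from a confirmed SDK surface; verify them against the current API reference.

```python
# Sketch of a Messages API request body enabling adaptive thinking.
# Field names follow this article's description and should be verified
# against the current API reference before use.
def build_adaptive_request(prompt: str, stream_reasoning: bool = False) -> dict:
    thinking = {"type": "adaptive"}
    if stream_reasoning:
        # Opt back in to visible, summarized reasoning in the stream.
        thinking["display"] = "summarized"
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 8192,
        "thinking": thinking,
        "messages": [{"role": "user", "content": prompt}],
    }

adaptive_req = build_adaptive_request("Summarize this diff.", stream_reasoning=True)
```

With stream_reasoning left at its default of False, the request matches the new default behavior: thinking blocks arrive with empty content.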

3. New "xhigh" Effort Level

Alongside Opus 4.7, Anthropic rolled out a new "xhigh" effort level sitting between high and max. This gives developers finer control over the trade-off between reasoning depth and latency.

The xhigh level is specifically recommended for coding and agentic use cases where you need more reasoning than high provides but don't want the full cost and latency of max.

A notable finding highlighted on Anthropic's blog: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6, suggesting meaningful efficiency gains at every effort tier.
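For illustration, the effort tiers and the output_config.effort shape mentioned in the migration notes below can be sketched as a small helper. This is hypothetical glue code: the tier ordering and parameter path follow this article, not a documented SDK.

```python
# Effort tiers in increasing order of reasoning depth; the new "xhigh"
# tier sits between "high" and "max" as described above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

def effort_config(effort: str) -> dict:
    # Build the output_config fragment named in the migration notes.
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {"output_config": {"effort": effort}}
```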

4. Task Budgets (Public Beta)

Task budgets entered public beta on the Claude Platform. They let developers cap token spend on autonomous agents to prevent runaway bills on long-running jobs.

A task budget provides a rough estimate of how many tokens to target for a full agentic loop, including thinking, tool calls, tool results, and final output. This is particularly valuable for production deployments where cost predictability matters.
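As a sketch, capping an agent run might look like the helper below. The task_budget parameter name appears later in this article, but its exact shape (a flat integer here) is an assumption to confirm against the platform documentation.

```python
# Hypothetical helper attaching a task budget to an agent request.
# The flat-integer shape of "task_budget" is an assumption.
def with_task_budget(request: dict, max_total_tokens: int) -> dict:
    capped = dict(request)  # shallow copy; leave the caller's dict untouched
    capped["task_budget"] = max_total_tokens
    return capped

agent_request = with_task_budget(
    {"model": "claude-opus-4-7", "messages": []}, max_total_tokens=200_000
)
```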

5. New Tokenizer

Opus 4.7 ships with a new tokenizer that contributes to its improved performance across a wide range of tasks. However, there's a cost implication: the new tokenizer may produce between 1x and 1.35x as many tokens as previous models for the same text, with the exact inflation varying by content type.

Per-token prices remain flat, but the same prompt may cost more due to higher token counts. Anthropic recommends testing your workloads before switching production traffic and updating max_tokens parameters to give additional headroom.
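The headroom adjustment is simple arithmetic: scale your existing limits by the worst-case 1.35x inflation cited above. A minimal sketch:

```python
import math

# Worst-case token inflation for the new tokenizer, as cited above.
TOKENIZER_RATIO = 1.35

def adjusted_max_tokens(old_max_tokens: int, ratio: float = TOKENIZER_RATIO) -> int:
    # Scale an Opus 4.6-era max_tokens value to keep equivalent headroom.
    return math.ceil(old_max_tokens * ratio)

adjusted_max_tokens(4096)  # a 4096-token limit becomes 5530
```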

Breaking API Changes: What Developers Must Know

Opus 4.7 introduces several breaking changes that will require code updates for existing integrations. This is not a drop-in replacement.

Temperature and Sampling Parameters Removed

Starting with Claude Opus 4.7, setting temperature, top_p, or top_k to any non-default value will return a 400 error. The safest migration path is to omit these parameters entirely from your API requests and use prompting to guide the model's behavior instead.
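A minimal migration sketch, assuming your request bodies are plain dicts: drop the removed sampling parameters before sending, leaving everything else intact.

```python
# Sampling parameters that now return a 400 error on claude-opus-4-7.
REMOVED_SAMPLING_PARAMS = {"temperature", "top_p", "top_k"}

def strip_sampling_params(request: dict) -> dict:
    # Return a copy of the request with the removed parameters dropped.
    return {k: v for k, v in request.items() if k not in REMOVED_SAMPLING_PARAMS}

legacy = {"model": "claude-opus-4-7", "temperature": 0.2, "top_k": 40, "messages": []}
migrated = strip_sampling_params(legacy)
```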

Extended Thinking Budgets Removed

Setting thinking: {"type": "enabled", "budget_tokens": N} will return a 400 error. Migrate to:

  • thinking: {"type": "adaptive"}
  • output_config: {"effort": "high"}
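The migration path above can be sketched as a small transform, assuming dict-shaped request bodies; parameter names follow this article's description.

```python
# Convert a fixed-budget thinking config to adaptive thinking plus an
# effort setting, following the migration path described above.
def migrate_thinking_config(request: dict, effort: str = "high") -> dict:
    migrated = dict(request)
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") == "enabled":
        # {"type": "enabled", "budget_tokens": N} now returns a 400 error.
        migrated["thinking"] = {"type": "adaptive"}
        migrated["output_config"] = {
            **migrated.get("output_config", {}),
            "effort": effort,
        }
    return migrated

old_req = {"model": "claude-opus-4-7",
           "thinking": {"type": "enabled", "budget_tokens": 16000}}
new_req = migrate_thinking_config(old_req)
```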

Other Migration Steps

  • Remove beta header: The interleaved-thinking-2025-05-14 beta header is no longer needed — adaptive thinking enables interleaved thinking automatically
  • Migrate output_format: Move from output_format to output_config.format
  • Update max_tokens and compaction triggers: give both additional headroom to account for the new tokenizer's higher token counts

Prefilling Assistant Messages Removed

Prefilling assistant messages (ending the messages array with role: "assistant") now returns a 400 error on Opus 4.7. This change originally landed with Opus 4.6 and has caused real-world breakage across frameworks including CrewAI, LiveKit, and LangChain.

Recommended alternatives:

  • Structured outputs: Use output_config.format with json_schema — more reliable than prefilling on all models
  • System prompt instructions: Move persona and format guidance into the system prompt
  • Native tool use: Use tool calling instead of prefill-based function call patterns
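As an example of the first alternative, a prefill-free request using structured outputs might look like the sketch below. The output_config.format / json_schema field names follow this article's migration notes; verify them against the current API reference.

```python
# Sketch of replacing an assistant-message prefill with structured
# outputs. Field names follow this article's migration notes.
def structured_request(prompt: str, schema: dict) -> dict:
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "output_config": {"format": {"type": "json_schema", "schema": schema}},
        # The messages array ends on a user turn -- no assistant prefill.
        "messages": [{"role": "user", "content": prompt}],
    }

sentiment_schema = {"type": "object",
                    "properties": {"sentiment": {"type": "string"}}}
structured_req = structured_request("Classify: 'great product'", sentiment_schema)
```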

If you use Claude Managed Agents, there are no breaking API changes.

Migration Effort Recommendations

  • For agentic coding, frontend design, tool-heavy workflows, and complex enterprise workflows: start with medium effort
  • If latency is too high, reduce to low
  • If you need higher intelligence, increase to high or xhigh

Cybersecurity Safeguards

Anthropic said it experimented with efforts to "differentially reduce" Claude Opus 4.7's cyber capabilities during training. This is a deliberate choice — scaling back the model's ability to assist with potentially harmful cybersecurity tasks while maintaining its broad utility elsewhere.

The company stated: "We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses."

This approach represents a notable shift in how frontier AI labs handle dual-use capabilities — rather than simply adding post-training filters, Anthropic adjusted the training process itself to reduce risky capabilities at the model level.

Claude Mythos Preview: The Elephant in the Room

Anthropic repeatedly positions Opus 4.7 as "less broadly capable than our most powerful model, Claude Mythos Preview."

Mythos Preview was unveiled earlier in April 2026 under Project Glasswing — named after the glasswing butterfly with its transparent wings, symbolizing the initiative's commitment to transparency and vulnerability disclosure. Launch partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, with access extended to roughly 40 additional organizations responsible for building or maintaining critical software infrastructure — over 50 organizations in total. Anthropic is committing up to $100 million in usage credits and $4 million in direct donations to open-source security organizations. The company has stated that it does not plan to make Mythos Preview generally available.

The performance gap is significant. On SWE-bench Pro alone, Mythos Preview scores 77.8% versus Opus 4.7's 64.3% — a 13.5 percentage point lead. This makes Mythos Preview by far the most capable coding AI model in existence, but one that most developers and companies cannot access.

This unusual strategy — publicly acknowledging a more powerful model while restricting it from general release — reflects Anthropic's broader approach to responsible scaling and its emphasis on safety testing before wide deployment.

Competitive Positioning: Opus 4.7 vs. the Field

In Anthropic's own benchmarking charts, Opus 4.7 beats Opus 4.6, GPT-5.4, and Gemini 3.1 Pro in most key categories.

Where Opus 4.7 Leads (Among GA Models)

  • SWE-bench Verified: 87.6%
  • SWE-bench Pro: 64.3%
  • MCP-Atlas (scaled tool use): 77.3%
  • OSWorld-Verified (computer use): 78.0%
  • Finance Agent v1.1: 64.4%
  • CharXiv (visual reasoning): 82.1% (without tools), 91.0% (with tools)

Where Competitors Lead

  • BrowseComp (agentic web search): GPT-5.4 Pro at 89.3%, Gemini 3.1 Pro at 85.9% — both ahead of Opus 4.7's 79.3%
  • Terminal-Bench 2.0: GPT-5.4 at 75.1% versus Opus 4.7's 69.4%
  • GPQA Diamond: Effectively a three-way tie (94.2% vs 94.4% vs 94.3%) — the frontier has converged here

Pricing and Cost Considerations

Pricing stays flat at $5 per million input tokens and $25 per million output tokens — identical to Opus 4.6. There is no long-context premium for the 1M context window.

Cost-Saving Options

  • Prompt Caching: Up to 90% savings on repeated context. Cache reads are priced at $0.50 per million tokens — 10x cheaper than fresh input tokens. Workloads with long, stable system prompts or reused document context can absorb the tokenizer change and still come out ahead.
  • Batch API: 50% discount on both input and output tokens for asynchronous processing. Opus 4.7 batch pricing: $2.50 per million input, $12.50 per million output. Ideal for nightly summarization, evaluation sweeps, backfills, and anything where a minutes-to-hours SLA is acceptable.
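A back-of-envelope cost model using the prices quoted above (USD per million tokens) is easy to sketch. Whether the batch discount stacks with cache-read pricing is an assumption here; confirm against the pricing documentation.

```python
# Opus 4.7 list prices quoted above, in USD per million tokens.
INPUT_PRICE, OUTPUT_PRICE, CACHE_READ_PRICE = 5.00, 25.00, 0.50

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, batch: bool = False) -> float:
    # Cached input tokens bill at the cache-read rate; the rest at full price.
    fresh = input_tokens - cached_tokens
    cost = (fresh * INPUT_PRICE
            + cached_tokens * CACHE_READ_PRICE
            + output_tokens * OUTPUT_PRICE) / 1_000_000
    # Assumption: the 50% batch discount applies to the whole request.
    return cost * 0.5 if batch else cost
```

For example, a 100K-input / 10K-output request works out to $0.75 live, $0.375 via the Batch API, or $0.345 with 90K of the input served from cache.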

However, the new tokenizer is the main cost wildcard. Since the same text may tokenize into up to 35% more tokens, your effective cost per prompt could increase even though per-token pricing hasn't changed. Anthropic recommends benchmarking your specific workloads to understand the impact before migrating production traffic.

Prompting interventions, task_budget, and the effort parameter can all help control costs — though Anthropic notes these controls may trade off model intelligence.

Who Should Upgrade?

  • Developers building coding agents: The jump from 53.4% to 64.3% on SWE-bench Pro is the headline improvement. If your product relies on autonomous code generation or bug fixing, Opus 4.7 is a significant step forward.
  • Vision-heavy applications: The 3x resolution increase from 1.15MP to 3.75MP is a major upgrade for document analysis, UI understanding, diagram parsing, and any workflow dependent on visual detail.
  • Long-running agentic workflows: Fewer tool errors, better self-verification, and task budgets for cost control make Opus 4.7 meaningfully more reliable for production agents.
  • Anyone currently on Opus 4.6: Same price, better performance across nearly every metric. The only regression is in agentic web search (BrowseComp).
  • Teams that need fine-grained effort control: The new xhigh effort level plus the efficiency finding (low-effort 4.7 ≈ medium-effort 4.6) means you can get the same quality at lower cost, or better quality at the same cost.

Release Cadence and What's Next

Opus 4.7 arrives two months after Opus 4.6, which arrived two months after Opus 4.5 — maintaining Anthropic's remarkably consistent two-month upgrade cycle. If this cadence holds, we can expect the next major Claude model update around June 2026.

The bigger question remains: will any version of Mythos Preview's capabilities eventually make it into a generally available model? Anthropic hasn't committed to a timeline, but the gap between Mythos and the GA lineup suggests there's substantial room for improvement in future releases.

The Bottom Line

Claude Opus 4.7 is a strong, well-rounded upgrade that delivers its biggest gains in coding and vision. The 87.6% on SWE-bench Verified and 64.3% on SWE-bench Pro make it the most capable generally available coding model. The 3x vision upgrade and adaptive thinking represent meaningful architectural improvements, not just benchmark gains.

The breaking API changes — particularly the removal of temperature controls and extended thinking budgets — mean this isn't a zero-effort migration. Developers will need to update their integrations. But at the same price point with better performance across the board, the case for upgrading is straightforward for most use cases.

The shadow of Mythos Preview looms large. Anthropic's willingness to publicly acknowledge a far more capable model while restricting it is an unusual competitive strategy — one that simultaneously demonstrates their technical lead and their commitment to responsible deployment. Whether that resonates as responsible leadership or frustrating gatekeeping will likely depend on which side of the access list you're on.
