OpenAI just dropped GPT-5.4 — and it's not just another incremental update. Released on March 5, 2026, this model combines frontier coding capabilities, native computer-use, and a 1M token context window into a single package designed for professional work. If you build apps, automate workflows, or run AI-powered businesses, here's exactly what changed and why it matters.

TL;DR — What's New in GPT-5.4

| Feature | GPT-5.4 | GPT-5.2 (Previous) |
|---|---|---|
| Computer Use | Native — operate desktops, browsers, apps | Not available |
| Context Window | Up to 1M tokens | 128K–256K |
| Tool Search | 47% fewer tokens for tool-heavy workflows | All tools loaded upfront |
| Knowledge Work (GDPval) | 83.0% (matches/exceeds professionals) | 70.9% |
| OSWorld (Desktop Use) | 75.0% — surpasses human performance (72.4%) | 47.3% |
| Coding (SWE-Bench Pro) | 57.7% | 55.6% |
| Hallucination Reduction | 33% fewer false claims per response | Baseline |
| API Pricing (Input) | $2.50/M tokens | $1.75/M tokens |
| API Pricing (Output) | $15/M tokens | $14/M tokens |

1. Native Computer Use — The Headline Feature

GPT-5.4 is OpenAI's first general-purpose model with native computer-use capabilities. This isn't a bolted-on feature — it's built into the model itself.

What does that mean practically? GPT-5.4 can:

  • Navigate desktop environments through screenshots and keyboard/mouse actions
  • Write Playwright code to automate browser workflows
  • Issue mouse and keyboard commands in response to what it sees on screen
  • Complete multi-step workflows across different applications

The benchmark results tell the story. On OSWorld-Verified, which measures a model's ability to navigate desktop environments, GPT-5.4 hits 75.0% — exceeding human performance at 72.4% and obliterating GPT-5.2's 47.3%. That's a 59% relative improvement in one generation.

On WebArena-Verified (browser use), it achieves 67.3% using both DOM and screenshot-driven interaction. On Online-Mind2Web, it reaches 92.8% using screenshots alone.

Mainstay, an enterprise customer, reported a 95% first-attempt success rate across ~30K property tax and HOA portals, reaching 100% within three attempts — while completing sessions 3x faster and using 70% fewer tokens than prior CUA models.
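As a rough illustration, the screenshot-to-action loop behind computer use can be sketched in a few lines. Everything here is hypothetical: the model call is replaced by a scripted policy, and screen states are plain strings rather than real screenshots, which a production loop would send to the API's computer tool.

```python
from dataclasses import dataclass

# Hypothetical sketch of a screenshot -> action loop. The model is
# stubbed with a scripted policy; a real agent would send each
# screenshot to the API and execute the returned action.

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    payload: str = ""

def scripted_policy(screenshot: str) -> Action:
    # Stand-in for the model: maps what it "sees" to the next action.
    if "login" in screenshot:
        return Action("type", "user@example.com")
    if "dashboard" in screenshot:
        return Action("done")
    return Action("click", "Next")

def run_agent(screens: list[str], max_steps: int = 10) -> list[Action]:
    """Drive the environment until the policy reports done."""
    trace = []
    for screenshot in screens[:max_steps]:
        action = scripted_policy(screenshot)
        trace.append(action)
        if action.kind == "done":
            break
    return trace

trace = run_agent(["welcome page", "login form", "dashboard"])
print([a.kind for a in trace])  # → ['click', 'type', 'done']
```

The interesting part in production is entirely inside the policy; the outer loop stays this simple.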

2. Tool Search — Finally, Efficient Tool Ecosystems

If you've built MCP servers or worked with large tool ecosystems, you know the pain: every tool definition gets dumped into the prompt upfront, bloating token counts and slowing responses.

GPT-5.4 introduces tool search. Instead of loading all tool definitions into context, the model receives a lightweight list and looks up specific tool definitions only when needed.

The results are dramatic. Testing with 250 tasks from Scale's MCP Atlas benchmark with all 36 MCP servers enabled, tool search reduced total token usage by 47% while achieving the same accuracy. For MCP servers with tens of thousands of tokens in tool definitions, this changes the economics entirely.

This matters for anyone building agentic workflows. Lower tokens = lower cost = faster responses = agents that can actually work with large tool ecosystems without burning through your budget.
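The idea is easy to mimic in your own tooling. The sketch below is our illustration, not OpenAI's implementation: it keeps full tool schemas out of the prompt, exposes a names-only index, and resolves one definition on demand. The tool names and the 4-characters-per-token estimate are made up for the example.

```python
# Illustrative tool-search pattern: lightweight index upfront,
# full definitions resolved only when needed.

FULL_DEFINITIONS = {  # hypothetical tool schemas
    "search_invoices": {"description": "Search invoices by customer", "params": {"query": "string"}},
    "create_ticket":   {"description": "Open a support ticket", "params": {"title": "string"}},
    "export_report":   {"description": "Export a CSV report", "params": {"range": "string"}},
}

def lightweight_index() -> list[str]:
    """What the model sees upfront: names only, a few tokens each."""
    return sorted(FULL_DEFINITIONS)

def lookup(name: str) -> dict:
    """Called only when the model decides it needs a specific tool."""
    return FULL_DEFINITIONS[name]

def approx_tokens(obj) -> int:
    return len(str(obj)) // 4  # crude 4-chars-per-token estimate

# Compare loading every schema vs. the index plus one lookup.
upfront_all = approx_tokens(FULL_DEFINITIONS)
with_search = approx_tokens(lightweight_index()) + approx_tokens(lookup("create_ticket"))
print(with_search < upfront_all)  # → True
```

With three tiny tools the savings are modest; with 36 MCP servers carrying tens of thousands of schema tokens, the same pattern is where the reported 47% reduction comes from.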

3. 1M Token Context Window

GPT-5.4 supports up to 1M tokens of context — the same as Gemini's largest context windows and 4x Claude's current 256K limit. In Codex, this is available experimentally by configuring model_context_window and model_auto_compact_token_limit.

There's a catch: requests exceeding the standard 272K context window count against usage limits at 2x the normal rate. So you'll want to be strategic about when you push past 272K.
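Assuming the straightforward reading of that rule, that a request exceeding 272K tokens is metered at double the normal rate, the accounting looks like this (the exact metering details may differ from this sketch):

```python
STANDARD_WINDOW = 272_000  # standard context window from the announcement

def billed_usage(request_tokens: int) -> int:
    """Usage-limit sketch: a request over the standard window counts
    at 2x the normal rate (our reading; actual metering may differ)."""
    multiplier = 2 if request_tokens > STANDARD_WINDOW else 1
    return request_tokens * multiplier

print(billed_usage(100_000))  # → 100000
print(billed_usage(500_000))  # → 1000000
```

Note the cliff: a 500K-token request costs as much against your limits as a 1M-token one would at the standard rate, so batching just under 272K can be meaningfully cheaper.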

Long-context performance looks solid but not perfect. On OpenAI's MRCR v2 8-needle test:

  • 4K–128K: 86–97% accuracy (strong)
  • 128K–256K: 79.3% (good)
  • 256K–512K: 57.5% (moderate drop-off)
  • 512K–1M: 36.6% (significant degradation)

Translation: the 1M context window is real, but performance degrades significantly past 256K. Use it for large codebases and document collections, but don't expect the same precision at 800K tokens as at 100K.
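Those bands can be turned into a quick pre-flight check before stuffing the window. The numbers below are copied from the list above; the first band uses the top of its 86–97% range, so treat the function as a rough guide rather than a guarantee:

```python
# MRCR v2 8-needle bands from above, as a lookup table.
BANDS = [  # (upper bound in tokens, reported accuracy)
    (128_000, 0.97),    # 4K-128K reported as 86-97%; top of range used here
    (256_000, 0.793),
    (512_000, 0.575),
    (1_000_000, 0.366),
]

def expected_accuracy(context_tokens: int) -> float:
    """Rough retrieval-accuracy estimate for a given context size."""
    for upper, accuracy in BANDS:
        if context_tokens <= upper:
            return accuracy
    raise ValueError("beyond the 1M window")

print(expected_accuracy(100_000))  # → 0.97
print(expected_accuracy(800_000))  # → 0.366
```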

4. Professional Knowledge Work — 83% Match With Experts

On GDPval, which tests agents across 44 occupations from the top 9 U.S. GDP-contributing industries, GPT-5.4 matches or exceeds professionals in 83.0% of comparisons — up from 70.9% for GPT-5.2. Tasks include creating sales presentations, accounting spreadsheets, urgent care schedules, and manufacturing diagrams.

Specific improvements:

  • Spreadsheet modeling: 87.3% mean score on investment banking analyst tasks (vs 68.4% for GPT-5.2)
  • Presentations: Human raters preferred GPT-5.4 outputs 68% of the time over GPT-5.2
  • Factual accuracy: 33% fewer false individual claims, 18% fewer responses containing any errors

Harvey, the legal AI company, reported GPT-5.4 scored 91% on their BigLaw Bench eval for document-heavy legal work.

5. Coding: Matches GPT-5.3-Codex, Adds Speed

GPT-5.4 incorporates the coding capabilities from GPT-5.3-Codex while adding the knowledge work and computer-use improvements. On SWE-Bench Pro, it scores 57.7% vs GPT-5.3-Codex's 56.8% — a marginal improvement, but importantly it doesn't sacrifice coding ability for the new features.

The bigger coding story is speed. Codex's /fast mode delivers up to 1.5x faster token velocity with GPT-5.4. Developers can access the same speeds via the API using priority processing.

The model also introduces Playwright (Interactive) — an experimental Codex skill that lets the model visually debug web and Electron apps, even testing apps as it builds them.

Cursor's VP of Developer Education called GPT-5.4 "the leader on our internal benchmarks," noting it's "more natural and assertive than previous models" and "proactive about parallelizing work."

6. Steerability — Interrupt and Redirect Mid-Response

GPT-5.4 Thinking in ChatGPT now outlines its work with a preamble for complex queries — similar to how Codex outlines its approach. You can add instructions or adjust direction mid-response without starting over.

This is surprisingly useful for long tasks. Instead of waiting for a 2,000-word response and then saying "actually, I wanted it differently," you can course-correct while the model is still working.

API Pricing Breakdown

| Model | Input Price | Cached Input | Output Price |
|---|---|---|---|
| gpt-5.2 | $1.75/M tokens | $0.175/M tokens | $14/M tokens |
| gpt-5.4 | $2.50/M tokens | $0.25/M tokens | $15/M tokens |
| gpt-5.2-pro | $21/M tokens | — | $168/M tokens |
| gpt-5.4-pro | $30/M tokens | — | $180/M tokens |

GPT-5.4 costs ~43% more per input token than GPT-5.2 ($2.50 vs $1.75), but OpenAI claims greater token efficiency reduces total tokens required for many tasks. Batch and Flex pricing are available at half the standard rate. Priority processing costs 2x standard.
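To see how token efficiency can offset the higher unit price, here's a back-of-the-envelope cost-per-task comparison using the listed prices. The 20% token reduction for GPT-5.4 is a hypothetical figure standing in for OpenAI's efficiency claim, not a measured number:

```python
# Cost-per-task comparison from the pricing table above.
PRICES = {  # USD per million tokens
    "gpt-5.2": {"input": 1.75, "output": 14.0},
    "gpt-5.4": {"input": 2.50, "output": 15.0},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

base_in, base_out = 50_000, 8_000
cost_52 = task_cost("gpt-5.2", base_in, base_out)
# Hypothetical: GPT-5.4 finishes the same task in 20% fewer tokens.
cost_54 = task_cost("gpt-5.4", int(base_in * 0.8), int(base_out * 0.8))

print(round(cost_52, 4))  # → 0.1995
print(round(cost_54, 4))  # → 0.196
```

If the efficiency claim holds at that level, the pricier model ends up marginally cheaper per task; if it doesn't, GPT-5.2-era workloads get more expensive.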

How does this compare to the competition?

| Model | Input | Output |
|---|---|---|
| GPT-5.4 | $2.50/M | $15/M |
| Claude Opus 4.6 | $5/M | $25/M |
| Claude Sonnet 4.6 | $3/M | $15/M |

GPT-5.4 undercuts Claude Opus 4.6 significantly — half the input price and 60% of the output price. It's priced between Sonnet and Opus, which positions it as a strong value play for frontier capabilities.

But here's the real cost story: API pricing is what developers pay. If you're a builder using ChatGPT through a subscription, you're paying a flat monthly rate — check chatgpt.com/pricing for current plan rates. With platforms like Serenities AI, you can connect your existing ChatGPT subscription and leverage these models at your subscription cost rather than per-token API pricing — potentially cutting AI costs by 10–25x at scale.

ChatGPT Availability

| Plan | GPT-5.4 Thinking | GPT-5.4 Pro |
|---|---|---|
| Free | Limited access | No |
| Go | Limited access | No |
| Plus | Expanded access | No |
| Pro | Unlimited* | Yes |
| Business | Flexible (credits) | Flexible (credits) |
| Enterprise | Flexible (credits) | Flexible (credits) |

GPT-5.2 Thinking remains available under "Legacy Models" for three months. It will be retired on June 5, 2026.

Full Benchmark Scorecard

Professional & Knowledge Work

| Benchmark | GPT-5.4 | GPT-5.4 Pro | GPT-5.2 |
|---|---|---|---|
| GDPval (professional tasks) | 83.0% | 82.0% | 70.9% |
| Investment Banking Tasks | 87.3% | 83.6% | 68.4% |
| OfficeQA | 68.1% | — | 63.1% |

Coding

| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
| Terminal-Bench 2.0 | 75.1% | 77.3% | 62.2% |

Computer Use & Vision

| Benchmark | GPT-5.4 | GPT-5.2 | Human |
|---|---|---|---|
| OSWorld-Verified | 75.0% | 47.3% | 72.4% |
| MMMU Pro (no tools) | 81.2% | 79.5% | — |
| BrowseComp | 82.7% | 65.8% | — |

Tool Use

| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| Toolathlon | 54.6% | 45.7% |
| MCP Atlas | 67.2% | 60.6% |

Reasoning & Science

| Benchmark | GPT-5.4 | GPT-5.4 Pro | GPT-5.2 |
|---|---|---|---|
| ARC-AGI-2 (Verified) | 73.3% | 83.3% | 52.9% |
| GPQA Diamond | 92.8% | 94.4% | 92.4% |
| FrontierMath Tier 1–3 | 47.6% | 50.0% | 40.7% |
| Humanity's Last Exam (tools) | 52.1% | 58.7% | 45.5% |

What This Means for Builders

GPT-5.4 represents a shift in what's possible with AI models:

  1. Computer use changes the agent game. Models that can directly operate software unlock automation scenarios that were previously impossible without complex custom integrations. Building an agent that navigates your company's internal tools? Now dramatically easier.
  2. Tool search makes MCP practical at scale. If you've been hesitant to connect dozens of MCP servers because of token bloat, tool search removes that barrier.
  3. The "professional work" angle is real. 83% match rate with professionals across 44 occupations isn't a toy demo. Spreadsheet modeling at 87% on investment banking tasks means these models are doing actual work, not just generating plausible-looking output.
  4. Cost efficiency matters more than raw price. Yes, GPT-5.4 costs more per token. But if it uses fewer tokens to complete the same task (which OpenAI claims), the total cost per task may actually decrease.

What's Still Missing

No model launch is perfect. A few things to watch:

  • Long context quality: Performance drops significantly past 256K tokens. The 1M window exists but isn't equally useful throughout.
  • Computer use safety: OpenAI is treating GPT-5.4 as "High cyber capability" under their Preparedness Framework. Expect guardrails — and occasional false positives in blocking.
  • GPT-5.2 retirement: If your workflows depend on GPT-5.2 Thinking, you have until June 5, 2026 to migrate.

FAQ

When was GPT-5.4 released?

GPT-5.4 was released on March 5, 2026, with a gradual rollout across ChatGPT, Codex, and the API.

How much does GPT-5.4 cost?

API pricing: $2.50/M input tokens, $0.25/M cached input, $15/M output tokens. GPT-5.4 Pro: $30/M input, $180/M output. Batch and Flex pricing at half rate. In ChatGPT, access depends on your subscription plan — visit chatgpt.com/pricing for current rates.

Can GPT-5.4 use my computer?

Yes — GPT-5.4 has native computer-use capabilities. It can navigate desktop environments, interact with browser UIs, and operate software through screenshots and keyboard/mouse actions. This is available via the API's computer tool.

How does GPT-5.4 compare to Claude Opus 4.6?

GPT-5.4 is significantly cheaper ($2.50 vs $5/M input, $15 vs $25/M output). GPT-5.4 leads on computer use and tool efficiency. Claude Opus 4.6 has a reputation for stronger creative writing and longer sustained context quality. For coding and agentic work, both are frontier-class — your choice may come down to specific use case performance and cost.

When will GPT-5.2 be retired?

GPT-5.2 Thinking will remain available as a legacy model for three months, retiring on June 5, 2026.

Bottom Line

GPT-5.4 isn't just better — it's a different kind of model. Native computer use, tool search, and 1M context transform what's possible for agents and professional automation. The pricing is competitive, and the benchmarks back up the claims.

If you're building AI-powered applications, the combination of these capabilities with platforms like Serenities AI — where you can connect your ChatGPT subscription instead of paying per-token API costs — means the economics of AI automation just got dramatically better.

The age of AI agents that actually do professional work isn't coming. It just shipped.
