OpenAI just dropped GPT-5.4 — and it's not just another incremental update. Released on March 5, 2026, this model combines frontier coding capabilities, native computer-use, and a 1M token context window into a single package designed for professional work. If you build apps, automate workflows, or run AI-powered businesses, here's exactly what changed and why it matters.
## TL;DR — What's New in GPT-5.4
| Feature | GPT-5.4 | GPT-5.2 (Previous) |
|---|---|---|
| Computer Use | Native — operate desktops, browsers, apps | Not available |
| Context Window | Up to 1M tokens | 128K–256K |
| Tool Search | 47% fewer tokens for tool-heavy workflows | All tools loaded upfront |
| Knowledge Work (GDPval) | 83.0% (matches/exceeds professionals) | 70.9% |
| OSWorld (Desktop Use) | 75.0% — surpasses human performance (72.4%) | 47.3% |
| Coding (SWE-Bench Pro) | 57.7% | 55.6% |
| Hallucination Reduction | 33% fewer false claims per response | Baseline |
| API Pricing (Input) | $2.50/M tokens | $1.75/M tokens |
| API Pricing (Output) | $15/M tokens | $14/M tokens |
## 1. Native Computer Use — The Headline Feature
GPT-5.4 is OpenAI's first general-purpose model with native computer-use capabilities. This isn't a bolted-on feature — it's built into the model itself.
What does that mean practically? GPT-5.4 can:
- Navigate desktop environments through screenshots and keyboard/mouse actions
- Write Playwright code to automate browser workflows
- Issue mouse and keyboard commands in response to what it sees on screen
- Complete multi-step workflows across different applications
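The capabilities above all reduce to the same observe-act loop: capture a screenshot, ask the model for the next action, execute it, repeat until done. A minimal sketch with the model call stubbed out (the names and action format here are illustrative, not OpenAI's actual computer tool API):

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    payload: tuple  # coordinates for a click, text for typing

def stub_model(screenshot: str) -> Action:
    """Stand-in for the model call. A real agent would send the
    screenshot to the API and parse the returned action."""
    if "login" in screenshot:
        return Action("type", ("user@example.com",))
    if "submit" in screenshot:
        return Action("click", ((640, 480),))
    return Action("done", ())

def run_agent(screens: list[str], max_steps: int = 10) -> list[Action]:
    """Observe-act loop: screenshot -> model -> action, until done."""
    trace = []
    for screenshot in screens[:max_steps]:
        action = stub_model(screenshot)
        trace.append(action)
        if action.kind == "done":
            break
    return trace

trace = run_agent(["login page", "submit button", "confirmation"])
print([a.kind for a in trace])  # ['type', 'click', 'done']
```

A production loop would feed real screenshots to the API's computer tool and dispatch the returned actions to a driver such as Playwright or an OS-level input library.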
The benchmark results tell the story. On OSWorld-Verified, which measures a model's ability to navigate desktop environments, GPT-5.4 hits 75.0% — exceeding human performance at 72.4% and obliterating GPT-5.2's 47.3%. That's a 59% relative improvement in one generation.
On WebArena-Verified (browser use), it achieves 67.3% using both DOM and screenshot-driven interaction. On Online-Mind2Web, it reaches 92.8% using screenshots alone.
Mainstay, an enterprise customer, reported a 95% first-attempt success rate across ~30K property tax and HOA portals, reaching 100% within three attempts — while completing sessions 3x faster and using 70% fewer tokens than prior CUA models.
## 2. Tool Search — Finally, Efficient Tool Ecosystems
If you've built MCP servers or worked with large tool ecosystems, you know the pain: every tool definition gets dumped into the prompt upfront, bloating token counts and slowing responses.
GPT-5.4 introduces tool search. Instead of loading all tool definitions into context, the model receives a lightweight list and looks up specific tool definitions only when needed.
The results are dramatic. Testing with 250 tasks from Scale's MCP Atlas benchmark with all 36 MCP servers enabled, tool search reduced total token usage by 47% while achieving the same accuracy. For MCP servers with tens of thousands of tokens in tool definitions, this changes the economics entirely.
This matters for anyone building agentic workflows. Fewer tokens mean lower costs and faster responses, and agents that can work with large tool ecosystems without burning through your budget.
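The arithmetic behind that kind of saving is easy to reproduce in spirit. A toy comparison (the tool counts, definition sizes, and stub size below are made-up assumptions; real savings depend on your servers):

```python
def upfront_tokens(n_tools: int, avg_def_tokens: int) -> int:
    """Classic approach: every tool definition loaded into the prompt."""
    return n_tools * avg_def_tokens

def tool_search_tokens(n_tools: int, avg_def_tokens: int,
                       tools_used: int, stub_tokens: int = 20) -> int:
    """Tool search: a lightweight stub per tool, with full definitions
    fetched only for the tools the model actually looks up."""
    return n_tools * stub_tokens + tools_used * avg_def_tokens

# 200 tools at ~500 tokens each, but a task only touches 5 of them
full = upfront_tokens(200, 500)         # 100,000 tokens
lazy = tool_search_tokens(200, 500, 5)  # 4,000 stubs + 2,500 defs = 6,500
print(f"upfront: {full:,}  tool search: {lazy:,}")
```

The larger the ratio of available tools to tools actually used per task, the bigger the win, which is exactly the regime big MCP deployments live in.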
## 3. 1M Token Context Window
GPT-5.4 supports up to 1M tokens of context — the same as Gemini's largest context windows and 4x Claude's current 256K limit. In Codex, this is available experimentally by configuring model_context_window and model_auto_compact_token_limit.
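A minimal sketch of that experimental Codex configuration. The key names come from the release notes above; the file location and the specific values are assumptions:

```toml
# ~/.codex/config.toml (location assumed) — opt in to the larger window
model_context_window = 1000000           # raise the context ceiling
model_auto_compact_token_limit = 900000  # compact history near the limit
```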
There's a catch: requests exceeding the standard 272K context window count against usage limits at 2x the normal rate. So you'll want to be strategic about when you push past 272K.
Long-context performance looks solid but not perfect. On OpenAI's MRCR v2 8-needle test:
- 4K–128K: 86–97% accuracy (strong)
- 128K–256K: 79.3% (good)
- 256K–512K: 57.5% (moderate drop-off)
- 512K–1M: 36.6% (significant degradation)
Translation: the 1M context window is real, but performance degrades significantly past 256K. Use it for large codebases and document collections, but don't expect the same precision at 800K tokens as at 100K.
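For budgeting, the 2x usage accounting mentioned earlier can be modeled directly. One reading of the policy (an assumption on our part; confirm against OpenAI's published limits) is that a request exceeding the 272K standard window counts entirely at double rate:

```python
STANDARD_WINDOW = 272_000  # tokens included at the normal rate

def usage_units(request_tokens: int) -> int:
    """Usage-limit units consumed by one request. Assumes any request
    over the standard window is counted at 2x in its entirety."""
    rate = 2 if request_tokens > STANDARD_WINDOW else 1
    return request_tokens * rate

print(usage_units(100_000))  # 100,000 — within the standard window
print(usage_units(500_000))  # 1,000,000 — counted at double rate
```

Under this reading, a request just over the threshold is markedly worse value than one just under it, another reason to stay below 272K unless the task genuinely needs more.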
## 4. Professional Knowledge Work — 83% Match With Experts
On GDPval, which tests agents across 44 occupations from the top 9 U.S. GDP-contributing industries, GPT-5.4 matches or exceeds professionals in 83.0% of comparisons — up from 70.9% for GPT-5.2. Tasks include creating sales presentations, accounting spreadsheets, urgent care schedules, and manufacturing diagrams.
Specific improvements:
- Spreadsheet modeling: 87.3% mean score on investment banking analyst tasks (vs 68.4% for GPT-5.2)
- Presentations: Human raters preferred GPT-5.4 outputs 68% of the time over GPT-5.2
- Factual accuracy: 33% fewer false individual claims, 18% fewer responses containing any errors
Harvey, the legal AI company, reported GPT-5.4 scored 91% on their BigLaw Bench eval for document-heavy legal work.
## 5. Coding: Matches GPT-5.3-Codex, Adds Speed
GPT-5.4 incorporates the coding capabilities from GPT-5.3-Codex while adding the knowledge work and computer-use improvements. On SWE-Bench Pro, it scores 57.7% vs GPT-5.3-Codex's 56.8% — a marginal improvement, but importantly it doesn't sacrifice coding ability for the new features.
The bigger coding story is speed. Codex's /fast mode delivers up to 1.5x faster token velocity with GPT-5.4. Developers can access the same speeds via the API using priority processing.
The model also introduces Playwright (Interactive) — an experimental Codex skill that lets the model visually debug web and Electron apps, even testing apps as it builds them.
Cursor's VP of Developer Education called GPT-5.4 "the leader on our internal benchmarks," noting it's "more natural and assertive than previous models" and "proactive about parallelizing work."
## 6. Steerability — Interrupt and Redirect Mid-Response
GPT-5.4 Thinking in ChatGPT now outlines its work with a preamble for complex queries — similar to how Codex outlines its approach. You can add instructions or adjust direction mid-response without starting over.
This is surprisingly useful for long tasks. Instead of waiting for a 2,000-word response and then saying "actually, I wanted it differently," you can course-correct while the model is still working.
## API Pricing Breakdown
| Model | Input Price | Cached Input | Output Price |
|---|---|---|---|
| gpt-5.2 | $1.75/M tokens | $0.175/M tokens | $14/M tokens |
| gpt-5.4 | $2.50/M tokens | $0.25/M tokens | $15/M tokens |
| gpt-5.2-pro | $21/M tokens | — | $168/M tokens |
| gpt-5.4-pro | $30/M tokens | — | $180/M tokens |
GPT-5.4 costs ~43% more per input token than GPT-5.2 ($2.50 vs $1.75), but OpenAI claims greater token efficiency reduces total tokens required for many tasks. Batch and Flex pricing are available at half the standard rate. Priority processing costs 2x standard.
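You can sanity-check the token-efficiency claim with a quick cost model. The per-million prices come from the table above; the token counts and the assumed 47% input saving (borrowed from the tool-search benchmark) are illustrative:

```python
PRICES = {  # $ per 1M tokens, from the pricing table above
    "gpt-5.2": {"input": 1.75, "output": 14.0},
    "gpt-5.4": {"input": 2.50, "output": 15.0},
}

def task_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of one task at standard (non-cached) rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Same tool-heavy task; assume GPT-5.4 needs 47% fewer input tokens
old = task_cost("gpt-5.2", 120_000, 4_000)
new = task_cost("gpt-5.4", 120_000 * 0.53, 4_000)
print(f"GPT-5.2: ${old:.3f}  GPT-5.4: ${new:.3f}")  # GPT-5.2: $0.266  GPT-5.4: $0.219
```

Under these assumptions the pricier model comes out cheaper per task; if your workload sees no token saving, GPT-5.2-era pricing still wins, so measure on your own traffic.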
How does this compare to the competition?
| Model | Input | Output |
|---|---|---|
| GPT-5.4 | $2.50/M | $15/M |
| Claude Opus 4.6 | $5/M | $25/M |
| Claude Sonnet 4.6 | $3/M | $15/M |
GPT-5.4 undercuts Claude Opus 4.6 significantly — half the input price and 60% of the output price. It's priced between Sonnet and Opus, which positions it as a strong value play for frontier capabilities.
But here's the real cost story: API pricing is what developers pay. If you're a builder using ChatGPT through a subscription, you're paying a flat monthly rate — check chatgpt.com/pricing for current plan rates. With platforms like Serenities AI, you can connect your existing ChatGPT subscription and leverage these models at your subscription cost rather than per-token API pricing — potentially saving 10-25x on AI costs at scale.
## ChatGPT Availability
| Plan | GPT-5.4 Thinking | GPT-5.4 Pro |
|---|---|---|
| Free | Limited access | No |
| Go | Limited access | No |
| Plus | Expanded access | No |
| Pro | Unlimited* | Yes |
| Business | Flexible (credits) | Flexible (credits) |
| Enterprise | Flexible (credits) | Flexible (credits) |
GPT-5.2 Thinking remains available under "Legacy Models" for three months. It will be retired on June 5, 2026.
## Full Benchmark Scorecard
### Professional & Knowledge Work
| Benchmark | GPT-5.4 | GPT-5.4 Pro | GPT-5.2 |
|---|---|---|---|
| GDPval (professional tasks) | 83.0% | 82.0% | 70.9% |
| Investment Banking Tasks | 87.3% | 83.6% | 68.4% |
| OfficeQA | 68.1% | — | 63.1% |
### Coding
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
| Terminal-Bench 2.0 | 75.1% | 77.3% | 62.2% |
### Computer Use & Vision
| Benchmark | GPT-5.4 | GPT-5.2 | Human |
|---|---|---|---|
| OSWorld-Verified | 75.0% | 47.3% | 72.4% |
| MMMU Pro (no tools) | 81.2% | 79.5% | — |
| BrowseComp | 82.7% | 65.8% | — |
### Tool Use
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| Toolathlon | 54.6% | 45.7% |
| MCP Atlas | 67.2% | 60.6% |
### Reasoning & Science
| Benchmark | GPT-5.4 | GPT-5.4 Pro | GPT-5.2 |
|---|---|---|---|
| ARC-AGI-2 (Verified) | 73.3% | 83.3% | 52.9% |
| GPQA Diamond | 92.8% | 94.4% | 92.4% |
| FrontierMath Tier 1-3 | 47.6% | 50.0% | 40.7% |
| Humanity's Last Exam (tools) | 52.1% | 58.7% | 45.5% |
## What This Means for Builders
GPT-5.4 represents a shift in what's possible with AI models:
- Computer use changes the agent game. Models that can directly operate software unlock automation scenarios that were previously impossible without complex custom integrations. Building an agent that navigates your company's internal tools? Now dramatically easier.
- Tool search makes MCP practical at scale. If you've been hesitant to connect dozens of MCP servers because of token bloat, tool search removes that barrier.
- The "professional work" angle is real. An 83% match rate with professionals across 44 occupations isn't a toy demo. Spreadsheet modeling at 87% on investment banking tasks means these models are doing real work, not just generating plausible-looking output.
- Cost efficiency matters more than raw price. Yes, GPT-5.4 costs more per token. But if it uses fewer tokens to complete the same task (which OpenAI claims), the total cost per task may actually decrease.
## What's Still Missing
No model launch is perfect. A few things to watch:
- Long context quality: Performance drops significantly past 256K tokens. The 1M window exists but isn't equally useful throughout.
- Computer use safety: OpenAI classifies GPT-5.4 as "High" cyber capability under its Preparedness Framework. Expect guardrails, and occasional false positives in blocking.
- GPT-5.2 retirement: If your workflows depend on GPT-5.2 Thinking, you have until June 5, 2026 to migrate.
## FAQ
### When was GPT-5.4 released?
GPT-5.4 was released on March 5, 2026, with a gradual rollout across ChatGPT, Codex, and the API.
### How much does GPT-5.4 cost?
API pricing: $2.50/M input tokens, $0.25/M cached input, $15/M output tokens. GPT-5.4 Pro: $30/M input, $180/M output. Batch and Flex pricing at half rate. In ChatGPT, access depends on your subscription plan — visit chatgpt.com/pricing for current rates.
### Can GPT-5.4 use my computer?
Yes — GPT-5.4 has native computer-use capabilities. It can navigate desktop environments, interact with browser UIs, and operate software through screenshots and keyboard/mouse actions. This is available via the API's computer tool.
### How does GPT-5.4 compare to Claude Opus 4.6?
GPT-5.4 is significantly cheaper ($2.50 vs $5/M input, $15 vs $25/M output). GPT-5.4 leads on computer use and tool efficiency. Claude Opus 4.6 has a reputation for stronger creative writing and longer sustained context quality. For coding and agentic work, both are frontier-class — your choice may come down to specific use case performance and cost.
### When will GPT-5.2 be retired?
GPT-5.2 Thinking will remain available as a legacy model for three months, retiring on June 5, 2026.
## Bottom Line
GPT-5.4 isn't just better — it's a different kind of model. Native computer use, tool search, and 1M context transform what's possible for agents and professional automation. The pricing is competitive, and the benchmarks back up the claims.
If you're building AI-powered applications, the combination of these capabilities with platforms like Serenities AI — where you can connect your ChatGPT subscription instead of paying per-token API costs — means the economics of AI automation just got dramatically better.
The age of AI agents that actually do professional work isn't coming. It just shipped.