
AI Coding Agents After 8 Months: From 25% to 90% AI-Written Code

A developer's real data after 8 months with AI coding agents — Claude Code went from writing 25% to 90% of all code.

Serenities Team · 8 min read

From 25% to 90%: One Developer's Journey With AI Coding Agents

David Crawshaw, founder of exe.dev and co-founder of Tailscale, just published a follow-up to his viral posts about programming with AI agents. The numbers are staggering: in February 2025, Claude Code wrote about 25% of his code. By February 2026, the latest Opus model writes 90% of it. His time split went from 50-50 (reading vs. writing code) to 95-5.

The post, titled "Eight More Months of Agents," hit 110+ points and 115 comments on Hacker News within a day. It's the third installment in a series that started with "How I Program with LLMs" (919 HN points) and continued with "How I Program with Agents" (615 HN points). Each post has tracked the rapid evolution of AI-assisted development from autocomplete to full agent-driven workflows.

If you've been wondering whether AI coding agents are actually worth integrating into your daily workflow — or if the hype is overblown — Crawshaw's experience offers one of the most honest, data-rich accounts available. Let's break down his key findings and what they mean for developers in 2026.

The Numbers: How AI Agents Transformed One Developer's Output

Metric                            February 2025      February 2026
Code written by AI                ~25%               ~90%
Time split (reading vs. writing)  50-50              95-5
Primary tool                      IDE with Copilot   Neovim + Claude Code
Developer role                    Writing code       Reading and directing agents

Crawshaw is clear: the code still "all needs to be carefully read, and regularly adjusted." But the key shift is that he now relies on the model to do the adjustments too. The developer's role has fundamentally changed from writing to reviewing and directing.

It's All About the Model, Not the Harness

One of Crawshaw's most provocative claims is that agent harnesses haven't improved much in the past year. He notes that his own agent, Sketch, could do things six months ago that popular agents still can't do today. The innovation space around agent tooling is real, but right now it's secondary.

"Right now, it is all about the model," Crawshaw writes.

He dismisses public benchmarks entirely — "they have all been gamed to death" — but points to the qualitative leap in model capabilities as the most significant change. There's been no single breakthrough moment like GPT-2 first talking back. Instead, it's been a steady, dramatic improvement in frontier models' ability to produce useful code.

This matters for developers choosing tools. If you're spending hours comparing agent frameworks and harnesses, Crawshaw's experience suggests you should focus on which tool gives you access to the best model instead. For practical tips on getting the most from Claude Code specifically, check out our Claude Code tips and tricks guide.

IDEs Are Dead. Long Live Vi.

Perhaps the most surprising takeaway: Crawshaw has abandoned IDEs entirely. When Copilot launched in 2021, the IDE looked like the inevitable home for AI-assisted coding. Autocomplete and inline edit made your typing "go 50% further," and that was too powerful to ignore.

Five years later, the story has completely reversed. Agents don't need an IDE — they need a terminal and access to your codebase. The only IDE-like feature Crawshaw still uses is go-to-definition, which Neovim handles with minimal configuration.

"So here I am, 2026, and I am back on Vi," he writes. "Vi is turning 50 this year."

The whiplash is real. Developers who invested heavily in Cursor, VS Code extensions, or other IDE-based AI tools may find themselves questioning that investment as terminal-based agents like Claude Code and Codex continue to improve. We explored this IDE-vs-agent tension in our Claude Code vs Codex CLI comparison.

Use Only Frontier Models — Or Learn the Wrong Lessons

Crawshaw makes a strong and somewhat controversial case: using anything other than frontier models is "actively harmful." Not just wasteful — actively counterproductive.

His argument isn't about snobbery. It's about discovery. The limits of AI agents keep moving, and developers need to constantly re-learn what's possible. If you use a cheaper model like Sonnet or a local alternative, you'll develop an inaccurate mental model of what agents can actually do. You'll learn the wrong lessons about when to trust the agent, when to intervene, and what kinds of tasks to delegate.

"Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese," he advises. "Don't worry, it's only for a few years."

He does express genuine hope for local models — he found LLMs "entirely uninteresting" until Mixtral let him run one locally. But he's pragmatic: until local models catch up, using them for serious agent work teaches you the wrong habits.

Sandboxes Are Broken — Use Fresh VMs

A practical pain point Crawshaw highlights: built-in agent sandboxes don't work well. Claude Code's constant "may I run cat foo.txt?" prompts and Codex's inability to build in its sandbox create friction that kills productivity.

His recommendation is straightforward: turn off the built-in sandbox and provide your own. Specifically, use a fresh VM for each session. This gives the agent unconstrained access to do its work without the constant permission dialogs, while keeping your main system safe.
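What does this look like in practice? Below is a minimal sketch of the pattern, our illustration rather than Crawshaw's actual setup, using Multipass to boot a throwaway Ubuntu VM per session. The VM image, repo URL, and task string are placeholders, and it assumes git and the Claude Code CLI are already installed and authenticated inside the guest (e.g. via cloud-init).

```python
"""Disposable-VM agent session: a minimal sketch, not Crawshaw's setup.

Assumes the Multipass CLI on the host, plus git and the Claude Code CLI
installed and authenticated in the guest. All names are placeholders.
"""
import subprocess
import uuid


def run(args):
    # Fail loudly if any step breaks; we never want to fall back to the host.
    subprocess.run(args, check=True)


def agent_session(repo_url: str, task: str) -> None:
    vm = f"agent-{uuid.uuid4().hex[:8]}"  # fresh VM name for every session
    try:
        # 1. Boot a clean Ubuntu VM: nothing from earlier sessions survives.
        run(["multipass", "launch", "--name", vm, "24.04"])
        # 2. Give the agent its own copy of the codebase inside the VM.
        run(["multipass", "exec", vm, "--", "git", "clone", repo_url, "work"])
        # 3. Run the agent with its built-in sandbox disabled; the VM itself
        #    is the sandbox, so the permission prompts are unnecessary.
        #    (Naive quoting: a task containing "'" would need escaping.)
        run(["multipass", "exec", vm, "--", "bash", "-lc",
             f"cd work && claude -p '{task}' --dangerously-skip-permissions"])
        # 4. Copy results out (e.g. push a branch) and review them before
        #    merging; the VM is about to disappear.
    finally:
        # 5. Destroy the VM so the next session starts from a known-clean state.
        subprocess.run(["multipass", "delete", "--purge", vm])


if __name__ == "__main__":
    agent_session("https://github.com/you/your-repo.git", "fix the failing tests")
```

The design point is the `finally` block: the VM is always destroyed, so every session starts from a known-clean image and the agent's unconstrained access never outlives the task.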

This is exactly the kind of workflow optimization that separates productive agent users from frustrated ones. If you want to tame your agent's behavior without sandboxing headaches, our guide on making AI agents follow rules with AGENTS.md covers how to set clear boundaries that agents actually respect.

More Programs Than Ever Before

The most optimistic section of Crawshaw's post describes an unexpected benefit: he now has far more programs and services than he used to. Ideas that would have ended up in an Apple Note titled "TODO" — forgotten forever — now actually get built.

"I am having more fun programming than I ever have," he writes, "because so many more of the programs I wish I could find the time to write actually exist."

This is the part of the AI coding story that often gets lost in debates about job displacement and code quality. For developers who already know what they want to build, agents remove the bottleneck of implementation time. The creative, architectural work remains human. The typing doesn't need to be.

He's building exe.dev specifically around this concept — a service where you can spin up a VM, give an unconstrained agent a one-liner description, and get a working program back.

What Hacker News Thinks

The HN discussion reveals the deep divide in the developer community. Several themes emerged from the 115+ comments:

The Skeptics Have a Point

Multiple commenters pushed back on Crawshaw's frontier-model-only stance. One noted they get "solid results from Copilot and Haiku/Flash" using optimized prompts and review loops — suggesting that good workflows can compensate for weaker models. Another offered a memorable quip: "Any sufficiently complicated LLM generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project."

The Fear Is Real — And Not Just About Code Quality

A detailed comment laid out five distinct categories of anti-LLM sentiment among developers:

  1. Job loss and income threat — amplified by leadership using AI as leverage for worse working conditions
  2. Happiness loss — developers who enjoy writing code but don't enjoy directing LLMs or debugging LLM-style mistakes
  3. Uncertainty and skepticism — legitimate questions about reliability and ROI
  4. Expertise erosion — concern that reliance on AI will atrophy skills for both new and experienced developers
  5. Training data ethics — the ownership and morality questions around how models were built

The "Gaslit" Developer

One of the most upvoted comments came from a developer who said: "Comments like these make me feel like I'm being gaslit." They described coworkers deep in the AI ecosystem producing "pure slop that doesn't work, buggy, untested adequately." They're not ideologically opposed to AI tools — just not seeing the promised productivity gains in practice.

This tension between power users like Crawshaw (who have optimized their entire workflow around agents) and average users (who see mostly mediocre results) is perhaps the defining challenge of AI coding tools in 2026.

The Bigger Picture: Software Is the Wrong Shape

Crawshaw's most forward-looking insight is that "most software is the wrong shape now." He illustrates this with Stripe Sigma, Stripe's hosted SQL analytics product: rather than waiting for access to it, he had his agent build an entire ETL pipeline — querying the standard Stripe APIs, building a local SQLite database, and running queries against it. Three sentences of instruction replaced a commercial product.
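For a sense of how small that replacement is, here is a rough sketch of the pattern in Python. This is our illustration, not Crawshaw's code; the table schema and the choice of charge fields are assumptions.

```python
"""Stripe -> SQLite ETL: a rough sketch of the pattern, not Crawshaw's code."""
import sqlite3

import stripe  # pip install stripe

stripe.api_key = "sk_test_..."  # your secret key

db = sqlite3.connect("stripe.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS charges (
        id       TEXT PRIMARY KEY,
        amount   INTEGER,   -- smallest currency unit, e.g. cents
        currency TEXT,
        created  INTEGER,   -- unix timestamp
        status   TEXT
    )
""")

# auto_paging_iter() walks every page of the list endpoint for us.
for charge in stripe.Charge.list(limit=100).auto_paging_iter():
    db.execute(
        "INSERT OR REPLACE INTO charges VALUES (?, ?, ?, ?, ?)",
        (charge.id, charge.amount, charge.currency, charge.created, charge.status),
    )
db.commit()

# Ad-hoc SQL now stands in for the hosted analytics product.
for currency, total in db.execute(
    "SELECT currency, SUM(amount) FROM charges "
    "WHERE status = 'succeeded' GROUP BY currency"
):
    print(currency, total)
```

Once the data lives in a local SQLite file, any question becomes a query you can run (or ask an agent to write) in seconds. That is the "wrong shape" argument in miniature: the hosted SQL product mattered when writing this glue code was expensive.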

His emerging philosophy: "The best software for an agent is whatever is best for a programmer." Product managers have long told engineers "you are not the customer." But now every customer has an agent that will write code against your product. Build what programmers love, and everyone follows.

This is a radical reframing. If Crawshaw is right, the entire SaaS industry needs to rethink how products are built — prioritizing APIs, documentation, and developer experience over polished UIs that will increasingly be bypassed by agents.

What This Means for You in 2026

If You're...                Key Takeaway
Already using AI agents     Focus on frontier models and VM-based sandboxes. Invest time in learning what the latest models can do.
Skeptical but curious       Try Claude Code or Codex with the best model available. Cheap models will give you a misleading experience.
A product builder           Prioritize APIs and developer experience. Your customers' agents are your new users.
Worried about job impact    The role is shifting from writing to reviewing and directing. Start building that skill set now.

At Serenities AI, we've been tracking the rapid evolution of AI coding tools throughout 2025 and 2026. Crawshaw's experience aligns with what we're seeing across the industry: the developers who invest time in mastering agent workflows are pulling dramatically ahead. The gap between agent-powered developers and traditional developers is widening every quarter.

Frequently Asked Questions

How much code can AI agents write in 2026?

According to David Crawshaw's real-world experience, frontier AI models like Claude Opus can now write approximately 90% of production code, up from 25% just one year earlier. The developer's role shifts primarily to reading, reviewing, and directing the agent's output rather than writing code directly.

Are AI coding agents actually worth the cost?

Crawshaw argues that using frontier models — despite their higher cost — is essential because cheaper models teach you "the wrong lessons" about what AI agents can do. The productivity gains from having 90% of code written by an agent far outweigh the subscription costs, especially for professional developers and startups.

Do I still need an IDE if I use AI coding agents?

Crawshaw has abandoned IDEs entirely in favor of Neovim (Vi) with minimal configuration. Terminal-based agents like Claude Code don't require IDE integration — they need access to your codebase and the ability to run commands. The only IDE feature he still uses is go-to-definition.

What are the biggest problems with AI coding agents in 2026?

The main pain points are: built-in sandboxes that create friction with constant permission prompts, the need to use expensive frontier models for accurate results, the requirement to carefully review all generated code, and a steep learning curve for optimizing agent workflows. Many developers report that coworkers produce "slop" with AI tools due to insufficient review.

Will AI agents replace software developers?

Based on Crawshaw's experience, AI agents are changing the developer role rather than eliminating it. Developers now spend 95% of their time reading and reviewing code rather than writing it. The skill set is shifting toward architectural thinking, code review, and effective agent direction. Developers who adapt are reporting more fun and higher output than ever before.

Tags: ai agents, claude code, coding, productivity, developer experience, 2026