Does AI Make You a Worse Coder? New Research Has a Surprising Answer

Anthropic's new study found that AI-assisted developers scored 17 percentage points lower on a comprehension test. But the way you use AI matters: some patterns preserve learning while others destroy it.

Serenities Team · 9 min read

The $80 Billion Question Nobody Asked

The AI coding revolution is here. GitHub Copilot has over 15 million users. Claude Code is rewriting how developers ship software. Every major tech company is racing to automate programming.

But amid the productivity gold rush, a critical question went unasked: Are we making ourselves worse at the very skills we need?

New research from Anthropic, the company behind Claude, delivers a sobering answer. In a randomized controlled trial with software developers, AI-assisted coders scored 17 percentage points lower on a comprehension test than those who coded by hand. That is not a minor gap. It is the equivalent of nearly two letter grades.

The study arrives at a pivotal moment. As AI writes more of our code, humans will still need to catch errors, guide output, and provide oversight. If AI use lets those exact skills atrophy, we have built a system that undermines its own safety net.

The Study: What Anthropic Found

Researchers recruited 52 software engineers—mostly junior developers who had been using Python weekly for at least a year. They were familiar with AI coding tools but unfamiliar with Trio, a Python library for asynchronous programming.
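
For readers who have not used it, Trio is a library built around structured concurrency: concurrent tasks run inside a "nursery" scope that waits for all of them to finish. The sketch below is only an illustration of that style, not one of the study's actual tasks.

  import trio

  async def greet(name: str, delay: float) -> None:
      await trio.sleep(delay)  # cooperative pause; other tasks keep running
      print(f"hello, {name}")

  async def main() -> None:
      # The nursery scopes concurrency: this block exits only once both tasks finish.
      async with trio.open_nursery() as nursery:
          nursery.start_soon(greet, "first", 0.1)
          nursery.start_soon(greet, "second", 0.2)

  trio.run(main)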

The setup mimicked real-world skill acquisition: participants received a problem description, starter code, and brief conceptual explanations. Half had access to an AI assistant that could produce correct code on demand. Half had to figure it out themselves.

After completing two coding tasks, everyone took a quiz covering concepts they had just used minutes before.

The Results Were Stark

  • AI group average: 50% quiz score
  • Hand-coding group average: 67% quiz score
  • Statistical significance: p = 0.01 (highly significant)
  • Effect size: Cohen's d = 0.738 (large effect)
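
For readers less familiar with effect sizes: Cohen's d is the difference between the group means divided by the pooled standard deviation of the scores. A rough back-of-the-envelope check, using only the reported means and d rather than the study's raw data, shows what that implies about the spread of quiz scores.

  # Invert Cohen's d from the reported summary numbers (illustrative only).
  mean_hand, mean_ai, d = 67.0, 50.0, 0.738
  pooled_std = (mean_hand - mean_ai) / d  # since d = (mean_hand - mean_ai) / pooled_std
  print(f"implied pooled standard deviation: ~{pooled_std:.0f} percentage points")
  # ~23 points: the 17-point gap is roughly three-quarters of a standard deviation,
  # a substantial difference by the usual benchmarks for d.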

The largest gap appeared in debugging questions—the ability to identify and diagnose errors in code. This is precisely the skill humans need most for AI oversight. If you cannot spot when AI-generated code is broken, you cannot catch its mistakes.
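
To make that concrete, here is the kind of plausible-looking mistake a reviewer of AI-generated code has to catch. It is an illustrative sketch, not code from the study: an async function is called without "await", so the work it was supposed to do silently never happens.

  import trio

  async def save_result(result: dict) -> None:
      await trio.sleep(0.1)  # stand-in for a slow write to storage
      print(f"persisted {result}")

  async def main() -> None:
      result = {"score": 0.9}
      save_result(result)  # BUG: missing "await", so the coroutine never runs
      print("done")        # still prints, so nothing obviously breaks

  trio.run(main)

The program exits normally; the only hint is a "coroutine was never awaited" warning. Spotting that kind of omission is exactly the skill the debugging questions probed.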

Why This Happens: Cognitive Offloading

Previous research has shown that when people use AI assistance, they become less engaged with their work and reduce the effort they put into it. Scientists call this "cognitive offloading"—essentially, outsourcing your thinking to AI.

The Anthropic study provides the first controlled evidence that this offloading can prevent skill development in a professional context.

Here is the troubling part: participants in the AI group did not even save significant time. They finished about two minutes faster on average, but this did not reach statistical significance. So they got worse outcomes without meaningful productivity gains.

Why? Because several participants spent up to 11 minutes—30% of their allotted time—just composing queries and waiting for responses. The "faster" tool created new bottlenecks.

Not All AI Use Is Created Equal

Here is where the study gets interesting—and offers hope.

Researchers analyzed screen recordings to understand how participants actually interacted with AI. They found distinct patterns that led to dramatically different outcomes.

Low-Scoring Patterns (Average <40%)

AI Delegation: These participants wholly relied on AI to write code. They finished fastest but learned almost nothing. They encountered few errors because AI handled everything.

Progressive AI Reliance: Started with a few questions, then gradually delegated all coding to AI. They scored particularly poorly on the second task because they never engaged with new concepts.

Iterative AI Debugging: Used AI to debug and verify their code rather than understanding it themselves. Asked more questions but focused on getting solutions, not building comprehension.

High-Scoring Patterns (Average 65%+)

Generation-Then-Comprehension: Generated code with AI, then asked follow-up questions to understand what it did. They took the output but demanded explanations.

Hybrid Code-Explanation: Composed queries asking for both code AND explanations together. Reading explanations took more time but built understanding.

Conceptual Inquiry: Only asked conceptual questions. Wrote all code themselves based on improved understanding. Encountered many errors but resolved them independently. This was the second-fastest approach overall—faster than iterative debugging.

The takeaway: How you use AI matters more than whether you use it.

The Debugging Gap: A Safety Problem

The study's most concerning finding involves debugging. The AI group showed the largest deficit in their ability to identify when code is incorrect and understand why it fails.

This creates a dangerous dynamic. As more code is AI-generated, the skill needed to validate that code is exactly what AI erodes. Junior developers who learn with AI may never develop the debugging intuition their jobs require.

Consider the implications:

  • AI-generated code already has bugs (often subtle ones)
  • AI cannot reliably catch its own mistakes
  • Human oversight is the safety layer
  • But AI tools may be crippling that oversight capacity

We are building systems that make their human safety net less capable.

What This Means for Engineering Teams

The researchers offer pointed advice for managers: "Think intentionally about how to deploy AI tools at scale, and consider systems or intentional design choices that ensure engineers continue to learn as they work."

This is not an argument against AI coding tools. It is an argument for using them wisely.

Practical implications:

For Managers:

  • Do not assume junior developers will skill up naturally with AI
  • Build in opportunities for unassisted work
  • Evaluate engineers on comprehension, not just output
  • Consider rotating between AI-assisted and manual coding

For Individual Developers:

  • When learning something new, resist the urge to generate first
  • Ask "why" questions, not just "how" questions
  • Force yourself to understand before you ship
  • Use learning modes (e.g., Claude Code's Explanatory style, ChatGPT's Study Mode)

For Organizations:

  • Audit whether AI tools are building or eroding team capabilities
  • Track comprehension metrics, not just velocity
  • Create space for deliberate practice without AI

The Vibe Coding Problem

In developer communities, there is growing conversation about vibe coding—a pattern where developers generate code through AI prompts without understanding what they are shipping. It feels productive. It looks productive. But it may be building technical debt and skill debt simultaneously.

The Anthropic study puts data behind this concern. If your AI workflow does not include comprehension, you are likely getting worse at your job even as you ship faster.

The solution is not to abandon AI. It is to use AI as a learning partner, not just a code generator.

At Serenities, our Vibe coding companion is designed around this principle. It explains while it generates. It asks if you understand before it moves on. Because producing code without comprehension is not productive—it is just deferred failure.

The Bigger Picture

This study is small (52 participants) and focused (one Python library, immediate testing). The researchers acknowledge they do not know whether effects persist long-term or apply beyond coding.

But it aligns with decades of learning science: struggle is how skills form. The "desirable difficulty" that AI removes may be exactly what builds expertise.

As AI becomes standard in professional settings, we need to think carefully about which cognitive tasks we automate and which we preserve. Not everything that feels like friction is waste. Some friction is how we grow.

The Path Forward

The study ends with a crucial insight: "Productivity benefits may come at the cost of skills necessary to validate AI-written code if junior engineers' skill development has been stunted by using AI in the first place."

This is not a call to abandon AI coding tools. Their benefits are real. But it is a call to be intentional about how we use them—especially when learning.

The developers who thrived with AI in this study were not the ones who used it most. They were the ones who used it as a thinking partner rather than a replacement for thinking.

Use AI to explain, not just to generate. Ask why, not just what. Understand before you ship.

That is not a slower path to productivity. The data shows it is actually faster than iterative AI debugging—and it builds the skills you need to catch the bugs AI will inevitably create.

The future belongs to developers who can work with AI and think independently. This study shows us how to become that developer.
