I just spent 4 days debugging a random SegFault. The code was written by an Agent. The diagnosis was driven by an Agent. And it was a masterclass in the difference between Generation and Engineering.

Debugging Crash is a binary reward function
Fig. 1 — Debugging Crash is a binary reward function

The Rabbit Hole

For the first 2 days, the Agent took me on a relentless, exhaustive wild goose chase. It read everything. It searched everything. It generated a convincing hypothesis that Pydantic (a standard, battle-tested library) was causing memory corruption. It convinced us to do a massive refactor, ripping out Pydantic in favor of msgspec (which is good anyway).

The result? The crash disappeared locally, but immediately resurfaced in the customer testing env.

Seg Fault is a binary reward function
Fig. 2 — Seg Fault is a binary reward function

The Diagnosis: The "Binary Reward" Trap

Why did the Agent fail so hard? Because debugging a random crash is a Binary Reward Function.

In normal coding, an Agent gets "partial credit." It writes a function, it compiles, it passes a unit test—it gets a gradient of success to guide its optimization path. But with a SegFault, there is no gradient. You are either crashing (Failure) or fixed (Success). There is no "getting warmer."

The Agent wasn't reasoning; it was flailing in the dark. It couldn't see a path, so it just hallucinated complexity to fill the void.

Overloaded Context Makes Agent Wonder
Fig. 3 — Overloaded Context Makes Agent Wonder

The "Old School" Intervention

I spent years building the Google Cloud Storage backbone. I know that when there is no gradient, you have to create one. I divided and conquered. I narrowed the crash down to a single, isolated function. Even then, the Agent—burdened by the massive context history of its own previous failures—couldn't see the bug.

I almost gave up. I was ready to rewrite the logic from scratch. But I gave it one last shot: I cleared the context. I wiped its memory and showed it only the narrowed-down function.

"Boop. You have an infinite recursion problem."

It solved it in seconds.

In Exhaustive Search, One just needs to be lucky
Fig. 4 — In Exhaustive Search, One just needs to be lucky

The Epiphany: Result-Driven Engineering

This experience clarified exactly what a Coding Agent is. It is not a Senior Engineer. It is an Exhaustive Search Engine.

It reminds me of a story from my early days at Google X. We had an intern on a sister project who produced "groundbreaking" results. How? They burned thousands of TPU for 3 months doing an exhaustive search of the parameter space to find a lucky fit, then retrofitted a research report to explain it. It was Result-Driven Research, not Research-Driven Results. It works, until you have to debug why it works.

Coding Agents are helpful, but they are dangerous. They are "lucky" problem solvers at times. They can generate 1,000 lines of code in a minute, but it might take you 4 days to find the one line that kills your production.

Debug accordingly.

Source Originally published on LinkedIn