Essay

Why Veteran Engineers Stumble When Building AI Agents

2026.06.03

Something curious has been turning up among programmers lately. When it comes to building AI (artificial intelligence) agents, junior developers often ship something usable faster than seasoned veterans do. An agent, unlike a chatbot that merely answers what it is asked, is a program you hand a goal to; it then works out the steps to get there on its own. Philipp Schmid, a developer relations engineer at Google DeepMind, recently flagged this gap between juniors and veterans as a paradox.

The reason lies in how the two kinds of software behave. The software we are used to is deterministic. Press the same button and you always get the same result; when it deviates, that is a bug to be fixed. An AI agent, whose brain is a large language model that takes instructions in ordinary human language, is probabilistic. Give it the same instruction and it may take a slightly different route each time and land on a different answer.

Schmid likens the difference to driving. Building traditional software, the developer was a traffic controller — owning the lights, the roads and the speed limits, deciding exactly where each car went and how. Working with an agent, you are closer to a dispatcher. You say, “Take this passenger downtown,” and the driver decides the route as they go. When a road is blocked they turn around; now and then they take a baffling detour. This is why an AI coding tool can do something strange in the middle of a task and still get the job done.

The same gap shows up in how human input is handled. Older programs asked through fixed slots: “Approve this plan? Yes / No.” But real people say, “Looks good — just change this one part.” Force that nuance into two boxes and the part that actually matters disappears. An agent takes in the whole sentence and acts on it.

The trouble is that the more senior the engineer, the less they trust this driver. They have spent decades being trained that good programming means stamping out ambiguity. So they try to suppress the probabilistic nature with code, spelling out every possible case in advance, and the more they do, the stiffer the agent gets. Error handling shifts, too. Ordinary programs are built to halt the moment an error occurs, because that makes the bug easy to find. But an agent may take several minutes per run, and you cannot start over from scratch every time it hits an error. So the error is treated not as a stop signal but as a new input, and the agent is left to find a detour on the spot. Even the checking changes: not “did it work once?” but “out of ten runs, how many worked?” For tasks with no single correct answer, a high enough success rate stands in for perfection. Schmid considers an agent ready for production if it succeeds on roughly 45 of 50 runs at good enough quality.
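To make those two shifts concrete, here is a minimal Python sketch, not Schmid's own code. It assumes a toy setup: `run_agent`, `pass_rate`, `toy_policy` and `make_toy_tool` are hypothetical names invented for illustration, the "policy" stands in for the language model, and the "tool" is a pretend road that is randomly blocked. The first function feeds each error back in as a new observation instead of halting; the second scores many runs against the rough 45-of-50 bar mentioned above.

```python
import random
from typing import Callable, List, Optional

READY_THRESHOLD = 45 / 50  # the rough bar quoted in the essay, not a standard

def run_agent(task: str,
              policy: Callable[[List[str]], str],
              tool: Callable[[str], str],
              max_steps: int = 5) -> Optional[str]:
    """Run one episode. A failed step is appended to the history as a new
    observation, so the policy can pick a detour instead of the run aborting."""
    history = [task]
    for _ in range(max_steps):
        action = policy(history)
        try:
            return tool(action)
        except Exception as err:  # broad on purpose: any failure becomes input
            history.append(f"error after '{action}': {err}")
    return None  # ran out of steps; this counts as a failed run

def pass_rate(run_once: Callable[[], Optional[str]],
              is_good: Callable[[Optional[str]], bool],
              runs: int = 50) -> float:
    """Fraction of independent runs whose result passes the quality check."""
    return sum(1 for _ in range(runs) if is_good(run_once())) / runs

# --- Hypothetical stand-ins for a real model and a real tool ---

def toy_policy(history: List[str]) -> str:
    """Try the main road first; after each error, switch to the other route."""
    errors_seen = len(history) - 1
    return "main road" if errors_seen % 2 == 0 else "side street"

def make_toy_tool(rng: random.Random, block_chance: float = 0.3):
    """Build a 'drive' tool in which any route is blocked with some probability."""
    def drive(route: str) -> str:
        if rng.random() < block_chance:
            raise RuntimeError(f"{route} is blocked")
        return f"arrived via {route}"
    return drive

if __name__ == "__main__":
    tool = make_toy_tool(random.Random(0))
    rate = pass_rate(
        lambda: run_agent("Take this passenger downtown", toy_policy, tool, max_steps=3),
        is_good=lambda result: result is not None,
    )
    print(f"pass rate: {rate:.2f}, ready for production: {rate >= READY_THRESHOLD}")
```

In a real system the quality check would be far richer than "did it return anything," and the threshold would depend on the task. The point of the sketch is only the shape: errors flow back into the loop, and readiness is judged over many runs rather than one.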

Agents will go on failing in unexpected ways. That is exactly why the field’s center of gravity is moving — away from sealing off errors with code, toward building systems sturdy enough to absorb them and recover. Schmid’s conclusion is plain. Veterans hesitate not for any lack of skill, but because an instinct honed too well is, in a game whose rules have changed, a weight they need to set down for a while.