In 2015, Dan McKinley published an essay called “Choose Boring Technology” and introduced the idea of the “innovation token.” A company, he argued, gets only about three of these tokens, and it spends one every time it adopts an unproven new technology. Boring technology isn’t bad technology; it’s technology whose capabilities and — more importantly — whose failure modes are well understood. When something breaks at three in the morning, you want to be debugging a tool with a decade of Stack Overflow answers behind it, not blazing a trail no one has walked.
Recently Aaron Brethorst revisited that argument for the age of AI and concluded that the principle matters more now, not less. A coding assistant will produce plausible-looking code for any stack you name. The trouble is that if you lean on two unfamiliar technologies at once, you have no way to tell whether the code is right or whether the model is simply bluffing. Large language models (LLMs) hallucinate technical details with total confidence, and with each unfamiliar technology you add to the stack, the uncertainty multiplies rather than adds.
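To make “multiplies rather than adds” concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that you can correctly judge the model’s output for any single unfamiliar technology with some fixed confidence, and that those judgments are independent. The function name and the 0.8 figure are hypothetical, not measurements.

```python
def chance_all_verified(per_technology_confidence: float, unfamiliar_count: int) -> float:
    """Chance that you correctly judge the output for every unfamiliar technology.

    Assumes each technology is judged independently with the same confidence,
    which is a simplification made only for illustration.
    """
    return per_technology_confidence ** unfamiliar_count


if __name__ == "__main__":
    # Hypothetical figure: you judge output correctly 80% of the time per unfamiliar technology.
    for count in range(4):
        print(count, round(chance_all_verified(0.8, count), 3))
```

Under those assumptions, the chance of getting everything right falls from 0.8 with one unfamiliar technology to roughly 0.51 with three, even though each one looked manageable on its own.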
The sharpest insight in that argument is this: AI has slashed the cost of producing code but left the cost of catching its mistakes untouched. Generating something and recognizing that it is wrong are different skills, and AI makes only the first one cheap. So the bottleneck has moved from writing to verifying — and verifying still demands real fluency in the domain. The reason Brethorst can wield AI powerfully on his own turf isn’t that generation is free; it’s that he can tell when the output is off.
You don’t have to follow the argument all the way to “so use only what you already know,” though. Two things are missing. The first is that AI is a learning accelerator as much as a generation risk. The economics of the innovation token assumed that learning a new technology is expensive. But if an AI can explain the concepts and work through the stuck points alongside you, the cost of adopting something new falls too. The price of the token itself has changed.
The second is that “can I personally review this code?” is too strict a test. Verification doesn’t happen only when a human reads the source line by line. Type systems, compilers, tests, and runtime behavior catch a great deal regardless of anyone’s expertise. Someone touching Rust for the first time is still stopped by the borrow checker before a memory bug ever ships. So the right question isn’t “can I catch it by reading?” but “are the failure modes I can’t catch caught by something else?” By that standard, far more choices are safe than a blanket ban would allow.
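A small sketch of what “caught by something else” can look like in practice. The two encoders below are hypothetical stand-ins for code an assistant might hand you; one contains a deliberate bug. The round-trip check requires no understanding of either implementation, only the expectation that decoding an encoded string should give the original back.

```python
from itertools import groupby
from typing import Callable, Iterable

Pairs = list[tuple[str, int]]


def rle_encode(text: str) -> Pairs:
    """Run-length encode a string into (character, run length) pairs."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_encode_buggy(text: str) -> Pairs:
    """Looks plausible on a quick read, but never appends the final run."""
    pairs: Pairs = []
    current, count = None, 0
    for char in text:
        if char == current:
            count += 1
        else:
            if current is not None:
                pairs.append((current, count))
            current, count = char, 1
    return pairs  # deliberate bug: the last (current, count) is dropped


def rle_decode(pairs: Pairs) -> str:
    """Expand (character, run length) pairs back into a string."""
    return "".join(char * count for char, count in pairs)


def round_trip_holds(encode: Callable[[str], Pairs], samples: Iterable[str]) -> bool:
    """Check that decoding an encoded sample always returns the original."""
    return all(rle_decode(encode(sample)) == sample for sample in samples)


SAMPLES = ["", "a", "aaabbc", "abab", "zzzzzzzz"]

if __name__ == "__main__":
    print("correct encoder passes:", round_trip_holds(rle_encode, SAMPLES))
    print("buggy encoder passes:", round_trip_holds(rle_encode_buggy, SAMPLES))
```

The check rejects the buggy encoder without anyone reading it line by line, which is the point: the failure mode is caught by the test, not by the reviewer’s expertise.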
What actually strengthens Brethorst’s case is something he doesn’t quite say. A technology earns the label “boring” because documentation, examples, and answered questions have piled up around it, and those well-trodden technologies are exactly the ones an LLM answers most accurately. Models learn from the internet, and the older, more stable, and better documented a technology is, the more heavily it is represented in the training data: think of SQL, PostgreSQL, and regular expressions, hardened over decades. The newer a library is, or the more it breaks between versions, the less reliable the model’s answers about it become. In other words, the risk of hallucination concentrates in the unfamiliar, which means your blind spots and the model’s fall in the same place. That is worse than two risks multiplying independently, because the model is most likely to be wrong exactly where you are least able to notice. There is an accident of vocabulary in it, too: the scarce “tokens” McKinley told us to ration and the tokens we now feed the model are best spent in the same place.
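A toy calculation shows why overlapping weaknesses are worse than independent ones. Every number below is made up for illustration: the split between familiar and unfamiliar work, the model’s error rate in each, and the reviewer’s miss rate in each are all assumptions, not data.

```python
# Illustrative, invented rates. None of these are measurements.
SHARE = {"familiar": 0.5, "unfamiliar": 0.5}           # fraction of the work in each area
MODEL_ERROR = {"familiar": 0.02, "unfamiliar": 0.30}   # chance the model's output is wrong
REVIEWER_MISS = {"familiar": 0.10, "unfamiliar": 0.90} # chance the reviewer misses an error


def average(rate: dict[str, float]) -> float:
    """Overall rate across both areas, weighted by the share of work in each."""
    return sum(SHARE[area] * rate[area] for area in SHARE)


def undetected_if_independent() -> float:
    """Undetected-error rate if model errors and reviewer misses were unrelated."""
    return average(MODEL_ERROR) * average(REVIEWER_MISS)


def undetected_if_correlated() -> float:
    """Undetected-error rate when both weaknesses concentrate in the same area."""
    return sum(SHARE[area] * MODEL_ERROR[area] * REVIEWER_MISS[area] for area in SHARE)


if __name__ == "__main__":
    print("independent:", round(undetected_if_independent(), 3))
    print("correlated: ", round(undetected_if_correlated(), 3))
```

With these invented rates, the average error and miss rates are identical in both calculations, yet the undetected-error rate rises from about 0.08 to about 0.136 once the two weaknesses are allowed to coincide.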
Research, of course, is a different matter. Working with a new algorithm, a new language, and new infrastructure all at once is the very opposite of boring. But the innovation-token frame was always aimed at a company’s production systems, not at research. Translated for a researcher, it reads like this: spend the expensive token only on the one piece where you are genuinely testing something new, and keep everything around it — the build, the deployment, the supporting tools — relentlessly boring. The moment you hand both the core and its surroundings to an AI at once, you lose any hope of knowing what went wrong.
In the end, the real virtue of boring technology isn’t just stability. It is that boring technology is the most verifiable. In an age when an AI can pour out thousands of confident lines for any stack you point it at, the ability to notice when those lines are wrong is more valuable than ever. And the place that ability works best is, paradoxically, on top of the most boring technology of all.