Taming the Genie: "Like Kent Beck"

Does it help? Not by itself but yes with a little more guidance.

Jan 20, 2026


We’re in the “horseless carriage” stage of coding genies. We absorb every technological innovation by first understanding it in our current frame before we begin to appreciate the fundamental changes it enables.

Horseless carriage → automobile
Wireless telegraph → radio
Electronic mail → messaging

You can’t rush this transition. You have to live with the new technology long enough to grok the second-order implications of it, the reinforcing & inhibiting loops it creates. Then you can shift into second gear (to use a nearly-obsolete metaphor) & find out what the new technology is really capable of.

Okay, so what can you do while you wait? Use the technology. Lots. And so I’ve been augmented coding. Lots.

My first goal is to get the genie to code like me, but better (this may be the wrong goal, but I certainly don’t want it coding worse than me). Along the way someone suggested adding “…like Kent Beck would” to their programming prompt, reporting that the genie’s behavior improved afterwards.

Does it? I wanted to see if I could demonstrate this effect. Fortunately I have a near-zero-cost way to perform experiments: the genie!

(I’ll reveal my secret second goal of this experiment at the end of this post.)

The Experiment: Rope

I wanted a sample project that was big enough to require interesting coding & design decisions, but small & contained enough to be validated by straightforward tests.

I chose the Rope data structure. Say you have a very long string & you’d like to delete a character in the middle of it.

The simplest way is to shift all the (many) characters to the right of it over by one position. This operation is O(n) where n is the length of the string. But what if we want this operation to be constant time?

What if we had an object representing a substring of the big string & another object representing the concatenation of two of our new flavor of string? Deleting a character, then, results in a concatenation of two substrings: everything before the deleted character & everything after it.

Constant time! For each delete we allocate 3 objects. (Navigating this structure is O(the number of operations) but that’s presumably a smaller number than the unbounded size of the string & we can compress the operations periodically.)
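Here is a minimal Python sketch of that idea, under stated assumptions. Substring and Concatenation are the names used in this post; the Rope base class, the rope() helper, and the method names are hypothetical choices made for illustration, not the code the genie produced.

```python
from dataclasses import dataclass
from typing import Union


class Rope:
    """Shared behavior; subclasses supply __len__ and __getitem__."""

    def __len__(self) -> int:
        raise NotImplementedError

    def __getitem__(self, index: int) -> str:
        raise NotImplementedError

    def delete(self, index: int) -> "Rope":
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Three allocations, no matter how long the text is:
        # two Substrings and one Concatenation. Nothing is copied.
        before = Substring(self, 0, index)
        after = Substring(self, index + 1, len(self))
        return Concatenation(before, after)

    def __str__(self) -> str:
        return "".join(self[i] for i in range(len(self)))


@dataclass(frozen=True)
class Substring(Rope):
    source: Union[str, Rope]  # a native string or another rope
    start: int
    end: int  # exclusive

    def __len__(self) -> int:
        return self.end - self.start

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.source[self.start + index]


@dataclass(frozen=True)
class Concatenation(Rope):
    left: Rope
    right: Rope

    def __len__(self) -> int:
        return len(self.left) + len(self.right)

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        if index < len(self.left):
            return self.left[index]
        return self.right[index - len(self.left)]


def rope(text: str) -> Rope:
    # Assumption: the whole native string is a Substring from 0 to its size.
    return Substring(text, 0, len(text))
```

With these definitions, print(rope("hello world").delete(5)) prints helloworld. Each further delete wraps the previous rope in another layer, which is why reading a character costs time proportional to the number of operations rather than the length of the text.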

This is the data structure I want the genie to build on my behalf.

Evolution of the Project

Phase 1: The Persona (”Code like Kent Beck”)

I started with a simple hypothesis: asking the model to “Code like Kent Beck” would produce better code. Or, to put it in more formal science-y language, the null hypothesis is that appending “code like Kent Beck” won’t make a difference.

The Result: It did, but not in the way I expected. The code style improved—variable names got better, and most importantly, the testing strategy shifted from monolithic scripts to modular, isolated unit tests (TDD style).
The Surprise: The architecture didn’t change. It implemented the Rope as a standard binary tree, ignoring the Composite pattern that I (the real Kent Beck) use.

Phase 2: Design Guidance

We hit some bumps. First, the “Control” group code was getting truncated because it was too verbose, leading to syntax errors. We fixed this by increasing the token limit—a small reminder that “more compute” (or at least more buffer) is often a simple fix. Then, to fix the design, I refined the prompt. I couldn’t just say “be me”; I had to tell it what I would do. I added explicit constraints: “Use the Composite pattern. Break behavior into small, specialized classes.”

I got the design I expected: separate classes for Substring & Concatenation, each simpler than the single class that unguided development produced. Actually, I got a simpler design. When I code this one finger at a time I usually end up with a Null Object (EmptyString) and a simple wrapper around the native string. The genie figured out that it could just use Substring from 0..size. Nice catch!

Phase 3: The Isolation (4-Group Experiment)

But which of the interventions made the difference? “Act like me” or “compose small classes”? We need to try the cross product:

Control: Standard assistant.
Kent Beck: Persona only.
Composite: Architectural constraints only.
Combined: Persona + Constraints.
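The four groups are the cross product of two on/off switches. A small sketch of how such prompts could be assembled follows. The persona and constraint sentences are quoted from this post; the task wording, the build_prompts name, and the way the pieces are joined are assumptions for illustration, not the harness actually used.

```python
from itertools import product

PERSONA = "Code like Kent Beck."
CONSTRAINTS = (
    "Use the Composite pattern. "
    "Break behavior into small, specialized classes."
)
# Hypothetical task wording; the post does not give the exact prompt.
TASK = "Implement a Rope data structure with tests."

GROUP_NAMES = {
    (False, False): "Control",
    (True, False): "Kent Beck",
    (False, True): "Composite",
    (True, True): "Combined",
}


def build_prompts(task: str) -> dict[str, str]:
    """Return one prompt per experimental group, keyed by group name."""
    prompts = {}
    for use_persona, use_constraints in product([False, True], repeat=2):
        parts = [task]
        if use_persona:
            parts.append(PERSONA)
        if use_constraints:
            parts.append(CONSTRAINTS)
        prompts[GROUP_NAMES[(use_persona, use_constraints)]] = " ".join(parts)
    return prompts
```

Calling build_prompts(TASK) yields four prompts that differ only in the two factors under test, so any difference in the output can be attributed to the persona, the constraints, or their combination.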

Conclusions

The results were stark and educational. We found a clear 2x2 matrix of effects:

Personas Drive Micro-Behavior: The “Kent Beck” prompt reliably improved testing style and naming. It made the code “feel” better but didn’t change the fundamental structural decisions.
Constraints Drive Macro-Architecture: The “Composite Pattern” prompt reliably forced the class hierarchy. It produced a finer-grained design even without the persona.
The Combination Wins: The “Combined” group gave us the best of both worlds—the right architecture (Composite) with the right development habits (TDD/Unit Tests).

The Bitter Lesson Applied

I said I’d tell you my secret agenda before I was done here. I want the genie to do a better job of development, to balance features & futures. I’ve tried to get this effect through meticulous prompting, through paying excruciating attention to the changes the genie proposes, through smaller steps, larger steps, everything I could think of.

Turns out I’m not the first person to go down this path. Rich Sutton, in The Bitter Lesson, described how 70 years of AI research show that general methods that leverage computation ultimately give better results than encoding human expertise. I’ve been trying to encode my style. I suppose I should be glad I’ve finally gotten around to making the same mistake everyone makes.

Are we doomed to clumsy coding genies that copy the same shitty code style they’ve seen in countless repos? I think not. Here’s a way to leverage computation to get a more-effective development style:

Take a large repo.
Have a million genies implement the next feature, with each genie choosing how & how much to tidy first.
Select the genies which succeed at adding the next feature at the lowest cost (time, tokens, electricity, money, whatever).
Do it again for lots of genies & lots of features in lots of repos.
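The selection step in that loop can be sketched in a few lines of Python. Everything here is hypothetical: the Attempt record, its fields, and the select_winners function are illustrative names, and running the genies themselves is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Attempt:
    genie_id: int
    tidying: str     # hypothetical label: how & how much this genie tidied first
    succeeded: bool  # did the next feature get added successfully?
    cost: float      # one chosen measure: time, tokens, electricity, money, ...


def select_winners(attempts: Iterable[Attempt], keep: int) -> List[Attempt]:
    """Keep the cheapest successful attempts; failures never win."""
    successful = [a for a in attempts if a.succeeded]
    return sorted(successful, key=lambda a: a.cost)[:keep]
```

For example, given made-up attempts where one genie tidies nothing and fails, one tidies heavily and succeeds at cost 9.0, and one tidies lightly and succeeds at cost 4.0, select_winners(attempts, keep=1) returns the light-tidying attempt. Repeating this over many features and repos is what would turn individual wins into a signal about tidying style.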

We are “wasting” all that coding, but not really.

Jessica Kerr calls this a Design Contest.

MetalMonkey

Jan 20

The infinite monkey approach is one I like the sound of for some reason!

Matteo Vaccari

Mar 18

"Like Kent Beck" is a vague prompt. Since you wrote and did lots of things, the Genie does not know which ones are important to you here and now. And its knowledge of who KB is is not coming from your writings only, but also from the writings about KB of just about everyone who mentioned or commented or refuted or misquoted or misrepresented you! We'd get "the average idea of KB that comes across through the Internet". So it's a bit like asking "use the best practices" without explaining which best practices you have in mind.

What I did when I tried this was to ask the AI to distill the recommendations from your writing that I thought were more relevant in the moment, and then I reviewed it, and then I could point the AI to that file when I wanted that behavior.
