trjordan • today at 3:55 PM • 5 replies

> Those are not code problems. They are evaluation problems.

> Code becomes precious when it is the only place knowledge lives.

Reading AI code all day is _agonizing_. Just, a horrible way to live, and it melts people's brains at the moment you need them to be the most capable.

Manual programming has this really productive and gratifying feedback loop, where you read the code, write the code, and fix it until it compiles/runs/does what you want. AI code not only does half that for you, but it makes the "click" at the end uninspiring because you're never sure if it's cheated a bit to get to that moment.

Trying to operate with AI-generated code as the only durable artifact of programming is a dead end for the industry. Charity points to (and correctly discards) architecture diagrams/specs as an interesting space to work in. My suspicion is that it's closer to the thing that's hand-written: prompts, markdown plans, and other nudges. Focus on the thing that you, as a human, produce: that's the basis for the core loop of "did the AI follow my instructions", and it's higher-leverage when you go to code review.

By the time you get to the PR, you've probably typed enough to Claude that you can regenerate the code, but the current industry default is to just throw away all those sessions and ship the code. That's backwards!

Replies

philbo • today at 4:06 PM

If a coworker dumped a 5k-line code review on you, you'd tell them to come back when it's broken down into smaller, reviewable chunks. Large dumps of code are basically unreviewable by humans, but it seems like a lot of people have forgotten about that when it comes to LLMs.

gavinh • today at 6:27 PM

I agree that reading AI code all day is agonizing. We're relying on code review to develop parts of our mental model of the system that were previously developed through coding. We're having more difficulty comprehending and recalling details of the system. This is probably unsurprising; people recall information they "generated" better than information they read. I am applying some lessons from pedagogy to extend code review. If this resonates with you, I would like to talk.

mooreds • today at 3:57 PM

Are there any products out there that are capturing the prompts/sessions? I imagine you could do it in an ad hoc way, asking Claude to write up a summary of the session as part of the commit message. But is there anything else that's more structured/higher level?

agumonkey • today at 6:32 PM

the act, eval, adjust loop is probably neurologically important.. reading about things you didn't dive into is really a dread

depending on your industry, you might be able to ship half-slop and then fix some bugs downstream though

keybored • today at 6:07 PM

Flintstone Engineering is applying Space Age synthetic intelligence (in a metaphorical sense) technology with code generation. Babysitting, version controlling, etc. generated code should be a thing of the past. But that is what GenAI is.

At the very least apply it at a higher level: specification, proofs, anything but generating Rust/Java/C and then letting yourself or an agent babysit it.
