
grey-area • today at 8:19 AM

This is a fascinating look into code generated by an LLM that is correct in one sense (it passes the tests) but doesn't meet the requirements (it is painfully slow). It doesn't use is_ipk to identify primary keys, and it calls fsync on every statement. The problem with larger projects like this, even if you are competent, is that there are simply too many lines of code to read properly and understand in full. Bravo to the author for taking the time to read through this project; most people never will (clearly including its own author).
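To make the fsync cost concrete, here is a minimal Python sketch. It is not taken from the project under discussion: `write_records` and the append-only file are hypothetical stand-ins for a storage engine's write path. It appends records either with an `os.fsync` after every record (one durable flush per statement) or with a single `os.fsync` at the end (roughly what grouping statements into one transaction and syncing at commit buys you). Each fsync waits for the device to confirm the write, so the per-statement variant is typically much slower on real disks, though the exact ratio depends on hardware and filesystem.

```python
import os
import tempfile
import time


def write_records(path: str, records: list[str], sync_every_statement: bool) -> int:
    """Append records to `path`, one per line; return the number of fsync calls.

    Hypothetical stand-in for a storage engine's write path,
    not the project's actual code.
    """
    fsync_calls = 0
    with open(path, "ab") as f:
        for rec in records:
            f.write(rec.encode("utf-8") + b"\n")
            if sync_every_statement:
                f.flush()             # move Python's buffer into the OS
                os.fsync(f.fileno())  # block until the OS reports the data is on stable storage
                fsync_calls += 1
        if not sync_every_statement:
            # Batched: one flush + fsync covers every record, like a single commit.
            f.flush()
            os.fsync(f.fileno())
            fsync_calls += 1
    return fsync_calls


if __name__ == "__main__":
    records = [f"INSERT row {i}" for i in range(200)]
    with tempfile.TemporaryDirectory() as tmp:
        for label, per_stmt in (("fsync per statement", True), ("fsync once", False)):
            path = os.path.join(tmp, f"{per_stmt}.log")
            start = time.perf_counter()
            calls = write_records(path, records, per_stmt)
            elapsed = time.perf_counter() - start
            print(f"{label}: {calls} fsync calls, {elapsed:.3f}s")
```

Timings vary widely by storage (on a RAM-backed tmpfs, fsync is nearly free), so the fsync counts are the dependable part of the output; the usual remedy is to group writes and sync once per commit rather than once per statement.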

I find that LLMs at present work best as autocomplete:

The chunks of code are small and can be reviewed carefully at the point of writing.

Claude normally gets it right (though it is sometimes horribly wrong), and the wrong cases are easier to catch in autocomplete.

That way they mostly work as designed, the burden on humans is completely manageable, and you end up with a good understanding of the generated code. Even so, I'd say they make mistakes roughly 30% of the time when autocompleting, which is significant (mistakes not necessarily being bugs, but ugly code, slow code, duplicate code, or incorrect code).

Having the AI produce the majority of the code (in chats or with agents) takes a lot of time to plan and babysit, and the result is harder to review, maintain, and diagnose. It doesn't seem like much of a productivity boost, unless you're producing code that is already in the training data and just want to ignore the licensing of the original code.
