Hacker News

jacquesm yesterday at 11:17 PM

You're using it as a 'super compiler', effectively a code generator, and your .md file is the new abstraction level at which you code.

But there is a price to pay: the code that you generate is not the code that you understand, and when things go pear-shaped you will find that the deterministic element that made compilers so successful is missing from code generated from specs dumped into an AI. If you one-shot it, you will find that the next time you do this your code may come out quite different if it isn't a model that you maintain. It may contain new bugs or revive old ones. It may eliminate chunks of the code without you ever knowing, and so on.
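To make the "eliminated chunks" risk concrete, here is a minimal sketch of one way to notice it, assuming the generated code is Python and that both the previous and the regenerated source are available as strings. The function names `top_level_names` and `dropped_definitions` are hypothetical, and this only catches top-level functions or classes that vanish entirely; it says nothing about changed behavior.

```python
import ast


def top_level_names(source: str) -> set[str]:
    """Collect the names of top-level functions and classes in a module."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {node.name for node in tree.body if isinstance(node, kinds)}


def dropped_definitions(previous: str, regenerated: str) -> set[str]:
    """Names defined in the previous output but missing after regeneration."""
    return top_level_names(previous) - top_level_names(regenerated)


# Hypothetical outputs from two runs of the same spec.
previous_run = """
def parse(text): return text.split()
def validate(tokens): return bool(tokens)
def save(tokens): return len(tokens)
"""

regenerated_run = """
def parse(text): return text.split()
def save(tokens): return len(tokens)
"""

print(sorted(dropped_definitions(previous_run, regenerated_run)))
# → ['validate']
```

A check like this could run in CI next to the tests, but it is only a tripwire for the silent-deletion case, not a substitute for review.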

There is a reason that generated code always had a bit of a smell to it and AI generated code is no different. How much time do you spend on verifying that it actually does what's written on the tin?

Do you write your own tests? Do you let the AI write the tests and the code? Are you familiar with the degree to which AIs can be manipulated to do stuff that you thought they weren't supposed to? (A friend of mine just proved this to his boss by bribing an AI with a 'nice batch of pure random data' to put a piece of unreviewed code into production by giving itself the privileges required to do so...)


Replies

CharlieDigital yesterday at 11:22 PM

We have human reviews on every PR.

Quality and consistency are going up, not down. Partially because the agents follow the guidance much more closely than humans do and there is far less variance. Shortcuts that a human would make ("I'll just write a one-off here"), the agent does not...so long as our rules guide it properly ("Let me find existing patterns in the codebase.").

Part of it is the investment in docs we've made. Part of it is that we were already meticulous about commenting code. It turns out that when the agents stumble on this code randomly, they can read the comments (we can tell because they also update them in PRs when they make changes).

We are also delivering the bulk of our team-level capabilities via remote MCP over HTTP, so we have centralized telemetry via OTEL on tool activation, docs being read by the agents, and phantom docs the agents try to find (we then go and fill in those docs).
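A minimal sketch of the phantom-docs idea, for illustration only: it uses a plain in-memory counter rather than OTEL or an MCP server, and the class `DocTelemetry`, its methods, and the doc paths are all hypothetical. It records each doc lookup, separates hits from misses, and lists the missing docs the agents asked for most often.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DocTelemetry:
    """Hypothetical in-memory stand-in for centralized doc-lookup telemetry."""

    known_docs: set[str]
    reads: Counter = field(default_factory=Counter)
    phantom: Counter = field(default_factory=Counter)

    def record_lookup(self, path: str) -> bool:
        """Record an agent's doc request; return True if the doc exists."""
        if path in self.known_docs:
            self.reads[path] += 1
            return True
        self.phantom[path] += 1
        return False

    def docs_to_write(self, min_requests: int = 1) -> list[str]:
        """Missing docs requested at least min_requests times, most requested first."""
        return [p for p, n in self.phantom.most_common() if n >= min_requests]


telemetry = DocTelemetry(known_docs={"docs/testing.md", "docs/api-patterns.md"})
for requested in ["docs/testing.md", "docs/auth.md", "docs/auth.md", "docs/deploy.md"]:
    telemetry.record_lookup(requested)

print(telemetry.docs_to_write())
# → ['docs/auth.md', 'docs/deploy.md']
```

In a real setup the counters would presumably be emitted as telemetry events from the tool handler instead of being kept in process memory.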

operatingthetan yesterday at 11:26 PM

>A friend of mine just proved this to his boss by bribing an AI with a 'nice batch of pure random data' to put a piece of unreviewed code into production by giving itself the privileges required to do so...

Okay that's pretty hilarious. Everyone has a vice!
