
daxfohl · yesterday at 8:06 PM

I still find in these instances there's at least a 50% chance it has taken a shortcut somewhere: created a new, bigger bug in something that just happened not to have a unit test covering it, or broken an "implicit" requirement that was so obvious to any reasonable human that nobody thought to document it. These can be subtle because you're not looking for them: no human would ever think to do such a thing.

Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!"


Replies

gtowey · yesterday at 9:00 PM

The value-extortion scheme writes itself. How long before someone pitches the idea that the models should deliberately keep almost solving your problem, to keep you spending? Would you even know?

wvenable · yesterday at 10:11 PM

> These can be subtle because you're not looking for them

After any agent run, I always look at the git diff between the new version and the previous one. This helps catch things you might otherwise not notice.
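
Concretely, a minimal version of that review step (assuming the agent leaves its changes uncommitted in the working tree) looks like:

    # quick overview of which files the agent touched
    git diff --stat

    # register newly created files so they appear in the diff too
    git add --intent-to-add .

    # full line-by-line review of every change
    git diff

If the agent commits on its own instead, diffing against the last commit you reviewed (git diff <reviewed-commit>..HEAD) does the same job.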

charcircuit · yesterday at 9:19 PM

You're using it wrong, or using a weak model, if your failure rate is over 50%. My experience is nothing like this; it very consistently works for me. Maybe there's a <5% chance it takes the wrong approach, but you can quickly steer it in the right direction.
