Hacker News

t43562 · today at 9:31 AM

Code may have to compile, but that's a lowish bar, and since the AI is writing the tests, it's obvious that they're going to pass.

In all areas where there are fewer easy ways to judge output, there is going to be correspondingly more value in getting "good" people. An AI that can produce readable reports isn't "good" — what matters is the quality of the work and the insight put into it, which can only be ensured by looking at the worker's reputation and past history.


Replies

nthj · today at 2:47 PM

We’ve had the sycophant problem for as long as people have held power over other people, and the answer has always been “put 3-5 workers in a room and make them compete for the illusion of favor.”

I have been doing this with coding agents across LLM providers for a while now, with very successful results. Grok seems particularly happy to tell Anthropic where it’s cutting corners, but I get great insights from O3 and Gemini too.
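A minimal sketch of that "competing reviewers" pattern, assuming a stub `ask_model` in place of real provider API calls (the canned critiques and provider names are illustrative only):

```python
# Sketch of fanning the same diff out to several models and collecting
# each one's critique. ask_model is a hypothetical stand-in for a real
# chat-completion call; canned replies keep the sketch runnable offline.

def ask_model(provider: str, prompt: str) -> str:
    """Stub: in practice this would call the provider's chat API."""
    canned = {
        "grok": "corner cut: error handling skipped in parse()",
        "o3": "edge case: empty input not covered by tests",
        "gemini": "design: function mixes parsing with I/O",
    }
    return canned[provider]

def cross_review(diff: str, providers: list[str]) -> dict[str, str]:
    """Ask several models to critique the same diff; collect every complaint."""
    prompt = f"Review this diff critically. List any corners cut:\n{diff}"
    return {p: ask_model(p, prompt) for p in providers}

reviews = cross_review("example diff", ["grok", "o3", "gemini"])
for provider, critique in reviews.items():
    print(f"{provider}: {critique}")
```

The value comes from the union of complaints: each model flags different problems, so none of them can quietly bless its own work.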

naasking · today at 11:23 AM

> since the AI is writing the tests it's obvious that they're going to pass

That's not obvious at all if the AI writing the tests is different from the AI writing the code being tested. Put into an adversarial, critical mode, the same model produces very different output.
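A toy sketch of that split, with stub functions standing in for the two models' outputs (the factorial example and the function names are illustrative, not from the thread):

```python
# Adversarial split: one model authors the test cases, a different model
# authors the implementation, and neither sees the other's output until
# the tests actually run.

def tests_from_model_a() -> list[tuple[int, int]]:
    """Stand-in for (input, expected) pairs from the test-writing model."""
    return [(0, 1), (1, 1), (5, 120)]  # factorial edge cases and a spot check

def impl_from_model_b(n: int) -> int:
    """Stand-in for the implementation from a different model."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Because the two artifacts were produced independently, a green run is
# evidence of agreement rather than a tautology.
failures = [(n, want, impl_from_model_b(n))
            for n, want in tests_from_model_a()
            if impl_from_model_b(n) != want]
print("failures:", failures)
```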
