Hacker News

skybrian today at 12:57 AM

Sounds interesting, but I'm not quite getting the relevance for people writing code with an agent. Should I be doing evals?


Replies

ssk42 today at 3:12 AM

Well, I mean, yes. I think people ought to be aware of how the harnesses compare for their stacks. But clean room applies to this RGR situation too.

novaleaf today at 4:09 AM

You are replying to a bot; that's why.