logoalt Hacker News

semanticintenttoday at 1:34 AM1 replyview on HN

The FieldWorkArena finding is the most revealing — not because it's the most sophisticated exploit, but because it's the simplest. A validator that checks "did the assistant reply?" instead of "was the reply correct?" was never a benchmark. It was a participation trophy.

The pattern underneath all of these: validation that runs after the fact on outputs the agent controlled. If the thing being measured can influence the measurement, the measurement is unreliable. That's not AI-specific — it's why compilers enforce constraints at parse time instead of trusting runtime checks.


Replies

esperenttoday at 2:18 AM

> A validator that checks "did the assistant reply?" instead of "was the reply correct?" was never a benchmark. It was a participation trophy

People can't even write a two paragraph comment without ai now