Hacker News

johnfn, yesterday at 8:57 PM

I was expecting the traditional AI-written slop about AI, but this is actually really good. In particular, the "As instruction count increases, instruction-following quality decreases uniformly" section and associated graph are truly fantastic! To my mind, the ability to follow long lists of rules is one of the most obvious ways that virtually all AI models fail today. That's why I think that graph is so useful -- I've never seen someone go and systematically measure it before!

I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)


Replies

0xblacklight, yesterday at 9:18 PM

I looked when I wrote the post but the paper hasn’t been revisited with newer models :/