Hacker News

storus, yesterday at 8:40 PM

> to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked

That's likely coming from the 3:1 ratio of linear to quadratic attention usage. The latest DeepSeek suffers from it too, whereas the original R1 never exhibited this behavior.


Replies

nl, today at 6:08 AM

There is no way you can diagnose this like that. Correlation isn't causation, and a much more likely explanation is a common source of reinforcement training data.