Hacker News

kahnclusion · today at 2:02 AM · 0 replies

What? Yes, they do take shortcuts and hacks. They change the test cases to make them pass. As the context gets longer, they become less reliable at following earlier instructions. I literally had Claude hallucinate nonexistent APIs and then admit, “You caught me! I didn’t actually know, let me do a web search,” and even after the web search it still mixed deprecated patterns and APIs, against my instructions.

I’m much more worried about the reliability of software produced by LLMs.