logoalt Hacker News

orbital-decaylast Sunday at 3:38 AM0 repliesview on HN

>what you can't see from the map is many of the chains start in English but slowly descend into Neuralese

That's just natural reward hacking when you have no training/constraints for readability. IIRC R1 Zero is like that too, they retrained it with a bit of SFT to keep it readable and called it R1. Hallucinating training examples if you break the format or prompt it with nothing is also pretty standard behavior.