What does quadratic attention mean?
I've so far found that intuition travels between models of a similar generation remarkably well. The conformance suite trick (find a 9,200 test existing conformance suite and tell an agent to build a fresh implementation that passes all those tests) I first found with GPT-5.2 turned out to work exactly as well against Claude Opus 4.5, for example.
https://arxiv.org/abs/2209.04881