Uh, they invented multi-head latent attention (MLA), and since the method for creating o1 was never published, they're the only documented example of producing a model of comparable quality. They also demonstrated massive gains in the performance of smaller models through distillation of this model/these methods, so no, not really. I know this is the internet, but we should try not to just say things.