Uh, they invented multi-head latent attention (MLA), and since the method for creating o1 was never published, they’re the only documented example of producing a model of comparable quality. They also demonstrated massive performance gains in smaller models by distilling this model and these methods, so no, not really. I know this is the internet, but we should try not to just say things.
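
For anyone unfamiliar, the core MLA idea is to compress each token's key/value information into a small shared latent vector and cache only that, re-expanding per-head keys and values on the fly. Here's a rough sketch of that idea; the names, dimensions, and layer layout are illustrative, not DeepSeek's actual code (which also adds decoupled rotary embeddings and other details omitted here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMLA(nn.Module):
    """Minimal sketch of multi-head latent attention (illustrative only)."""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-projection: one small latent per token, shared across all heads.
        # This latent is the only thing that would go in the KV cache.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-projections reconstruct per-head keys/values from the latent.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)  # (b, t, d_latent): cached instead of full K/V
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        s = latent.shape[1]
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # Causal masking omitted for brevity.
        attn = F.scaled_dot_product_attention(q, k, v)
        y = self.out(attn.transpose(1, 2).reshape(b, t, -1))
        return y, latent  # return latent so the caller can cache it
```

The win is cache size: standard multi-head attention caches `2 * n_heads * d_head` values per token, while this caches only `d_latent`, which is why it matters so much for long-context inference.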