>reasoning capabilities in latest models are rapidly approaching superhuman levels and continue to scale with compute
I still have a pretty hard time getting it to tell me how many sisters Alice has. I think this might be a bit optimistic.
They plugged the hole for "how many "r"s are in "strawberry"", but I just asked it how many "l"s are in "lemolade" (spelling intentional) and it told me 1. If you give it something close to, but not exactly, a word it expects, it falls over.
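For comparison, a plain character count gets it right. This is just a quick Python sketch; `count_letter` is a throwaway helper name I made up, nothing official:

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of one letter in a word, character by character."""
    return word.lower().count(letter.lower())

print(count_letter("lemolade", "l"))    # 2
print(count_letter("strawberry", "r"))  # 3
```

So the right answer for "lemolade" is 2, not 1.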