Sonnet's strength was always comprehending the problem and its context. It happened to also be pretty good at generating code, but what it actually made it its first really useful model was that it understood _what_ to code and how to communicate.
Exactly - it works better in the real world, where there's a lot less context than a clinical benchmark, and you're just trying to get the answer without writing an essay.
Exactly - it works better in the real world, where there's a lot less context than a clinical benchmark, and you're just trying to get the answer without writing an essay.