Is it a frontier player though, or perhaps a new benchmaxxed model? People were saying similar things about Grok but it ultimately amounted to little.
"preferred by humans over Sonnet 4.6" makes it pretty clearly not benchmaxxed though.
At least if you define benchmaxxed as "good on benchmarks but not in human preference".