Being "preferred by humans over Sonnet 4.6" makes it pretty clearly not benchmaxxed, though.
At least if you define benchmaxxed as "good on benchmarks but not in human preference".