NiloCK • today at 1:10 PM • 0 replies

I think the headline oversells this a little?

The reported variance in Sonnet 4.6's estimates here is actually quite low, and in general terms it's not so bad across models. Damn paella.

This does seem like a task well suited to a for-purpose training run against a bunch of labelled data. Is there any reason they wouldn't improve at it?
