Hacker News

aydyn yesterday at 6:58 PM

Then publish the results of those internal evals. Public benchmark saturation isn't an excuse to be un-quantitative.


Replies

verdverm yesterday at 7:03 PM

How would published numbers be useful without knowing what underlying data is being used to test and evaluate the models? Those evals are proprietary for a reason.

To think that Anthropic is not being intentional and quantitative in their model building, just because they care less about benchmaxxing on saturated benchmarks, is to miss the forest for the trees.
