Hacker News

andy99 | last Tuesday at 11:44 PM | 2 replies

What do you mean by not doing evals? Just literally that you don’t run any benchmarks, or do you have something against them?


Replies

danielmarkbruce | yesterday at 5:41 AM

He's just saying anecdotally these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered - no.

woodson | yesterday at 12:46 AM

Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).