What do you mean about not doing evals? Do you literally mean that you don’t run any benchmarks, or do you have something against them?
He's just saying that, anecdotally, these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered: no.
Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).