Does that mean we should use a larger model as judge for evals, not a smaller one?
That was always the advice. Use the best model you can afford.
But some problems are easy and you can get away with a smaller model.