Everyone should have their own private evals for models. If I ask a question and a model flat out gets it wrong sometimes I will put it in my test questions bank.