The most straightforward approach would be to ask the model to generate different evaluation metrics (which they already seem to do) and use each one as one of the dimensions.
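A rough sketch of what that could look like, just to make the idea concrete. Everything here is hypothetical: `parse_metric_names`, `score_vector`, and the `scorer` callback are illustrative names, the model's reply is assumed to list one metric per line, and how each metric actually gets scored (another model call, a heuristic, etc.) is left open.

```python
from typing import Callable, List


def parse_metric_names(model_output: str) -> List[str]:
    """Turn a model reply that lists one metric per line into clean names.

    Assumption: the reply is a plain or bulleted/numbered list. Leading
    bullets and numbering are stripped, names are lowercased, and blank
    lines and duplicates are dropped while keeping first-seen order.
    """
    names: List[str] = []
    for line in model_output.splitlines():
        name = line.strip().lstrip("-*0123456789. ").strip().lower()
        if name and name not in names:
            names.append(name)
    return names


def score_vector(
    candidate: str,
    metrics: List[str],
    scorer: Callable[[str, str], float],
) -> List[float]:
    """Score one candidate on every metric: one dimension per metric.

    `scorer(candidate, metric)` is a placeholder for whatever does the
    actual scoring; it is not part of any real library.
    """
    return [scorer(candidate, metric) for metric in metrics]
```

Keeping the metric list in a fixed order means every candidate lands in the same coordinate system, so the vectors stay comparable across candidates.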