> gaming the evaluation
Co-evolution is the answer here. The evaluator itself must be evolving, so that whatever loophole the candidates learn to exploit gets closed off in later rounds.
"Co-evolving Parasites Improve Simulated Evolution as an Optimization Procedure", Danny Hillis, 1991:
https://csmgeo.csm.jmu.edu/geollab/complexevolutionarysystem...
And in Reinforcement Learning:
POET (Paired Open-Ended Trailblazer): https://www.uber.com/en-DE/blog/poet-open-ended-deep-learnin...
SCoE (Scenario co-evolution): https://dl.acm.org/doi/10.1145/3321707.3321831
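
To make the idea concrete, here is a minimal toy sketch of an evolving evaluator, loosely in the spirit of Hillis's sorting networks (hosts) versus test cases (parasites). It is not his actual setup, and it is not how POET or SCoE work either. The sizes, the truncation selection scheme, and every function name below are my own assumptions, chosen only for illustration.

```python
import itertools
import random

N = 4        # inputs per network (toy size, an assumption)
NET_LEN = 6  # comparators per candidate network (hypothetical)

def apply_network(net, bits):
    """Run a comparator network over a sequence of bits and return a list."""
    v = list(bits)
    for i, j in net:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def solves(net, bits):
    """True if the network sorts this particular input."""
    return apply_network(net, bits) == sorted(bits)

def random_comparator(rng):
    i, j = sorted(rng.sample(range(N), 2))
    return (i, j)

def mutate_net(net, rng):
    """Replace one comparator at random."""
    child = list(net)
    child[rng.randrange(len(child))] = random_comparator(rng)
    return child

def mutate_test(bits, rng):
    """Flip one bit of a test case."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]

def next_generation(pop, scores, mutate, rng):
    """Keep the better-scoring half, refill with mutated copies of survivors."""
    ranked = [p for _, p in sorted(zip(scores, pop), key=lambda sp: sp[0], reverse=True)]
    survivors = ranked[: len(pop) // 2]
    children = [mutate(rng.choice(survivors), rng) for _ in range(len(pop) - len(survivors))]
    return survivors + children

def coevolve(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    nets = [[random_comparator(rng) for _ in range(NET_LEN)] for _ in range(pop_size)]
    tests = [tuple(rng.randint(0, 1) for _ in range(N)) for _ in range(pop_size)]
    for _ in range(generations):
        # Solution fitness: fraction of the *current* tests it passes.
        net_scores = [sum(solves(n, t) for t in tests) / len(tests) for n in nets]
        # Evaluator fitness: fraction of the current solutions it breaks.
        test_scores = [sum(not solves(n, t) for n in nets) / len(nets) for t in tests]
        nets = next_generation(nets, net_scores, mutate_net, rng)
        tests = next_generation(tests, test_scores, mutate_test, rng)
    return nets, tests

def exhaustive_score(net):
    """Fraction of all 0/1 inputs sorted (a network that sorts every 0/1 input sorts everything)."""
    cases = list(itertools.product((0, 1), repeat=N))
    return sum(solves(net, c) for c in cases) / len(cases)

if __name__ == "__main__":
    nets, tests = coevolve()
    print("best exhaustive score:", max(exhaustive_score(n) for n in nets))
```

The point of the design is that a network which merely memorizes the current tests loses ground as soon as the tests mutate toward inputs it fails on. The exhaustive check is only feasible because the toy is tiny; it is there to show how far the evolved score is from the real one.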