Most of what you describe here is overfitting:
https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart....