This has been happening for the past year on verifiable problems (did the change you made in your codebase work end-to-end, does this mathematical expression validate, did I win this chess match, etc...). The bulk of data, RL environment, and inference spend right now is on coding agents (or broadly speaking, tool use agents that can make their own tools).
Recent advances in mathematical/physics research have all been with coding agents making their own "tools" by writing programs: https://openai.com/index/new-result-theoretical-physics/