This is happening already. The LLM vendors are all competing on coding ability, and the best tool they have for that is synthetic data: they can train only on code that passes automated tests, and they can (and do) augment their training data with both automatically and manually generated code to help fill gaps they have identified in that training data.
Qwen notes here - they ran 20,000 VMs to help run their synthetic "agent" coding environments for reinforcement learning: https://simonwillison.net/2025/Jul/22/qwen3-coder/