I wouldn’t want the LLM-based agent to hyperspecialize its solution to a subset of the data. That’s a basic tenet of machine learning.
Steelmanning your question though, I guess you could come up with some sort of tiered experimentation scheme where you slowly expose it to more data and more compute based on prior success or failures.