Maybe add a second model that is configured to nudge the first one toward exploration, and have the two work in tandem?
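
To make the idea concrete, here is a rough sketch of one way it could look. Everything in it is my assumption, not an established design: the "first model" is stood in for by a fixed table of action scores, the "second model" is a simple count-based bonus that favors rarely chosen actions, and the names `ExplorationNudger`, `choose`, and `weight` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Scores = Dict[str, float]


class ExplorationNudger:
    """Hypothetical second model: rewards actions that were picked rarely."""

    def __init__(self, weight: float = 1.0) -> None:
        self.weight = weight          # how strongly to push toward exploration
        self.counts: Counter = Counter()

    def bonus(self, action: str) -> float:
        # The bonus shrinks as an action gets chosen more often.
        return self.weight / (1 + self.counts[action]) ** 0.5

    def observe(self, action: str) -> None:
        self.counts[action] += 1


def choose(primary: Scores, nudger: ExplorationNudger) -> str:
    """Pick the action with the best combined score, then update the nudger."""
    action = max(primary, key=lambda a: primary[a] + nudger.bonus(a))
    nudger.observe(action)
    return action


def run(primary: Scores, weight: float, steps: int) -> List[str]:
    nudger = ExplorationNudger(weight)
    return [choose(primary, nudger) for _ in range(steps)]


if __name__ == "__main__":
    # Stand-in for the first model's preferences (assumed values).
    primary_scores = {"a": 1.0, "b": 0.8, "c": 0.1}
    print(run(primary_scores, weight=1.0, steps=5))
    print(run(primary_scores, weight=0.0, steps=5))
```

With `weight=0.0` the first model always gets its way; raising `weight` lets the second model pull the choice toward less-visited options. A learned nudger could replace the count-based one, but that is beyond this sketch.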