Hacker News

porridgeraisin | 04/04/2025 | 0 replies

Yep. Offline RL is especially full of these types of papers too. The sheer number of alternatives to the KL divergence for keeping the learned policy from drifting too far from the collected data distribution... There's probably one method for each person on earth.