AndrewKemendo • today at 3:03 PM • 0 replies

Training RL policies on edge cases by using humans to collect and instrument previously closed data systems.
