iFire, last Wednesday at 7:17 PM

Are you aware of the generate, rank, and judge workflow from reinforcement learning, where you generate several trajectories (say, 8 different plans), rank them, and then judge?

I noticed it gave me better results and more variety, even though I don't end up using the remaining plans.
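For anyone who hasn't seen it, here is a rough sketch of the shape of that workflow. It is not the rule from the gist: `best_of_n`, `toy_generate`, `toy_score`, and `toy_judge` are hypothetical names made up for illustration, and in real use the generate and judge steps would presumably be model calls rather than these toy stand-ins.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical building blocks for toy plans; a real setup would sample from a model.
STEPS = ["read code", "write patch", "add test", "run test", "refactor"]


def toy_generate(rng: random.Random) -> str:
    """Stand-in generator: a random ordering of a random number of steps."""
    k = rng.randint(1, len(STEPS))
    return " -> ".join(rng.sample(STEPS, k))


def toy_score(plan: str) -> float:
    """Stand-in ranker: longer plans score higher (purely illustrative)."""
    return float(len(plan.split(" -> ")))


def toy_judge(plan: str) -> bool:
    """Stand-in judge: accept only plans that actually run the tests."""
    return "run test" in plan


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    judge: Callable[[str], bool],
    n: int = 8,
    seed: int = 0,
) -> Tuple[str, List[str]]:
    """Generate n candidate plans, rank them, and return the best accepted one.

    Returns the chosen plan together with the full ranked list, so the
    unused plans are still available for inspection.
    """
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    # Rank: highest score first (sorted is stable, so ties keep generation order).
    ranked = sorted(candidates, key=score, reverse=True)
    # Judge: walk down the ranking and take the first plan the judge accepts.
    for plan in ranked:
        if judge(plan):
            return plan, ranked
    # Fallback (an assumption of this sketch): if nothing is accepted, use the top-ranked plan.
    return ranked[0], ranked


if __name__ == "__main__":
    chosen, ranked = best_of_n(toy_generate, toy_score, toy_judge, n=8)
    print("chosen:", chosen)
    for plan in ranked:
        print("  candidate:", plan)
```

The ranked list is returned alongside the winner on purpose: most of the plans get thrown away, but having them side by side is where the extra variety comes from.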

https://gist.github.com/fire/17c4962827139822b3d2a96a0c479e4...

Note that the rule doesn't make much sense out of context and the math is wrong... oops :D