Hacker News

flir, last Monday at 3:04 PM

I find that absolutely terrifying, but I wish you luck.


Replies

iFire, last Wednesday at 7:17 PM

Are you aware of the generate, rank, and then judge workflow from reinforcement learning, where you generate several trajectories (like 8 different plans), rank them, and then judge?

I noticed it gave me better results and greater variety, even though I won't use the remaining plans.
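For anyone who hasn't seen the pattern, here is a rough Python sketch of one way to read it. This is not what the gist contains. The `Plan` type, the random generator, the "fewer steps is better" score, and the "at least 2 steps" judge are all hypothetical stand-ins for whatever model calls and rubric you would actually use.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Plan:
    text: str
    steps: int  # hypothetical attribute used only for scoring in this sketch


def generate_plans(task: str, n: int, rng: random.Random) -> List[Plan]:
    # Stand-in generator: a real setup would sample n plans from a model.
    return [Plan(f"{task} (candidate {i})", rng.randint(1, 10)) for i in range(n)]


def rank_plans(plans: List[Plan], score: Callable[[Plan], float]) -> List[Plan]:
    # Highest score first.
    return sorted(plans, key=score, reverse=True)


def pick_plan(ranked: List[Plan], judge: Callable[[Plan], bool]) -> Optional[Plan]:
    # Walk the ranking and return the first plan the judge accepts.
    for plan in ranked:
        if judge(plan):
            return plan
    return None


def score(plan: Plan) -> float:
    # Assumed heuristic: fewer steps is better.
    return -float(plan.steps)


def judge(plan: Plan) -> bool:
    # Assumed check: reject plans that are trivially short.
    return plan.steps >= 2


if __name__ == "__main__":
    rng = random.Random(0)
    plans = generate_plans("refactor module", 8, rng)
    ranked = rank_plans(plans, score)
    print(pick_plan(ranked, judge))
```

Only one plan comes out the other end, but the ranking step has all 8 to choose from, which is where the extra variety comes in.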

https://gist.github.com/fire/17c4962827139822b3d2a96a0c479e4...

Note that the rule doesn't make much sense out of context and the math is wrong... oops :D