I find that absolutely terrifying, but I wish you luck.
Are you aware of the "generate trajectories, rank, then judge" workflow from reinforcement learning (e.g. generate 8 different plans, rank them, then have a judge pick one)?
I noticed it gave me better results and more variety, even though I don't use the remaining plans.
https://gist.github.com/fire/17c4962827139822b3d2a96a0c479e4...
Note that the rule doesn't make much sense out of context and the math is wrong... oops :D
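For anyone unfamiliar, here's a rough sketch of that loop (sometimes called best-of-n sampling). It's not the rule from the gist. `generate`, `score`, and `judge` are hypothetical stand-ins for whatever model calls you'd actually use, and `n=8` / `top_k=3` are just example values:

```python
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[int], str],                 # hypothetical: produces plan #i
    score: Callable[[str], float],                  # hypothetical: cheap ranking signal
    judge: Callable[[List[Tuple[float, str]]], str],  # hypothetical: final pick
    n: int = 8,
    top_k: int = 3,
) -> str:
    # 1. Generate n candidate plans ("trajectories").
    candidates = [generate(i) for i in range(n)]
    # 2. Rank them by score, highest first (ties keep generation order).
    ranked = sorted(((score(c), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    # 3. Hand only the top_k to the judge, which makes the final choice.
    return judge(ranked[:top_k])


if __name__ == "__main__":
    # Toy stand-ins: plans are labelled by index, score prefers plans near 5,
    # and the judge simply takes the top-ranked candidate.
    pick = best_of_n(
        generate=lambda i: f"plan-{i}",
        score=lambda p: -abs(int(p.split("-")[1]) - 5),
        judge=lambda top: top[0][1],
    )
    print(pick)
```

The unused plans are thrown away here, which matches the "I don't use the remaining plans" part; the variety comes from sampling several before committing.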