rdedev • yesterday at 9:24 PM • 2 replies

Not totally wrong. Self-play works well if your problem can be easily simulated in an RL environment where the model can easily explore different states. RLHF and similar techniques are not that, since we don't exactly have a simulation environment for language modelling.
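
To make the "easily simulated" part concrete, here is a minimal sketch of self-play on a toy game. Everything in it is an assumption for illustration: the game (single-pile Nim, take 1 to 3 stones, last stone wins), the names `legal_moves` and `self_play`, and the tabular Monte Carlo update. The point is only that the simulator is a couple of lines and the win/loss signal is free, which is what language modelling lacks.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """Toy Nim rules (assumed for illustration): take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def self_play(episodes=5000, start=10, epsilon=0.2, seed=0):
    """One policy plays both sides; returns a table of average returns."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (stones, move) -> average return for the mover
    visits = defaultdict(int)  # (stones, move) -> number of updates so far

    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore a random legal move
            else:
                move = max(moves, key=lambda m: q[(stones, m)])  # exploit
            history.append((stones, move))
            stones -= move  # the "environment step" is just subtraction

        # Whoever took the last stone wins. Walk the game backwards,
        # flipping the sign each ply because the players alternate.
        ret = 1.0
        for state_action in reversed(history):
            visits[state_action] += 1
            q[state_action] += (ret - q[state_action]) / visits[state_action]
            ret = -ret
    return q


q = self_play()
```

A greedy policy can then be read off with `max(legal_moves(s), key=lambda m: q[(s, m)])`. Nothing comparable exists for open-ended text: there is no cheap `stones -= move` and no built-in winner.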

Right now there are companies which hire software devs or data scientists to just solve a bunch of random problems so that they can generate training data for an LLM model. Why would they be in business if self play can work out so well?

Replies

notpachet • yesterday at 9:51 PM

> Right now there are companies which hire software devs or data scientists to just solve a bunch of random problems so that they can generate training data for an LLM model.

Sounds like Macrodata Refinement.

vidarh • yesterday at 9:46 PM

> Why would they be in business if self play can work out so well?

Because it is still cheaper.
