maxrmk • today at 4:53 AM

I don't think this is a correct explanation of how things work these days. RL has really changed things.

Replies

RL-trained models are still just remixers as defined above, but their distribution can cover things that are unknown to humans, because those things are present in the synthetic training data even though they are absent from the corpus of human knowledge. AlphaGo's move 37 is an example. It appears creative and new to outside observers, and it is creative and new. But that's not because the model is figuring out something new on the spot. It's because similar new things appeared in the synthetic data used to train the model, and the model is summoning those patterns at inference time.
