Hacker News

isaacfrond 11/07/2024 2 replies

I think there is a philosophical angle to this. I mean, my world map was constructed by chance interactions with the real world. Does this mean that my world map is only as close to the real world as their NN's map is to Manhattan? Is my world map full of non-existent streets, exits that are in the wrong place, etc.? The NN map of Manhattan works almost 100% correctly when used for normal navigation but breaks apart badly when it has to plan a detour. How brittle is my world map?


Replies

gwern 11/08/2024

One of the things about offline imitation learning like OP or LLMs in general is that the more important the error in their world model, the faster it'll correct itself. If you think you can teleport across a river, you'll make & execute plans which exploit that fact first thing to save a lot of time - and then immediately hit the large errors in that plan and observe a new trajectory which refutes an entire set of errors in your world model. And then you retrain and now the world model is that much more accurate. The new world model still contains errors, and then you may try to exploit those too right away, and then you'll fix those too. So the errors get corrected when you're able to execute online with on-policy actions. The errors which never turn out to be relevant won't get fixed quickly, but then, why do you care?
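
To make the loop above concrete, here is a minimal toy sketch, not the OP's model or any real training procedure. It assumes the world model is just a set of directed edges, and it stands in for "retraining" by deleting a believed edge the moment the real world refuses it. All names (`shortest_path`, `navigate`, the nodes A to F) are hypothetical and chosen for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over a set of directed edges.

    Returns the shortest list of nodes from start to goal, or None.
    """
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for a, b in sorted(edges):  # sorted only to make the run deterministic
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(path + [b])
    return None


def navigate(believed, real, start, goal):
    """Plan on the believed map, act in the real world, fix what fails.

    Each step either moves along a real edge or deletes one false edge,
    so the loop always terminates. Returns (route, corrections, believed).
    """
    believed = set(believed)  # work on a copy of the world model
    position, route, corrections = start, [start], []
    while position != goal:
        plan = shortest_path(believed, position, goal)
        if plan is None:
            return None, corrections, believed
        step = (plan[0], plan[1])
        if step in real:
            # The world allows this move: take it.
            position = plan[1]
            route.append(position)
        else:
            # The world refuses: the plan exposed a model error, so remove it.
            believed.discard(step)
            corrections.append(step)
    return route, corrections, believed


# Assumed toy world: the only real way from A to D is the long way round.
real = {("A", "B"), ("B", "C"), ("C", "D")}
# The model also believes in a "teleport" A -> D and an unrelated false edge E -> F.
believed = real | {("A", "D"), ("E", "F")}

route, corrections, updated = navigate(believed, real, "A", "D")
print(route)        # the route actually travelled
print(corrections)  # false edges discovered along the way
print(("E", "F") in updated)  # whether the irrelevant error survived
```

In this toy run the false shortcut is the first thing the planner tries, so it is also the first thing to be corrected, while the false edge E -> F is never exercised and stays in the model. That matches the point above: errors that matter to the plan get exposed quickly once the agent acts on its own plans, and the ones that never come up stay wrong without costing anything.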

cen4 11/07/2024

Also things are not static in the real world.