>You're implicitly assuming that all data is factual and that therefore training an LLM on cryptographically random data will create an intelligence that learns properties of the real world.
No, that’s a complete strawman. I’m not claiming the data is "The Truth™"; I’m saying that in many cases the data is a real physical signal.
If you train an LLM on cryptographically random data, it learns exactly what is there: that there is no predictable structure. That is a property of that "world." The fact that it doesn't learn physics from noise doesn't mean it isn't modeling the data directly; it just means the data it was given has no physics in it.
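You can make that concrete. Here's a minimal sketch (assuming PyTorch; the vocabulary size, model, and hyperparameters are toy choices of mine, not anything from this thread): train a tiny next-token model on uniformly random tokens, and the cross-entropy loss bottoms out at the entropy floor log(V). The model converges to "every token is equally likely," which is a faithful model of that data.

```python
import math
import torch
import torch.nn as nn

VOCAB = 16  # toy vocabulary size
torch.manual_seed(0)

# Bigram-style next-token model: embedding -> linear head over the vocabulary.
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    # Fresh uniform-random sequences every step: the only "structure" is no structure.
    seq = torch.randint(0, VOCAB, (64, 33))
    inputs, targets = seq[:, :-1], seq[:, 1:]   # predict each next token
    logits = model(inputs)                      # (batch, time, VOCAB)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss:           {loss.item():.3f}")
print(f"entropy floor log(V): {math.log(VOCAB):.3f}")  # ~2.773 nats; no model can beat this
```

The loss settling at log(V) isn't a failure to model the world; it's the model correctly representing a maximum-entropy world with a maximum-entropy prediction.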
>If you feed flat earth books into the LLM, you will not be told that earth is a sphere and yet that is what you're claiming here.
If you raise a human on nothing but flat-earth books and isolate them from ever seeing the horizon, they will also tell you the earth is flat. Does that mean the human isn't "modeling the world"? No, it means their world-model is consistent with the (limited) data they’ve received.