Whether it is text or an image, it is just bits for a computer. A token can represent anything.

10xDev • today at 11:06 AM

Replies

Sure, but don't conflate the representation format with the structure of what's being represented.

Everything is bits to a computer, but text training data captures only the flattened, after-the-fact residue of baseline human thought: someone's written description of how something works. (At best!)

A world model would need to capture the underlying causal, spatial, and temporal structure of reality itself -- the thing itself, that which generates those descriptions.

You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...

