Hacker News

10xDev, today at 11:06 AM

Whether it is text or an image, it is just bits for a computer. A token can represent anything.


Replies

A_D_E_P_T, today at 11:13 AM

Sure, but don't conflate the representation format with the structure of what's being represented.

Everything is bits to a computer, but text training data captures the flattened, after-the-fact residue of baseline human thought: someone's written description of how something works. (At best!)

A world model would need to capture the underlying causal, spatial, and temporal structure of reality itself -- the thing itself, that which generates those descriptions.

You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...
