
gkbrk • yesterday at 10:31 AM

LLMs are increasingly trained on images for multi-modal learning, so they too would have seen one object, then two.

Replies

They never saw any kind of object. They only saw labeled groups of pixels: the basic units of a digital image, each representing a single point of color on a screen or in a digital file. An object is a material thing that can be seen and touched. Pixels are not objects.


