Hacker News

gkbrk, yesterday at 10:31 AM

LLMs are increasingly trained on images for multimodal learning, so they too would have seen one object, then two.


Replies

gloosx, yesterday at 5:24 PM

They never saw any kind of object; they only saw labeled groups of pixels, the basic units of a digital image, each representing a single point of color on a screen or in a digital file. An object is a material thing that can be seen and touched. Pixels are not objects.
