Hacker News

deburo | yesterday at 6:06 PM

A token is probably not a single character, and an image is presumably decomposed into tokens as well (god knows how many), each of which probably maps to a similarly float-hungry vector. Your counterargument could use a bit more flesh.

And we're talking about images of text, not complex imagery such as a very detailed scene or what have you.
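
To put rough numbers on the comparison, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a description of any particular model: about 4 characters per text token, a ViT-style encoder that turns each 16x16 pixel patch into one token, and 1024-dimensional embeddings stored as 2-byte floats. The function names are made up for this example.

```python
import math


def text_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Estimate text tokens from character count (assumed ~4 chars per token)."""
    return math.ceil(num_chars / chars_per_token)


def image_tokens(width_px: int, height_px: int, patch_px: int = 16) -> int:
    """Estimate image tokens, assuming one token per patch_px x patch_px patch."""
    return math.ceil(width_px / patch_px) * math.ceil(height_px / patch_px)


def embedding_bytes(num_tokens: int, dim: int = 1024, bytes_per_float: int = 2) -> int:
    """Memory for the token embeddings (assumed dimension and float width)."""
    return num_tokens * dim * bytes_per_float


if __name__ == "__main__":
    # Hypothetical page: 2000 characters of text, rendered as a 512x512 image.
    t = text_tokens(2000)
    i = image_tokens(512, 512)
    print(f"text:  {t} tokens, {embedding_bytes(t)} bytes of embeddings")
    print(f"image: {i} tokens, {embedding_bytes(i)} bytes of embeddings")
```

Which side comes out ahead depends entirely on the assumed patch size, image resolution, and any token compression the encoder applies, which is exactly the part that needs fleshing out.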