Hacker News

tartoran today at 2:16 AM

How could they be any good at visuals? They are trained on text after all.


Replies

comex today at 2:25 AM

Supposedly the frontier LLMs are multimodal and trained on images as well, though I don't know how much that helps for tasks that don't use the native image input/output support.

Whatever the cause, LLMs have gotten significantly better over time at generating SVGs of pelicans riding bicycles:

https://simonwillison.net/tags/pelican-riding-a-bicycle/

But they're still not very good.

astrange today at 2:30 AM

Claude is multimodal and can see images, though it's not good at thinking in them.

msephton today at 2:18 AM

Shapes can be described as text or mathematical formulas.

tempest_ today at 2:26 AM

An SVG is just text.
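
To make that concrete, here is a minimal sketch in Python. The drawing (one circle and one line) and the variable names are made up for illustration and do not come from any model's output. It shows that a picture can be an ordinary string that an XML parser then reads back as shapes with numeric attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical drawing for illustration: a circle and a line, written as plain text.
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="none" stroke="black"/>'
    '<line x1="10" y1="90" x2="90" y2="90" stroke="black"/>'
    '</svg>'
)

# The same characters parse into a structured drawing.
root = ET.fromstring(svg_text)

# Tags come back namespace-qualified, e.g. '{http://www.w3.org/2000/svg}circle',
# so keep only the part after the closing brace.
shapes = [child.tag.split('}')[1] for child in root]

print(shapes)
print(root[0].get('r'))
```

A text-only model that emits a string like `svg_text` has, in effect, produced an image; whether the shapes land in sensible places is a separate question.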