How could they be any good at visuals? They are trained on text, after all.

tartoran • today at 2:16 AM

Replies

Supposedly the frontier LLMs are multimodal and trained on images as well, though I don't know how much that helps for tasks that don't use the native image input/output support.

Whatever the cause, LLMs have gotten significantly better over time at generating SVGs of pelicans riding bicycles:

https://simonwillison.net/tags/pelican-riding-a-bicycle/

But they're still not very good.

astrange • today at 2:30 AM

Claude is multimodal and can see images, though it's not good at thinking in them.

msephton • today at 2:18 AM

Shapes can be described as text or mathematical formulas.
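
To illustrate the point about formulas, here is a minimal sketch that turns the parametric equation of a circle, x = cx + r*cos(t) and y = cy + r*sin(t), into the text of an SVG polygon. The helper names `circle_points` and `to_svg_polygon` are hypothetical and chosen only for this example; nothing here comes from the thread itself.

```python
import math


def circle_points(cx, cy, r, n=8):
    """Sample n evenly spaced points from x = cx + r*cos(t), y = cy + r*sin(t)."""
    return [
        (cx + r * math.cos(2 * math.pi * k / n),
         cy + r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def to_svg_polygon(points):
    """Serialize the sampled points as the text of an SVG <polygon> element."""
    coords = " ".join(f"{x:.1f},{y:.1f}" for x, y in points)
    return f'<polygon points="{coords}" fill="none" stroke="black"/>'


# Four samples give a diamond; a larger n approximates the circle more closely.
polygon = to_svg_polygon(circle_points(50, 50, 40, n=4))
print(polygon)
```

The shape starts as a formula and ends as a plain string, which is the form a text-only model would read or emit.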

tempest_ • today at 2:26 AM

An SVG is just text.
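
As a small sketch of what "just text" means here: the drawing below is an ordinary Python string, and the standard-library XML parser reads it like any other markup. The specific circle and its attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A complete SVG image, held as a plain string (hypothetical example drawing).
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="none" stroke="black"/>'
    '</svg>'
)

# Parsing it requires nothing image-specific, only an XML parser.
root = ET.fromstring(svg_text)

# Element tags carry the SVG namespace in braces; strip it to get bare names.
shapes = [child.tag.split("}")[1] for child in root]
print(shapes)
```

Writing `svg_text` to a file with an `.svg` extension would be enough for a browser to render it.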
