Supposedly the frontier LLMs are multimodal and trained on images as well, though I don't know how much that helps for tasks that don't use the native image input/output support.
Whatever the cause, LLMs have gotten significantly better over time at generating SVGs of pelicans riding bicycles:
https://simonwillison.net/tags/pelican-riding-a-bicycle/
But they're still not very good.