Hacker News

simonw yesterday at 4:45 PM (7 replies)

  llm install llm-mistral
  llm mistral refresh
  llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...

Pretty good for a 123B model!

(That said, I'm not 100% certain I guessed the correct model ID; I asked Mistral here: https://x.com/simonw/status/1998435424847675429)


Replies

Jimmc414 yesterday at 6:57 PM

We are getting to the point where it's not unreasonable to think that "Generate an SVG of a pelican riding a bicycle" could be included in some training data. It would be a great way to ensure an initial thumbs up from a prominent reviewer. It's a good benchmark, but it would be a good idea to include an additional random or unannounced similar test to catch any benchmaxxing.

baq yesterday at 6:09 PM

but can it recreate the spacejam 1996 website? https://www.spacejam.com/1996/jam.html

willahmad yesterday at 5:20 PM

I think this benchmark could be slightly misleading for assessing a coding model. But it's still a very good result.

Yes, SVG is code, but not in the sense of an executable with verifiable inputs and outputs.

iberator yesterday at 7:37 PM

Where did you get the llm tool from?!

cpursley yesterday at 4:52 PM

Skipped the bicycle entirely and upgraded to a sweet motorcycle :)

felixg3 yesterday at 5:33 PM

Is it really an SVG if it’s just an embedded base64 JPG?

breedmesmn yesterday at 6:48 PM

Impressive! I'm really excited to leverage this in my gooning sessions!
