llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...
Pretty good for a 123B model!
(That said, I'm not 100% certain I guessed the correct model ID. I asked Mistral here: https://x.com/simonw/status/1998435424847675429)
but can it recreate the spacejam 1996 website? https://www.spacejam.com/1996/jam.html
I think this benchmark could be slightly misleading for assessing a coding model. Still, a very good result.
Yes, SVG is code, but not in the sense of being executable with verifiable inputs and outputs.
Skipped the bicycle entirely and upgraded to a sweet motorcycle :)
Is it really an SVG if it's just an embedded base64 JPG?
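For anyone who wants to check this on a generated file, here is a rough sketch that flags an SVG containing a base64-encoded raster image in a data URI. The function name `is_raster_wrapper` and the sample markup are made up for illustration, and the pattern only covers a few common image types, so treat it as a heuristic rather than a real validator.

```shell
# is_raster_wrapper FILE: succeed if FILE contains a base64 raster data URI.
# Hypothetical helper; the list of image types is an assumption, not exhaustive.
is_raster_wrapper() {
  grep -Eq 'data:image/(jpeg|png|gif|webp);base64,' "$1"
}

# Demo on a tiny hand-written sample (the payload is a placeholder, not a real image).
sample=$(mktemp)
printf '%s\n' '<svg xmlns="http://www.w3.org/2000/svg"><image href="data:image/jpeg;base64,AAAA"/></svg>' > "$sample"

if is_raster_wrapper "$sample"; then
  echo "embedded raster image found"
else
  echo "no embedded raster image found"
fi

rm -f "$sample"
```

An SVG can legitimately mix vector shapes with an embedded image, so a match only says a raster is present, not that the whole drawing is one.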
Impressive! I'm really excited to leverage this in my gooning sessions!
We are getting to the point where it's not unreasonable to think that "Generate an SVG of a pelican riding a bicycle" could be included in some training data. It would be a great way to ensure an initial thumbs up from a prominent reviewer. It's a good benchmark, but it seems like a good idea to include an additional random or unannounced similar test to catch any benchmaxxing.
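One way to sketch the "random similar test" idea is to build the prompt from word lists at run time, so the exact wording is not known in advance. The word lists and the function name `random_prompt` below are assumptions chosen for illustration. The model ID is the one used earlier in the thread, which its author was not sure is correct, so the call is left commented out.

```shell
# Hypothetical word lists; replace with anything that has not been announced.
animals="pelican otter walrus heron"
vehicles="bicycle unicycle skateboard kayak"

# random_prompt: print a prompt with a randomly chosen animal and vehicle.
# srand() seeds from the clock, so calls within the same second can repeat.
random_prompt() {
  awk -v animals="$animals" -v vehicles="$vehicles" 'BEGIN {
    srand()
    na = split(animals, a, " ")
    nv = split(vehicles, v, " ")
    printf "Generate an SVG of a %s riding a %s\n", a[int(rand() * na) + 1], v[int(rand() * nv) + 1]
  }'
}

prompt=$(random_prompt)
echo "$prompt"

# To send it to the model (model ID copied from the thread, unverified):
# llm -m mistral/devstral-2512 "$prompt"
```

A small fixed list like this is easy to guess, so a real hold-out test would need a much larger or privately kept vocabulary.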