Wow. I get that "how well can it make SVGs" isn't the (or a) gold standard for how useful a model is or isn't, but the fact the Gemma 4 26B A4B I'm running locally can blow it out of the water doesn't give me high confidence for the model. Maybe an unfair comparison, but...
It's so bad I don't want to spend the 18 EUR just to test it for a month. It can't even create an SVG of the facebook logo. There should be plenty of examples of that around.
Gemini fast could do that in under 5 seconds.
I'm curios: are you doing a real apples to apples comparison, or are you running a harness that already curates prompts? There's a far and wide margin how any of these models respond based on already loaded context. Most models are pretty much hot garbage until their context is curated appropiately.
It sounds like they focussed performance on not drawing svgs. Which honestly, makes a lot of sense to me.