Antigravity may well top whatever benchmark, but:
My Antigravity (forced) replacement for Gemini CLI requires me to log on via browser every time I use it, and my Antigravity IDE won't update at all, so:
If it's ok I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being Top in anything.
PS: actual title:
OpenSCAD LLM Benchmark: Building the Pantheon
I've run tons of benchmarks for OpenSCAD for all kinds of models and setups, and what I realised is:
- Models are very jagged (might excel in one type of 3d model, but not another)
- Gemini models are the least jagged in my experience and have the best image understanding
- Gemini models are also the most creative (which may be undesirable if you want a precise CAD part)
- Overall this benchmark doesn't prove much because one 3d model (and one attempt) is just not enough. I usually test on at least a dozen models, each generated 3 times. I should really do much more, but it's too pricey for a solo dev.
Still, thanks for publishing this. Will definitely run Flash 3.5 soon to see how it performs.
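For anyone who wants to put a number on "jagged", here is a minimal Python sketch of the multi-object, multi-attempt setup described above. The function name `summarize`, the score scale, and every value in `placeholder` are assumptions made up for illustration, not results from any real run. It averages the attempts per object, then reports the overall mean and the spread across objects as a rough jaggedness measure.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Aggregate benchmark scores per LLM.

    scores: {llm_name: {object_name: [score for each attempt, ...]}}
    Returns {llm_name: {"overall": float, "jaggedness": float}}.
    """
    summary = {}
    for llm, per_object in scores.items():
        # Average the repeated attempts so one lucky run does not dominate.
        object_means = [mean(attempts) for attempts in per_object.values()]
        summary[llm] = {
            # Mean quality across all test objects.
            "overall": mean(object_means),
            # Spread across objects: high means good at some shapes, bad at others.
            "jaggedness": pstdev(object_means),
        }
    return summary


# Placeholder numbers only (hypothetical LLMs, objects, and scores).
placeholder = {
    "model_a": {"bracket": [8, 8, 8], "gear": [2, 2, 2]},
    "model_b": {"bracket": [5, 5, 5], "gear": [5, 5, 5]},
}
result = summarize(placeholder)
for llm, stats in result.items():
    print(llm, stats)
```

In this toy input both entries have the same overall mean, but only one of them is jagged, which is the kind of difference a single-object, single-attempt benchmark cannot show.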
Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel or judge that declares the winner. This is just seeing which tool subjectively makes the best-looking Pantheon.
Still a long way from shorting Autodesk.
As a side note Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.
I have been using GPT 5.5 to build a video game. Benchmark sounds about right. It generates assets and sprites that are good enough, if not close to the level of AAA games. Will check Antigravity now.
Going to try it. Just downloaded. Will see how it compares to Claude Code.
It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken. You cannot type input while the model is outputting text, otherwise both get messed up and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, or a way to directly load skills.
The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it) and it chewed through my five-hour limit in tokens from scaffolding.
To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".
So, does this mean Antigravity is better than Claude Code with the Opus model, given this benchmark? I once tried Antigravity and it was just disappointing.
Claude Code 2.1 / Opus 4.7 looks best to me: the dome and ceiling structure is more correct than the others.
Why is this medium ranked, and not on par with the best two?
The only thing moving faster than AI these days is the goalposts. Three years ago we would have been amazed if models were able to produce anything, now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.
Why are specialized CAD-making LLMs not showing up? In the future, are we going to have the same model for everything, from programming to creative writing to CAD?
This would be the same Antigravity 2.0 that "surprise, no longer an IDE, did I forget to mention that? Lolol."
Why Codex GPT-5.5 High instead of Extra High, I wonder?
To be brutally honest, I'm disappointed with Antigravity. It feels incredibly un-Google-like. The AI billing models are fragmented, and the Antigravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.
Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.
However, the absolute farce we just witnessed with the Antigravity 2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.
And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26
Next month they'll be beaten again.
And next year Google will probably sunset Antigravity.
If it doesn't make Google billions, don't trust them.
Why are half of the comments on Hacker News from stereotypical AI-bros whose lives revolve around tech, and the other half from sceptical commentators whose lives also revolve around tech but who are disappointed with its performance?!
Where are the normal people :/
google..no thanks
Last weekend I bought my wife a bike off marketplace. It was in good condition but was missing one of the internal cable routing grommets. I gave Claude pictures of the pill-shaped hole by itself and with my digital calipers in the long and short directions.
Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it with no changes in TPU and it was nearly perfect on the first try. Claude put in a 0.3mm subtraction in the x/y dimensions and I lowered it to 0.1 and it's perfect.
Much easier shape than ancient Roman architecture but still very cool how easy it was.
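A rough idea of what "everything parametrized" can look like, as a minimal Python sketch that prints OpenSCAD source for a pill-shaped plug. This is not the model Claude produced. The function names and the 20 x 8 x 3 mm measurements are hypothetical placeholders, and only the 0.3 and 0.1 mm clearances come from the comment above. The OpenSCAD side uses `hull()` over two `circle()` calls inside `linear_extrude()`.

```python
def pill_plug_dims(hole_length, hole_width, clearance):
    """Shrink the measured hole by the clearance in both x and y (mm)."""
    if hole_length < hole_width:
        raise ValueError("hole_length must be the long dimension")
    return hole_length - clearance, hole_width - clearance


def pill_plug_scad(hole_length, hole_width, thickness, clearance=0.1):
    """Return OpenSCAD source for a pill (stadium) shaped plug."""
    length, width = pill_plug_dims(hole_length, hole_width, clearance)
    # Distance from the centre to each end circle of the pill outline.
    offset = (length - width) / 2
    return (
        f"linear_extrude(height = {thickness:.2f})\n"
        "  hull() {\n"
        f"    translate([-{offset:.2f}, 0]) circle(d = {width:.2f}, $fn = 64);\n"
        f"    translate([{offset:.2f}, 0]) circle(d = {width:.2f}, $fn = 64);\n"
        "  }\n"
    )


# Hypothetical caliper readings in mm; replace with real measurements.
scad = pill_plug_scad(hole_length=20.0, hole_width=8.0, thickness=3.0, clearance=0.1)
print(scad)
```

Keeping the clearance as its own parameter is what makes the 0.3 to 0.1 tweak a one-number change rather than a remodel.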