Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

186 points • by jetter • today at 10:38 AM • 78 comments

Comments

Last weekend I bought my wife a bike off marketplace. It was in good condition but was missing one of the internal cable routing grommets. I gave Claude pictures of the pill-shaped hole by itself and with my digital calipers in the long and short directions.

Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it in TPU with no changes and it was nearly perfect on the first try. Claude put in a 0.3mm subtraction in the x/y dimensions; I lowered it to 0.1 and it's perfect.

Much easier shape than ancient Roman architecture but still very cool how easy it was.
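
For reference, the clearance tweak described above comes down to simple arithmetic. A minimal Python sketch, assuming the subtraction is applied once per axis (not per side); the hole dimensions and the names `PillPlug` and `plug_for_hole` are hypothetical and not taken from the model Claude generated:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PillPlug:
    """Outer dimensions of a pill-shaped (stadium) plug, in millimetres."""
    length: float  # size along the long axis
    width: float   # size along the short axis

    @property
    def end_radius(self) -> float:
        # The rounded ends are semicircles spanning the full width.
        return self.width / 2

    @property
    def center_spacing(self) -> float:
        # Distance between the centres of the two end semicircles.
        return self.length - self.width


def plug_for_hole(hole_length: float, hole_width: float, clearance: float) -> PillPlug:
    """Shrink the measured hole by `clearance` on each axis (assumed convention)."""
    if clearance < 0:
        raise ValueError("clearance must be non-negative")
    length = hole_length - clearance
    width = hole_width - clearance
    if width <= 0 or length < width:
        raise ValueError("clearance leaves no valid pill shape")
    return PillPlug(length=length, width=width)


# Hypothetical caliper readings: a 12.0 mm x 6.0 mm hole.
loose = plug_for_hole(12.0, 6.0, clearance=0.3)
snug = plug_for_hole(12.0, 6.0, clearance=0.1)
print(loose, snug)
```

Keeping the clearance as a single parameter is what makes the 0.3 to 0.1 change a one-number edit.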

mellosouls • today at 11:13 AM

Antigravity may well Top the whatever benchmark but:

My Antigravity (forced) replacement for Gemini CLI requires me to log on via browser every time I use it, and my Antigravity IDE won't update at all, so:

If it's ok I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being Top in anything.

PS: actual title:

OpenSCAD LLM Benchmark: Building the Pantheon

ponyous • today at 2:05 PM

I've run tons of benchmarks for OpenSCAD for all kinds of models and setups, and what I realised is:

- Models are very jagged (might excel in one type of 3d model, but not another)

- Gemini models are the least jagged in my experience and have the best image understanding

- Gemini models are also the most creative (which may be undesirable if you want a precise CAD part)

- Overall this benchmark doesn't prove much because one 3d model (and one attempt) is just not enough. I usually test on at least a dozen models, each generated 3 times. I should really do much more, but it's too pricey for a solo dev.
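
The repeated-run setup in the last point could be aggregated roughly like this. A minimal Python sketch, assuming each attempt has already been given a numeric score; the object names, the 0-10 scale, and the `summarize` function are hypothetical:

```python
from statistics import mean, pstdev


def summarize(scores: dict[str, list[float]]) -> dict[str, float]:
    """Aggregate repeated attempts per test object.

    `scores` maps a test-object name to the scores of its attempts.
    """
    if not scores or any(not attempts for attempts in scores.values()):
        raise ValueError("every test object needs at least one scored attempt")

    # Average the attempts first so one lucky or unlucky run does not dominate.
    per_object = {name: mean(attempts) for name, attempts in scores.items()}
    values = list(per_object.values())
    return {
        "overall": mean(values),       # headline score across objects
        "jaggedness": pstdev(values),  # spread between objects; higher = more jagged
        "worst": min(values),          # weakest object type
    }


# Hypothetical scores: three objects, three attempts each.
example = {
    "pantheon": [8.0, 6.0, 7.0],
    "bracket": [3.0, 4.0, 2.0],
    "gear": [5.0, 5.0, 5.0],
}
print(summarize(example))
```

Reporting the spread next to the mean is one way to make "jaggedness" visible instead of hiding it in a single ranking.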

Still, thanks for publishing this. Will definitely run Flash 3.5 soon to see how it performs.

1970-01-01 • today at 1:40 PM

Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel or judge that declares the winner. This is just seeing which tool subjectively makes the best looking Pantheon.

dhfbshfbu4u3 • today at 11:32 AM

Still a long way from shorting Autodesk.

As a side note Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.

debarshri • today at 11:51 AM

I have been using GPT 5.5 to build a video game. Benchmark sounds about right. It generates assets and sprites that are good enough, if not close to AAA-level games. Will check Antigravity now.

Onplana • today at 1:45 PM

Going to try it. Just downloaded it. Will see how it compares to Claude Code.

u8 • today at 2:00 PM

It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken. You cannot type input while the model is outputting text, otherwise both get messed up and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, or a way to directly load skills.

The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it) and it chewed through my five hour limit in tokens from scaffolding.

To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".

anony-123 • today at 1:49 PM

So, given this benchmark, does it mean Antigravity is better than Claude Code with the Opus model? I once tried Antigravity and it was just disappointing.

a3w • today at 12:03 PM

Claude Code 2.1 / Opus 4.7 looks best to me: the dome and ceiling structure is more correct than in the others.

Why is it ranked in the middle, and not on par with the best two?

ReptileMan • today at 11:16 AM

The only thing moving faster than AI these days is the goalposts. Three years ago we would have been amazed if models were able to produce anything, now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.

faangguyindia • today at 11:41 AM

Why are specialized CAD-making LLMs not showing up? In the future, are we going to have the same model for everything, from programming to creative writing to CAD?

megiddo • today at 12:12 PM

This would be the same Antigravity 2.0 that "surprise, no longer an IDE, did I forget to mention that? Lolol."

dilap • today at 2:36 PM

Why Codex GPT-5.5 High instead of Extra High, I wonder?

jdw64 • today at 11:57 AM

To be brutally honest, I'm disappointed with Antigravity. It feels incredibly un-Google-like. The AI billing models are fragmented, and the Antigravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.

Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.

However, the absolute farce we just witnessed with the Antigravity 2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.

nycdatasci • today at 12:07 PM

And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26

spiderfarmer • today at 11:23 AM

Next month they'll be beaten again.

And next year Google will probably sunset Antigravity.

If it doesn't make Google billions, don't trust them.

bobbycastorama • today at 12:08 PM

Why are half of the comments on Hacker News stereotypical AI-bros whose lives revolve around tech, and the other half sceptical commentators whose lives also revolve around tech but who are disappointed with its performance?!

Where are the normal people :/

beanjuiceII • today at 11:44 AM

google..no thanks
