999900000999 • last Sunday at 6:05 PM • 4 replies

Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close, it just can't get the order right which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.

Replies

GeoAtreides • last Sunday at 7:02 PM

>which is something OP can manually fix

What if the LLM gets something wrong that the operator (a junior dev, perhaps) doesn't even know is wrong? That's the main issue: if it fails here, it will fail at other things, in less obvious ways.

smallnix • last Sunday at 6:20 PM

That's not the point of the article. It's about Claude/LLMs being overconfident about producing a pixel-perfect recreation.

bigstrat2003 • last Sunday at 7:25 PM

> it just can't get the order right which is something OP can manually fix.

If the tool needs you to check up on it and fix its work, it's a bad tool.

thecr0w • last Sunday at 6:07 PM

Ya, this is true. Another commenter also pointed out that my intention was to one-shot. I didn't really go too deep into trying multiple iterations.

This is also fairly contrived, you know? Rebuilding HTML from a screenshot isn't a realistic constraint, because of course if I have the website loaded I can just download the HTML.
