Hacker News

999900000999 last Sunday at 6:05 PM, 4 replies

Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. Sometimes it gets it right the first time, and sometimes I have to try 3 or 4 times.


Replies

GeoAtreides last Sunday at 7:02 PM

> which is something OP can manually fix

What if the LLM gets something wrong and the operator (a junior dev, perhaps) doesn't even know it's wrong? That's the main issue: if it fails here, it will fail at other things, in less obvious ways.

smallnix last Sunday at 6:20 PM

That's not the point of the article. It's about Claude/LLMs being overconfident that they can recreate the page pixel-perfect.

bigstrat2003 last Sunday at 7:25 PM

> it just can't get the order right which is something OP can manually fix.

If the tool needs you to check up on it and fix its work, it's a bad tool.

thecr0w last Sunday at 6:07 PM

Ya, this is true. Another commenter also pointed out that my intention was to one-shot it. I didn't really go deep into trying multiple iterations.

This is also fairly contrived, you know? Having to rebuild HTML from a screenshot isn't a realistic limitation because, of course, if I have the website loaded I can just download the HTML.
