Space Jam website design as an LLM benchmark.
This article is a bit negative. Claude gets close , it just can't get the order right which is something OP can manually fix.
I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.
That's not the point of the article. It's about Claude/LLM being overconfident in recreating pixel perfect.
> it just can't get the order right which is something OP can manually fix.
If the tool needs you to check up on it and fix its work, it's a bad tool.
ya, this is true. Another commenter also pointed out that my intention was to one-shot. I didn't really go too deeply into trying to try multiple iterations.
This is also fairly contrived, you know? It's not a realistic limitation to rebuild HTML from a screenshot because of course if I have the website loaded I can just download the HTML.
>which is something OP can manually fix
what if the LLM gets something wrong that the operator (a junior dev perhaps) doesn't even know it's wrong? that's the main issue: if it fails here, it will fail with other things, in not such obvious ways.