Hacker News

I failed to recreate the 1996 Space Jam website with Claude

433 points | by thecr0w | yesterday at 5:18 PM | 361 comments

Comments

wilsmex | yesterday at 11:02 PM

Well this was interesting. As someone who was actually building similar websites in the late '90s, I threw this into Opus 4.5. Note that the original author is wrong about the original site, however:

"The Space Jam website is simple: a single HTML page, absolute positioning for every element, and a tiling starfield GIF background.".

This is not true: the site is built using tables, with no positioning at all. CSS wasn't a thing back then...

Here was its one-shot attempt at building the same type of layout (table-based), with a screenshot and assets as input: https://i.imgur.com/fhdOLwP.png

stared | today at 9:18 AM

Just use Playwright Skill (https://github.com/lackeyjb/playwright-skill). It is a game changer. Otherwise it is Claude the Blind, as OP mentioned.

thuttinger | yesterday at 7:44 PM

Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy for a junior web dev to get right but impossible for an LLM. On the other hand, I was able to write a C program that added gamma color profile support to Linux compositors that don't support it (in my case Hyprland) within a few minutes! A seemingly hard task, for me, which would have taken me at least a day or more if I hadn't let Claude write the code. With one prompt, Claude generated C code that compiled on the first try and:

- Read an .icc file from disk

- Parsed the file and extracted the VCGT (video card gamma table)

- Wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte-order (they are big-endian).
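
For reference, a minimal sketch of that kind of big-endian parsing in Python (not the commenter's actual C code; it assumes the standard ICC layout of a 128-byte header followed by a tag table, with the gamma curves stored under the 'vcgt' tag):

```python
# Illustrative sketch only. ICC profiles are big-endian throughout,
# hence the ">" prefix in every struct format string.
import struct

def find_vcgt(path):
    with open(path, "rb") as f:
        data = f.read()
    # The tag table follows the 128-byte header: a 4-byte count,
    # then 12-byte entries of (signature, offset, size).
    (tag_count,) = struct.unpack_from(">I", data, 128)
    for i in range(tag_count):
        sig, offset, size = struct.unpack_from(">4sII", data, 132 + 12 * i)
        if sig == b"vcgt":  # Apple's video card gamma table tag
            return data[offset:offset + size]
    return None
```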

smoghat | yesterday at 7:12 PM

Ok, so here is an interesting case where Claude was almost good enough, but not quite. But I’ve been amusing myself by taking abandoned Mac OS programs from 20 years ago that I find on GitHub and bringing them up to date to work on Apple silicon. For example, jpegview, which was a very fast and simple slideshow viewer. It took about three iterations with Claude Code before I had it working. Then it was time to fix some problems, add some features like playing videos, a new layout, and so on. I may be the only person in the world left who wants this app, but well, that was fine for a day-long project that cooked in a window with some prompts from me while I did other stuff. I’ll probably tackle ScanTailor Advanced next, to clean up some terrible book scans. Again, I have real things to do with my time, but each of these mini projects just requires me to have a browser window open to a Claude Code instance while I work on more attention-demanding tasks.

yosito | today at 4:04 AM

This has been my experience with almost everything I've tried to create with generative AI, from apps and websites, to photos and videos, to text and even simple sentences. At first glance, it looks impressive, but as soon as you look closer, you start to notice that everything is actually just sloppy copy.

That being said, sloppy copy can make doing actual work a lot faster if you treat it with the right amount of skepticism and hand-holding.

Its first attempt at the Space Jam site was close enough that it probably could have been manually fixed by an experienced developer in less time than it takes to write the next prompt.

badlogic | today at 9:02 AM

Loved the fun write-up. Now that we know that LLM-based vision is lossy, here's a different challenge:

Give the LLM access to the site's DOM and let it recreate the site with modern CSS. LLMs are much better with source code, aka text, right? :)

sqircles | yesterday at 8:43 PM

> The Space Jam website is simple: a single HTML page, absolute positioning for every element...

Absolute positioning wasn't available until CSS2 in 1998. This is just a table with crafty use of align, valign, colspan, and rowspan.

charcircuit | today at 1:25 AM

>I'd like to preserve this website forever and there's no other way to do it besides getting Claude to recreate it from a screenshot.

There are other ways, such as downloading an archive and preserving the file in one or more cloud storage services.

https://archive.is/download/cXI46.zip

ettsvensktlogin | today at 8:52 AM

This was very interesting. I've tried to create an "agent" Claude Code-based system to generate designs from screenshots, using Playwright and other tools to take screenshots for iterative improvements. So far I have failed, despite weeks of struggle.

Thanks to this post I now have a deeper understanding as to why. Thank you.

sigseg1v | yesterday at 6:20 PM

Curious if you've tested something such as:

- "First, calculate the orbital radius. To do this accurately, measure the average diameter of each planet, p, and the average distance from the center of the image to the outer edge of the planets, x, and calculate the orbital radius r = x - p"

- "Next, write a unit test script that we will run that reads the rendered page and confirms that each planet is on the orbital radius. If a planet is not, output the difference you must shift it by to make the test pass. Use this feedback until all planets are perfectly aligned."

p0w3n3d | today at 7:41 AM

LLM stands for large LANGUAGE model, so I guess you could succeed if you had the right LANGUAGE. Maybe radial coordinates? Or turtle graphics? I myself tried to get ChatGPT to generate an SVG with twelve radial dots, as on a clock face, and failed (a year ago). Now I think it would succeed; the question, however, is whether it succeeds only because people trained it to do so.

I have also noticed that AI generates things close to what you want, and it sticks really hard to that "close" qualifier, never quite crossing the line to exact. So I'd be happy with the result you have shown, as that is simply what AI does.

voodooEntity | today at 7:40 AM

Thanks for sharing this. Partly because I forgot about this great website :D and also because I would never have thought of giving this as an LLM task, because it's so simple that I probably would just have hacked it together myself :D

I recently experimented a lot with agentic coding (mostly with the Gemini IntelliJ plugin, the Copilot IntelliJ plugin, and IntelliJ's own Junie) and also considered giving it a try and feeding images to the AI, but all the tasks I've tried so far were pure backend-ish, so it never came to that.

I'm really curious how Junie in particular will act, and I will give it a try with the very same task you gave it. We'll see how it ends :D

999900000999 | yesterday at 6:05 PM

Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.

Wowfunhappy | yesterday at 5:57 PM

Claude is not very good at using screenshots. The model may technically be multi-modal, but its strength is clearly in reading text. I'm not surprised it failed here.

manlymuppet | yesterday at 11:13 PM

Couldn’t you just feed Claude all the raw inspect-element HTML from the website and have it “decrypt” that?

The entire website is fairly small so this seems feasible.

Usually there’s a big difference between a website’s final code and its source code because of post-processing, but that seems like a totally solvable Claude problem.

Sure LLMs aren’t great with images, but it’s not like the person who originally wrote the Space Jam website was meticulously messing around with positioning from a reference image to create a circular orbit — they just used the tools they had to create an acceptable result. Claude can do the same.

Perhaps the best method is to re-create, rather than replicate the design.

soared | yesterday at 7:36 PM

I got quite close with Gemini 3 Pro in AI Studio. I uploaded a screenshot (no assets) and the results were similar to OP's. It failed to follow my fix initially, but I told it to follow my directions (lol) and it came quite close (though portrait mode distorted it; landscape was close to perfect).

“Reference the original uploaded image. Between each image in the clock face, create lines to each other image. Measure each line. Now follow that same process on the app we’ve created, and adjust the locations of each image until all measurements align exactly.”

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

liampulles | today at 7:31 AM

It seems to me that Claude's error here (which is not unique to it) is self-sycophancy. The model is too eager to convince itself it did a good job.

I'd be curious to hear from experienced agent users whether there is some AGENTS.md trick to make the LLM speak more plainly. I wonder if that would impact the quality of its work.

mxfh | today at 3:21 AM

Everything feels wrong with that approach to me, starting with calling a perfectly time-appropriate website anachronistic.

Anachronistic would be something like creating an apparent Flash website for a fictional '90s internet-related movie.

pfix | yesterday at 7:47 PM

I checked the source of the original (like maybe many of you) to see how they actually did it, and it was... simpler than expected. I drilled myself so hard to forget tables as layout... and here it is. So simple it's a marvel.

buchwald | today at 12:09 AM

Claude is surprisingly bad at visual understanding. I did a similar thing to OP where I wanted Claude to visually iterate on Storybook components. I found outsourcing the visual check to Playwright in vision mode (as opposed to using the default a11y tree) and Codex for understanding worked best. But overall the idea of a visual inspection loop went nowhere. I blogged about it here: https://solbach.xyz/ai-agent-accessibility-browser-use/

jdironman | today at 3:46 AM

I am going to give this a shot, but using a method I have been using lately with subagents. Basically, what I do is have it create Architect, Executor, and Adjudicator subagents. The Architect breaks any ask down into atomic and testable subtasks that take 1-3 minutes of 'dev' time. The Executor (it can spawn more than one) implements them. Then the Adjudicator reviews that they are to spec/requirements. This all happens in subagent files plus a runbook.json in the .claude folder of a project. It's based on a paper that was featured on here a while back, actually [1].

[1] https://arxiv.org/abs/2511.09030

ErrantX | yesterday at 10:04 PM

I just feel this is a great example of someone falling into the common trap of treating an LLM like a human.

They are vastly less intelligent than a human, and logical leaps that make sense to you make no sense to Claude. It has no concept of aesthetics or, of course, any vision.

All that said, it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would.)

I think a better approach would be to write a tool to compare screenshots, identify misplaced items, and output that as a text finding/failure state. Claude will work much better because you're dodging the bits that are too interpretive (which humans rock at and LLMs don't).

anorwell | yesterday at 10:00 PM

The article does not say at any point which model was used. This is the most basic and important piece of information when talking about the capabilities of a model, and it probably belongs in the title.

torginus | yesterday at 11:33 PM

Not sure how good Claude is nowadays, but I remember using Claude 3.5 to do some fiction writing, and for a while I thought it was amazing at coming up with plots, setting ideas, and writing witty dialogue. Then after a short while I noticed it kept recycling the same ideas, phrases, etc., quickly becoming derivative and having 'tells', similar to the group-of-three quirk, with some otherwise decent writing patterns showing up with great frequency.

I've heard the same thing about it doing frontends: it produces gorgeous websites, but it has similar 'tells'. It does CSS and certain features the same way, and if you have a very concrete idea of what you want out of it, you'll end up fighting an uphill battle with it constantly trying to do things its own way.

Which is part of the 'LLM illusion', I guess. To an unskilled individual, or when starting from scratch, it seems great, but the more complex the project gets, the harder it becomes to have it contribute meaningfully, leading to ever-mounting frustration and eventually to me just giving up and doing it by hand.

ajasmin | today at 5:05 AM

I'm actually surprised Claude was able to do that much.

I hadn't even considered handing it a visual mockup to work from, even though that workflow is par for the course for any web design team.

I would assume there must be at least some prior work on locating individual assets in a larger canvas. It just needs to be integrated into the pipeline.

daemonologist | yesterday at 6:35 PM

Interesting: these models are all trained to do pixel-level(ish) measurement now, for bounding boxes and such. I wonder if you could railroad it into being accurate with the right prompt.

handedness | today at 4:35 AM

A site in '96 would have been built largely with tables, not CSS. CSS didn't become a thing until a couple of years later.

I know this because I'm still salty about the transition. For all of CSS's advantages, we lost something when we largely moved away from tables.

1970-01-01 | today at 4:34 AM

This is a great under-the-radar test for AI. I would put money on it failing to recreate the majority of '90s movie websites, as it wasn't trained on them. The old cowboy webmasters who built and ultimately abandoned them didn't write many books on the topic.

960design | yesterday at 9:15 PM

Claude argued with me about the quadratic equation the other day. It vehemently felt a -c was required, whereas a c was the correct answer. I pointed this out, showing it step by step, and it finally agreed. I tried Grok to see if it could get it right. Nope, the exact same response as Claude, but Grok never backed down, even after the step-by-step explanation of the maths.

shortformblog | yesterday at 8:45 PM

Claude can't properly count the number of characters in a sentence. It's asking a lot to assume it can get pixel-perfect.

pluc | yesterday at 7:18 PM

I like how the author calls a script on the internet "him".

stwsk | yesterday at 8:09 PM

>Look, I still need this Space Jam website recreated.

Now that's a novel sentence if I've ever read one.

victorbuilds | yesterday at 9:50 PM

Building something similar: using the Claude API to generate mini games from text descriptions (https://codorex.com, still pretty rough).

Can confirm: Claude is weirdly good at generating functional game logic from vague prompts, but spatial precision is a constant battle. Anything involving exact pixel positions needs validation/correction layers on top.

The suggestion upthread about having it write its own measurement tools seems promising - haven't tried that approach yet.

mr_windfrog | today at 2:51 AM

Maybe we could try asking Claude to generate code using <table>, <tr>, <td> for layout instead of relying on div + CSS. Feels like it could simplify things a lot.

Would this actually work, or am I missing something?

simonw | yesterday at 8:54 PM

I wonder if Gemini 3 Pro would do better at this particular test? They're very proud of its spatial awareness and vision abilities.

Aeolun | yesterday at 10:25 PM

I think Claude could have easily used a script to calculate the positions of the planets exactly here, instead of trying to use the frankly horrible image recognition.

rickcarlino | yesterday at 9:32 PM

I look forward to an alternative reality where AI vendors race to have the model with the best Space Jam Bench scores.

nickdothutton | yesterday at 8:13 PM

I have recently been working on something "fun" in the terminal that mingles plain ASCII, ANSI "graphics", actual bitmaps (Sixel), and Nerdfonts in a TUI framework (Charm etc). After a week of smashing Claude's head against a wall, which is better than smashing my own, I've had to significantly alter my hopes and expectations.

vmg12 | yesterday at 8:00 PM

We don't know how to build it anymore

manmal | yesterday at 10:18 PM

I would put Claude into a loop and let it take screenshots itself, diffing them against the original screenshot, until it has found the right arrangement of the planets' starting positions (a pixel-perfect match).
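
A minimal sketch of the inner check for that kind of loop, assuming Pillow is available and both screenshots are already the same size (the function name and return convention are illustrative):

```python
# Illustrative only: returns the fraction of pixels that differ between
# the rendered attempt and the reference screenshot (0.0 = pixel-perfect).
from PIL import Image, ImageChops

def mismatch_fraction(attempt_path, reference_path):
    a = Image.open(attempt_path).convert("RGB")
    b = Image.open(reference_path).convert("RGB")
    diff = ImageChops.difference(a, b)   # per-pixel absolute difference
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return changed / (diff.width * diff.height)
```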

syassami | yesterday at 7:10 PM

We've lost the capability to build such marvels.

https://knowyourmeme.com/memes/my-father-in-law-is-a-builder...

johnfn | yesterday at 8:37 PM

Context is king. The problem is that you are the one currently telling Claude how close it is and what to do next. But if you give it the tools to do that itself, it will make a world of difference.

Give Claude a way to iteratively poke at what it created (such as a Playwright harness), a screenshot of what you want, and a way to take its own screenshots in Playwright, and I think you will get much closer. You might even be able to one-shot it.

I’ve always wondered what would happen if I gave it a screenshot and told it to iterate until the Playwright screenshot matched the mock screenshot, pixel perfect. I imagine it would go nuts, but after a few hours I think it would likely get it. (Either that or minor font discrepancies and rounding errors would cause it to give up…)
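
A rough sketch of such a harness using Playwright's Python API (the URL, viewport, and attempt budget are made-up; the compare step stands in for whatever image diff you prefer, e.g. the Pillow sketch a few comments up):

```python
# Sketch only: render the generated page, screenshot it, and stop once it
# matches the reference closely enough (or the attempt budget runs out).
from playwright.sync_api import sync_playwright

def screenshot_page(url, out_path):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1024, "height": 768})
        page.goto(url)
        page.screenshot(path=out_path, full_page=True)
        browser.close()

# for attempt in range(10):
#     screenshot_page("http://localhost:8000/index.html", "attempt.png")
#     if mismatch_fraction("attempt.png", "spacejam-original.png") < 0.01:
#         break
#     # ...otherwise feed the diff back to the model and let it adjust the layout
```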

jacobsenscott | yesterday at 7:40 PM

> there's no other way to do it besides getting Claude to recreate it from a screenshot

And

> I'm an engineering manager

I can't tell if this is an intentional or unintentional satire of the current state of AI mandates from management.

phplovesong | today at 7:31 AM

This basically boils down to AI being unable to "center a div". I see this very often: AI-generated slop has LOTS of "off by one" kinds of bugs.

subleq | yesterday at 11:50 PM

What if you gave it an image comparison tool that would xor two screenshots to check its work?

bdcravens | yesterday at 6:33 PM

A comparison with Codex would be good. I haven't done it with Codex, but when working through problems using ChatGPT, it does a great job when given screenshots.

RagnarD | yesterday at 9:47 PM

Why not just feed it the actual instructions that create the site: the page source code, the HTML, CSS, and JS, if any?

sema4hacker | yesterday at 9:06 PM

> The total payload is under 200KB.

Just out of curiosity, how big was what you considered Claude's best attempt to be?

micromacrofoot | yesterday at 6:26 PM

I wouldn't call it entirely defeated; it got maybe 90% of the way there. Before LLMs you couldn't get 50% of the way there in an automated way.

> What he produces

I feel like personifying LLMs more than they currently are is a mistake people make (though humans always do this): they're not entities, and they don't know anything. If you treat them as too human, you might eventually fool yourself a little too much.

neuroelectron | yesterday at 10:24 PM

My web-dev friend saw the original Space Jam site. I asked him what it would cost to build something like that today. He paused and said:

We can’t. We don’t know how to do it.
