I actually did read your post, and it read to me as if Claude put the screenshot in as the background to get around some issue with how you took the screenshot, but had still placed the images and text on top. What you wrote was:
So it kind of cheated, though it clearly felt angst about
it. After trying a few ways to get the stars to line up
perfectly, it just gave up and copied the screenshot in
as the background image, then overlaid the rest of the
HTML elements on top.
Which would only be "kind of" cheating if just the background were wrong. Everything being invisible and in the wrong spot doesn't seem like merely "kind of" cheating. You didn't come anywhere close to addressing the problem the original post was about.
The index_tiled.html version correctly positions the original assets, and to me looks as close as you can get to the screenshot while using the original assets (except for the red text).
The version with the screenshot as a background is where it was asked to create an exact match for a screenshot that had been scaled/compressed, which isn't really possible any other way. The article acknowledges this one as cheating.
Better, I think, would've been to retake the screenshot without the scaling/compression, to see if it can create a site that both exactly matches and uses the original assets.
I think it probably gets you 80% but the last 20% of pixel perfection seems to evade Claude. But I'm pretty new to writing prompts so if you can nail it let me know and I'll link you in the post.
Oh what the heck. That worked really well for you. Would you be willing to recreate all the HTML and push it up to GitHub? I'll drop the repo at the top of the blog post. It would be really cool for me to see this completely done and a great way to finish out the blog post. I obviously couldn't do it.
spacejam-1996.png is a full screenshot of the Space Jam 1996
landing page. We want to recreate this landing page as faithfully
as possible, matching the screenshot exactly.
The asset directory contains images extracted from the original
site. One of the images is tiled as the background of the landing
page. The other images should appear once in the screenshot. Use
these images as assets.
Precise positioning is very important for this project, so you
should write a script that finds the precise location of each
asset image in screenshots. Use this tool to detect precise
positions in the target and fine-tune the generated webpage. Be
sure to generate diagnostic images that can be easily reviewed by
a human reviewer.
Use python 3.13 and uv to create a venv while working.
I just let Claude (Opus 4.5) do anything it wanted to do as it went.
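Claude's actual script isn't reproduced here, but a minimal sketch of the kind of asset-locating tool that prompt asks for might look like this (assuming opencv-python, numpy, and Pillow are installed; the file names are placeholders):

    # find_asset.py - locate an asset image inside the full screenshot
    import numpy as np
    import cv2
    from PIL import Image

    def locate_asset(screenshot_path, asset_path):
        screen = np.array(Image.open(screenshot_path).convert("RGB"))
        asset = np.array(Image.open(asset_path).convert("RGB"))
        # Normalized cross-correlation tolerates small compression artifacts.
        result = cv2.matchTemplate(screen, asset, cv2.TM_CCOEFF_NORMED)
        _, score, _, (x, y) = cv2.minMaxLoc(result)
        return x, y, score

    if __name__ == "__main__":
        x, y, score = locate_asset("spacejam-1996.png", "assets/logo.gif")
        print(f"best match at ({x}, {y}), confidence {score:.3f}")

Drawing the matched boxes onto a copy of the screenshot and saving it would cover the "diagnostic images" part of the prompt.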
At this point all the image assets were pixel-perfect, but the footer was in the wrong place, and I had to hold Claude's hand a bit to get it into approximately the right spot:
I noticed you were struggling to find the position of the footer
text. You could try rendering two versions of the generated page, the
second time with the footer text black. Subtracting those two images
should give you a clean view of the footer text.
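For the curious, the subtraction trick above is only a few lines with Pillow. A sketch, assuming the two renders already exist as files (the names are placeholders):

    from PIL import Image, ImageChops

    red = Image.open("render_footer_red.png").convert("RGB")
    black = Image.open("render_footer_black.png").convert("RGB")

    # Only the footer text changed between the two renders, so any
    # differing pixels must belong to it.
    diff = ImageChops.difference(red, black).convert("L")
    mask = diff.point(lambda v: 255 if v > 10 else 0)
    mask.save("footer_text_mask.png")
    print("footer bbox:", mask.getbbox())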
At this point Claude was having trouble because it hadn't gotten a clean view of the target text location in the original screenshot (it was writing scripts that looked at the red channel in the bottom half of the image to pull out the text, but those were also grabbing part of the site map logo. Interestingly, it made a comment about this but didn't do anything about it). So I gave it this additional hint:
You are getting confused with the site map when analyzing the
original screenshot. You could blank out the positions of assets
so that they are not interfering with your analysis.
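Acting on that hint is straightforward with Pillow. A sketch, using asset boxes found earlier (the coordinates here are made up):

    from PIL import Image, ImageDraw

    screenshot = Image.open("spacejam-1996.png").convert("RGB")
    masked = screenshot.copy()
    draw = ImageDraw.Draw(masked)

    # Known asset bounding boxes (left, top, right, bottom) - hypothetical values.
    asset_boxes = [(120, 80, 380, 240), (40, 300, 200, 460)]
    for box in asset_boxes:
        draw.rectangle(box, fill=(0, 0, 0))  # blank them out

    masked.save("screenshot_assets_blanked.png")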
This got the footer into the correct location, but the fonts, font sizes, etc. were still not right.
It now has everything close, after I added this final prompt:
We are very close. The footer is positioned in roughly the correct location
but the fonts, font sizes, font color and line spacings are all slightly
off.
This took quite a while, and it built a few more tools to get there. The result was fine from a distance, but it was using a sans-serif font where the screenshot has a serif, etc. So I decided to push. From here it got very messy...
One of the issues was that Claude's text detection kept getting tripped up because its scripts worked in RGB space instead of something more hue-aware. It knew the text was red but tried to isolate it by looking only at the red channel. The grey dots from the background show up bright in the red channel, though, so Claude would decide they were centered dots between the links that needed to be reproduced in the text. I gave it a hint:
I think dots from the background image are causing issues. Are you detecting the text
by looking only at the red channel in RGB space? The red channel will be bright on
white pixels in RGB. You could try using hue to separate text from background or use
distance from the target RGB value.
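For illustration, a hue-based version of the red-text detection might look something like this (the thresholds are guesses and would need tuning; in Pillow's HSV mode all three channels run 0-255, so red hues sit near both ends of the scale):

    import numpy as np
    from PIL import Image

    img = Image.open("screenshot_assets_blanked.png").convert("HSV")
    h, s, v = (np.array(c) for c in img.split())

    # Red hue wraps around 0/255; requiring real saturation rejects the
    # grey background dots, which are bright in the red channel but
    # nearly unsaturated.
    red_hue = (h < 15) | (h > 240)
    red_mask = red_hue & (s > 80) & (v > 60)

    Image.fromarray((red_mask * 255).astype(np.uint8)).save("footer_red_mask.png")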
Claude decided to switch to HSV space. But it took quite a bit of effort to keep Claude remembering to use HSV, because tools it had already written were still in RGB and hadn't been updated (nor had some of the intermediate images). Then it would step back to get the big picture as a sanity check and "discover" it had missed the dots that were obviously there. And when you tell it there are no dots, you get "You're absolutely right! They're vertical bars!" So it was a struggle. This is the closest I got:
Again, the top-image stuff was done in one shot with the first prompt; everything since has been about the footer. Claude has been writing a lot of clever scripts to measure font metrics, pick fonts, etc., but it keeps falling over those dots. I could probably get it to work better by adding directives for text handling to CLAUDE.md and nuking the context and some of the scripts it created.
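Those metric scripts aren't reproduced here, but the basic shape is simple: extract a binary mask of the footer text from both the target and a candidate render, then compare line heights and spacing. A sketch with placeholder file names:

    import numpy as np
    from PIL import Image

    def line_spans(mask_path):
        # Return (top, bottom) row indices for each text line in a binary mask.
        mask = np.array(Image.open(mask_path).convert("L")) > 0
        rows = mask.any(axis=1)
        spans, top = [], None
        for y, filled in enumerate(rows):
            if filled and top is None:
                top = y
            elif not filled and top is not None:
                spans.append((top, y))
                top = None
        if top is not None:
            spans.append((top, len(rows)))
        return spans

    target = line_spans("footer_red_mask.png")
    candidate = line_spans("candidate_footer_mask.png")
    for t, c in zip(target, candidate):
        print(f"target line height {t[1] - t[0]}px vs candidate {c[1] - c[0]}px")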
Asking your favorite LLM actually helps a lot; unsurprisingly, they are generally well trained on LLM papers. In this case, though, it's important to realize the LLM is incapable of seeing or hearing or reading. Everything has to be transformed into a vector space. Images are generally cut into patches (like 16x16), each of which is transformed by several neural networks into a semantic space represented by the model's parameters.
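As a toy illustration of that patching step: a 224x224 RGB image becomes a sequence of 196 flattened 16x16 patches, and it's those patch embeddings, not pixels, that the model attends over. The sizes below are just common ViT-style defaults, not any particular model's:

    import numpy as np

    image = np.random.rand(224, 224, 3)   # stand-in for a real image
    p = 16                                 # patch size
    patches = image.reshape(224 // p, p, 224 // p, p, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
    print(patches.shape)                   # (196, 768): one row per patch "token"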
But this isn't hugely different from your own vision. You don't see the pixel grid either; you have to use tools to measure things. You can iteratively interact with the image over time, perhaps by counting grid lines, but the LLM cannot - it's a one-shot inference against this highly transformed image. Models have gotten better at complex visual tasks, including some kinds of counting, but they can't examine the image in any analytical way, or even in its original representation. It's just not possible.
It can, however, make tools that can. It's very good at working with PIL and other image-processing libraries, or even writing image-processing code de novo, and then using those tools to ground itself. Likewise, it can't do math itself, but it can write a calculator that does highly complex mathematics on its behalf.
It does (unless the previous comment was edited? Currently it says Opus 4.1): https://www.anthropic.com/news/claude-opus-4-1. You can see it in the 'more models' list on the main Claude website, or in Claude Console.