
Yeah, that sucks. I also linked it at the top of my post so folks can take a look.

Hey! You did it! I'm going to update my original blog post linking this one.

This version just pastes the screenshot in as the background.

Yes, anyone who reads the post will find that out because it says it in the post.

I actually did read your post, and it read to me as if Claude had put the screenshot in as the background to get around some sort of issue with how you took the screenshot, but had still placed the images and text on top. What you wrote was:

    So it kind of cheated, though it clearly felt angst about
    it. After trying a few ways to get the stars to line up 
    perfectly, it just gave up and copied the screenshot in 
    as the background image, then overlaid the rest of the 
    HTML elements on top.
Which would only be "kind of" cheating if the background were the only thing wrong. Everything being invisible and in the wrong spot doesn't seem like merely "kind of" cheating. You didn't come anywhere close to addressing the problem that the original post was about.

>I actually did read your post [...]

>What you wrote was: [...]

>You didn't [...]

It's not my post.


Sorry!

Oh you're right. I read it a bit too quickly this morning and thought it had just done that initially to compare planet placement. Too bad.

The index_tiled.html version correctly positions the original assets, and to me looks as close as you can get to the screenshot while using the original assets (except for the red text).

The version with the screenshot as a background is where it was asked to create an exact match for a screenshot that had been scaled/compressed, which isn't really possible any other way. The article acknowledges this one as cheating.

Better, I think, would've been to retake the screenshot without the scaling/compression, to see if it can create a site that is both an exact match and uses the original assets.


If you look at the diff you'll see that all the planets are off too. So the OP mentioned the starfield, but that doesn't explain the planets.

Wasn’t the OG Space Jam website an image map? Not so different

I think it probably gets you 80% but the last 20% of pixel perfection seems to evade Claude. But I'm pretty new to writing prompts so if you can nail it let me know and I'll link you in the post.

Oh what the heck. That worked really well for you. Would you be willing to recreate all the html and push it up to github? I'll drop the repo at the top of the blog post. It would be really cool for me to see this completely done and a great way to finish out the blog post. I obviously couldn't do it.

I got pretty far with this initial prompt:

    spacejam-1996.png is a full screenshot of the Space Jam 1996
    landing page. We want to recreate this landing page as faithfully
    as possible, matching the screenshot exactly.

    The asset directory contains images extracted from the original
    site. One of the images is tiled as the background of the landing
    page. The other images should appear once in the screenshot. Use
    these images as assets.

    Precise positioning is very important for this project, so you
    should write a script that finds the precise location of each
    asset image in screenshots. Use the tool to detect precise
    positions in the target and fine tune the generated webpage. Be
    sure to generate diagnostic images that can be easily reviewed by
    a human reviewer.

    Use python 3.13 and uv to create a venv while working.
I just let Claude (Opus 4.5) do anything it wanted to do as it went.
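
For anyone curious what that script might look like, here's a minimal sketch of the template-matching approach, assuming OpenCV; the asset path and output name are made up for illustration:

    # Locate one asset in the screenshot via normalized cross-correlation.
    import cv2

    screenshot = cv2.imread("spacejam-1996.png")
    asset = cv2.imread("assets/jam_central.png")  # hypothetical asset file

    # The peak of the correlation map is the most likely asset position.
    result = cv2.matchTemplate(screenshot, asset, cv2.TM_CCOEFF_NORMED)
    _, score, _, (x, y) = cv2.minMaxLoc(result)
    print(f"best match at ({x}, {y}), score {score:.3f}")

    # Diagnostic image a human reviewer can eyeball, per the prompt.
    h, w = asset.shape[:2]
    cv2.rectangle(screenshot, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imwrite("diagnostic.png", screenshot)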

At this point all the image assets are pixel perfect, but the footer is in the wrong place, and I had to hold Claude's hand a bit to get the footer into approximately the right spot:

    I noticed you were struggling to find the position of the footer
    text. You could try rendering two versions of the generated page, the
    second time with the footer text black. Subtracting those two images
    should give you a clean view of the footer text.
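A minimal sketch of that subtraction idea, assuming Pillow and hypothetical file names for the two renders (which are identical except for the footer color):

    from PIL import Image, ImageChops

    # Two renders of the generated page, differing only in footer color.
    red = Image.open("render_red_footer.png").convert("RGB")
    black = Image.open("render_black_footer.png").convert("RGB")

    # Per-pixel absolute difference: everything but the footer cancels out.
    diff = ImageChops.difference(red, black)

    # Bounding box of the non-zero region, i.e. the footer text.
    print("footer bbox:", diff.getbbox())
    diff.save("footer_only.png")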
At this point Claude was having trouble because it hadn't gotten a clean view of the target text location in the original screenshot (it was writing scripts that looked at the red channel in the bottom half of the image to pull out the text, but that was also grabbing part of the site map logo. Interestingly it made a comment about this but didn't do anything about it). So I gave it this additional hint:

    You are getting confused with the site map when analyzing the
    original screenshot. You could blank out the positions of assets
    so that they are not interfering with your analysis.
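A rough sketch of that masking step, again assuming Pillow; the coordinates are made up, standing in for positions already recovered by the earlier matching script:

    from PIL import Image, ImageDraw

    screenshot = Image.open("spacejam-1996.png").convert("RGB")
    draw = ImageDraw.Draw(screenshot)

    # (x, y, w, h) boxes of already-located assets, e.g. the site map logo.
    known_assets = [(120, 640, 180, 90)]  # illustrative values

    for x, y, w, h in known_assets:
        draw.rectangle([x, y, x + w, y + h], fill=(0, 0, 0))

    screenshot.save("screenshot_masked.png")  # analyze this copy instead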
This got the footer in the correct location but the fonts/font sizes etc are not correct yet.

> Interestingly it made a comment about this but didn't do anything about it

Classic.

This is awesome. Great work. Please follow up again if you happen to nail it.


It's now got everything close after adding this final prompt:

    We are very close. The footer is positioned in roughly the correct location
    but the fonts, font sizes, font color and line spacings are all slightly
    off.
This took quite a while and it built a few more tools to get there. The result was fine from a distance, but it was using a sans-serif font where the screenshot has a serif, etc. So I decided to push. From here it got very messy...

One of the issues was that Claude's text detection kept getting tripped up because it wrote its scripts in RGB space instead of something more hue-aware. It knew the text was red, but it was trying to isolate it by just looking at the red channel. The grey dots from the background show up bright in the red channel, so Claude would think those were separator dots between the links that needed to be reproduced in the text. I gave it a hint:

    I think dots from the background image are causing issues. Are you detecting the text
    by looking only at the red channel in RGB space? The red channel will be bright on 
    white pixels in RGB. You could try using hue to separate text from background or use
    distance from the target RGB value.
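For reference, a sketch of what the hue-based approach looks like, assuming Pillow and numpy (the thresholds are illustrative, not tuned):

    import numpy as np
    from PIL import Image

    hsv = np.array(Image.open("spacejam-1996.png").convert("HSV"))
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]

    # In Pillow's 8-bit HSV, hue wraps at 255, so "red" is near 0 or 255.
    # Grey/white dots are bright in the red channel but have low saturation,
    # so requiring saturation drops them from the mask.
    mask = ((h < 15) | (h > 240)) & (s > 100) & (v > 80)

    Image.fromarray((mask * 255).astype(np.uint8)).save("red_text_mask.png")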
Claude decided to switch to HSV space. But it took quite a bit of effort to keep Claude remembering to use HSV, because the tools it had already written were still in RGB and hadn't been updated (nor had the intermediate images). Then it would step back to get the big picture as a sanity check and "discover" it had missed the dots that are obviously there. And when you tell it there are no dots, you get the "You're absolutely right! They're vertical bars!" So it was a struggle. This is the closest I got:

https://imgur.com/a/79Iv1jO

Again, the top image stuff was done in one shot with the first prompt; everything else has been about the footer. Claude has been writing a lot of clever scripts to measure font metrics and pick fonts, but it keeps falling over those dots. I could probably get it to work better by adding directives for text handling to CLAUDE.md, nuking the context, and deleting some of the scripts it created.
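
If it helps anyone trying this, the kind of font-metric probing described above might look like the sketch below, assuming Pillow's ImageFont (the font path and target dimensions are hypothetical): render a candidate font at a range of sizes and compare against measurements taken from the screenshot.

    from PIL import ImageFont

    sample = "SITE MAP"
    target_w, target_h = 74, 11  # measured from the screenshot (illustrative)

    for size in range(8, 16):
        font = ImageFont.truetype("TimesNewRoman.ttf", size)  # hypothetical path
        left, top, right, bottom = font.getbbox(sample)
        print(f"size {size}: {right - left}x{bottom - top}px "
              f"(target {target_w}x{target_h})")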


ha this is awesome. I'm going to link this comment in the post. Great work. lmk if you end up pushing it up to github.

This is really cool.

I learned recently that this is still how a lot of email HTML gets generated.

Apparently Outlook (the actual one, not the recent pretender) still uses some ancient version of Word's HTML engine as the renderer, so there isn't much choice.

Fun fact: until Office 2007, Outlook used IE's engine for rendering HTML.

Oh yeah, I recently had to update a newsletter design like that, and older versions of Outlook still didn't render it properly.

Yeah, still trying to build my intuition. Experiments/investigations like this help me. Any other blogs or experiments you'd suggest?

Asking your favorite LLM actually helps a lot; unsurprisingly, they're generally well trained on LLM papers. In this case, though, it's important to realize the LLM is incapable of seeing or hearing or reading. Everything has to be transformed into a vector space. Images are generally cut into patches (like 16x16), which are themselves transformed by several neural networks into a semantic space represented by the model's parameters.
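
As a toy illustration of that patching step (assuming numpy; a real vision encoder would also project each patch through learned embeddings, which is omitted here):

    import numpy as np

    image = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a real image
    P = 16  # patch size

    # Cut into non-overlapping 16x16 patches, then flatten each patch.
    patches = image.reshape(224 // P, P, 224 // P, P, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * 3)

    print(patches.shape)  # (196, 768): 196 patch "tokens" of 768 values each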

But this isn't hugely different from your vision. You don't see the pixel grid either; you have to use tools to measure things. You have the ability to interact with an image iteratively over time, perhaps by counting grid lines, but the LLM does not: it's a one-shot inference against this highly transformed image. Models have gotten better at complex visual tasks, including some kinds of counting, but they can't examine the image in any analytical way, or even in its original representation. It's just not possible.

It can, however, make tools that can. It's very good at working with PIL and other image processing libraries, or even at writing image processing code de novo, and then using those tools to ground itself. Likewise, it can't do math, but it can write a calculator that does highly complex mathematics on its behalf.


Great, thanks for that suggestion!

Whoops, I'm very dumb. It's Opus 4.1. I updated the blog post and credited you for the correction. Thank you!

That model does not exist. Do you mean Opus 4.5?

> That model does not exist.

It does (unless the previous comment was edited? Currently it says Opus 4.1): https://www.anthropic.com/news/claude-opus-4-1. You can see it in the 'more models' list on the main Claude website, or in Claude Console.


yep, this is what I used.

Opus GPT 4.1 Pro Maverick DeepK2

Thanks, my friend. I added a strikethrough of the error, a correction, and credited you.

I'm keeping it in for now because people have made some good jokes about the mistake in the comments and I want to keep that context.


I thought for sure I was going to see an image map when I looked at the source. Pleasant surprise!
