Hacker News

Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC

176 points by embedding-shape yesterday at 1:13 PM | 90 comments

Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...


Comments

simonw yesterday at 3:48 PM

This is a notably better demonstration of a coding-agent-generated browser than Cursor's FastRender: it's a fraction of the size (20,000 lines of Rust compared to ~1.6 million), uses way fewer dependencies (just system libraries for rendering images and text), and the code is actually quite readable - here's the flexbox implementation, for example: https://github.com/embedding-shapes/one-agent-one-browser/bl...

Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, and renders the SVG feed icon, but fails to render a PNG image.

I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!

embedding-shape yesterday at 1:44 PM

I set some rules for myself: three days of total time, no third-party Rust crates, commonly available OS libraries allowed; it has to support X11/Windows/macOS and be able to render some websites.

After three days, I have it working at around 20K LOC, of which ~14K is the browser engine itself plus X11 support, and the remaining ~6K is just the Windows and macOS backends.

Source code + CI-built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser

mwcampbell yesterday at 7:43 PM

Impressive work.

I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:

1. Implement AT-SPI with a new from-scratch D-Bus implementation.

2. Implement AT-SPI with one of the D-Bus C libraries (GLib, libdbus, or sdbus).

3. Use GTK, or maybe Qt.
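To give a concrete flavour of option 1, here's a minimal, untested std-only sketch (all names hypothetical) of just the opening step: connecting to the session bus and doing the SASL EXTERNAL handshake that has to happen before any AT-SPI traffic. It stays within the no-crates rule by declaring getuid from libc directly.

    // Sketch only: the first step of a from-scratch D-Bus client, no crates.
    use std::env;
    use std::io::{BufRead, BufReader, Write};
    use std::os::unix::net::UnixStream;

    extern "C" {
        fn getuid() -> u32; // from libc, which the OS already provides
    }

    fn connect_session_bus() -> std::io::Result<UnixStream> {
        // DBUS_SESSION_BUS_ADDRESS looks like "unix:path=/run/user/1000/bus".
        let addr = env::var("DBUS_SESSION_BUS_ADDRESS").expect("no session bus");
        let path = addr
            .strip_prefix("unix:path=")
            .expect("this sketch only handles unix:path= addresses");
        let mut stream = UnixStream::connect(path)?;

        // The protocol opens with a single NUL byte, then a line-based SASL
        // exchange; EXTERNAL authenticates by uid, hex-encoded as ASCII.
        let uid = unsafe { getuid() }.to_string();
        let uid_hex: String = uid.bytes().map(|b| format!("{:02x}", b)).collect();
        stream.write_all(b"\0")?;
        write!(stream, "AUTH EXTERNAL {}\r\n", uid_hex)?;

        let mut reader = BufReader::new(stream.try_clone()?);
        let mut line = String::new();
        reader.read_line(&mut line)?;
        assert!(line.starts_with("OK"), "auth failed: {line}");

        // After BEGIN, the binary D-Bus message protocol takes over; the
        // real work (Hello(), then marshalling AT-SPI calls) starts here.
        stream.write_all(b"BEGIN\r\n")?;
        Ok(stream)
    }

Everything past BEGIN - message marshalling, the Hello() call, the AT-SPI object tree - is where the real effort would go.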

sosodev yesterday at 9:40 PM

The browser works shockingly well considering it was created in 72 hours. It can render Wikipedia well enough to read and browse articles. With some basic form handling and standard browser features (URL bar, history, bookmarks, etc.) it would be a viable way to consume text-based content.

jacquesm yesterday at 5:11 PM

This post is far more interesting than many others on the same subject, not because of what is built but because of how it is built. There is a ton of noise on this subject, and most of it seems to focus on the thing - or even on the author - rather than on the process, the constraints and the outcome.

QuadmasterXLII yesterday at 8:00 PM

The rendering is pretty chaotic when I tried it: not that far off from just the text in the HTML tags, in some size, color, and placement on the screen. This may sound unfair, but there is some motte-and-bailey here: if you claim to be a browser, I get to evaluate it on things like links being consistently blue and underlined. As is, they are sometimes blue and sometimes underlined, without a clear pattern; if they were never formatted differently from standard text, I would just accept that as a feature not implemented yet. It may be that some of the rendering is not supported on Windows - the back button certainly isn't. I guess if I want to make my criticism properly legitimate I should make a "one human and no agent browser" post that just regexes out stuff that looks like content and formats it at random. The binary I downloaded definitely overperforms on the Hacker News homepage and simonw's blog.

happytoexplain yesterday at 9:07 PM

What kind of time frame do you ballpark this would have taken you on your own?

I know it's a little apples-and-oranges (you and the agent wouldn't produce the exact same thing), but I'm not asking because I'm interested in the man-hour savings. Rather, I want to get a perspective on what kind of expertise went into the guidance (without having to read all the guidance and be familiar with browser implementation myself). "How long this would have taken the author" seems like one possible proxy for "how much pre-existing experience went into this agent's guidance".

fabrice_d yesterday at 10:24 PM

This is a cool project, and rendering Simon's blog will likely become the #1 goal of AI-produced "web browsers".

But we're very far from a browser here, so it's not that impressive. Writing a basic renderer is really not that hard, and that matches the modest effort and low LoC of this experiment. It's similar to the countless graphical toolkits that have been written since the 70s.

I know Servo has a "no AI contribution" policy, but I would still be more impressed by a Servo fork that gets missing APIs implemented by an AI, with WPT tests passing, etc. It's a lot less marketable, I guess. Go add something like WebTransport, for instance: it's a recent API, so the spec should be properly written and there's a good test suite.

rahimnathwani yesterday at 4:49 PM

This is awesome. Would you be willing to share more about your prompts? I'm particularly interested in how you prompted it to get the first few things working.

hedgehog yesterday at 9:18 PM

This looks pretty solid. I think you can make this process more efficient by decomposing the problem into layers that are more easily testable: e.g. testing topological relationships of DOM elements after parse, then spatial relationships after layout, then eventually pixels on things like ACID2 or whatever the modern equivalent is. The models can often come up with tests more accurately than they get the code right the first time. There are also often invariants that can be used to identify bugs without ground truth; e.g. by rendering the page at slightly different widths, you can make some assertions about how far elements will move. A sketch of that last idea follows.
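To make it concrete, here is a hypothetical sketch (the layout function and LayoutBox fields are stand-ins for whatever the engine actually exposes) of a width-perturbation check:

    // Sketch: reflow at two nearby widths and assert cheap invariants,
    // no ground-truth rendering required.
    #[derive(Debug)]
    struct LayoutBox {
        node_id: usize,
        x: f32,
        y: f32,
        width: f32,
    }

    // Stand-in for the engine's real layout pass over a parsed DOM.
    fn layout(html: &str, viewport_width: f32) -> Vec<LayoutBox> {
        unimplemented!()
    }

    fn check_width_invariants(html: &str) {
        let a = layout(html, 800.0);
        let b = layout(html, 810.0);

        // Invariant 1 (reasonable for block-level boxes): a slightly
        // wider viewport should not reorder elements vertically.
        let order = |boxes: &[LayoutBox]| -> Vec<usize> {
            let mut ys: Vec<_> = boxes.iter().map(|bx| (bx.y, bx.node_id)).collect();
            ys.sort_by(|l, r| l.partial_cmp(r).unwrap());
            ys.into_iter().map(|(_, id)| id).collect()
        };
        assert_eq!(order(&a), order(&b), "vertical order changed with width");

        // Invariant 2: no box may overflow the viewport horizontally.
        for bx in &b {
            assert!(bx.x + bx.width <= 810.0, "overflow at node {}", bx.node_id);
        }
    }

The nice part is that failures here point at layout bugs without anyone having to specify what the correct pixels are.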

pulkas yesterday at 8:37 PM

The Mythical Man-Month, revisited

avmich yesterday at 8:50 PM

The next thing would probably be an OS. With different APIs, the browser would not be constrained by existing standards. Then generate a good set of applications to make working in the OS convenient, starting with the GNU toolset? And then we could approach CPU architecture, again without being constrained to existing languages or instruction sets. That should be interesting to play with.

forgotpwd16 yesterday at 8:47 PM

Impressive. Very nice. (Let's see Paul Allen's browser. /s) You could say it's Brooks's law in action: what one human and one agent can do in three days, one human and hundreds of agents can do in a few weeks. A modern retake of the old joke.

>without using any 3rd party libraries

It seems to be easier for coding agents to implement things from scratch than to use libraries.

barredo yesterday at 9:18 PM

The binaries are only around 1 MB for Linux, Mac and Windows. Very impressive: https://github.com/embedding-shapes/one-agent-one-browser/re...

storystarling yesterday at 8:09 PM

How did you handle the context window for 20k lines? I assume you aren't feeding the whole codebase in every time given the API costs. I've struggled to keep agents coherent on larger projects without blowing the budget, so I'm curious if you used a specific scoping strategy here.

rvz yesterday at 7:35 PM

> I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

That is Ladybird Browser, if that was not already obvious.

tonyhart7 yesterday at 11:55 PM

>one human

>one agent

>one browser

>one million nvidia gpu

deadbabe yesterday at 11:45 PM

This is not that impressive; there are numerous examples of browsers in the training data to reference.

Imustaskforhelp yesterday at 8:40 PM

I feel like I have talked with embedding-shape on Hacker News enough that I recognize him, so it was a proud moment when I saw his Hacker News and GitHub comments in a YouTube video [0] about the recent Cursor thing.

It's great to see him make this. I didn't know he had a blog, but it looks good to me. Bookmarked now.

Cursor burned $5 million; we saw that, and now we have embedding-shape's takeaway:

If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the question "Can we scale autonomous coding by throwing more agents at a problem?" probably has a more pessimistic answer than some expected.

Effectively, to me this feels like an answer to the question of what happens if thousands of AI agents build a complex project autonomously with no human. That idea seems dead now. Keeping humans in the loop gives much higher productivity and a better end result.

I feel like the lure of the Cursor project was to find out whether it could replace humans completely on an extremely large project, and the answer right now is no (and I have a feeling [bias?] that the answer is going to stay that way).

Emsh, I have a question though: can you tell me about your background, if possible? Have you been involved in browser development or any related endeavours, or was this a first for you? From what I can tell from having talked with you, I suspect the answer is yes, you have worked in the browser space, but I am still curious to know.

Another question coming to my mind: how big would the difference be between one expert human + one agent, one junior dev + one agent, and one complete non-expert (a normal, less techie person) + one agent?

What are your predictions on it?

How would the economics of becoming an "expert" in a field, or a jack of all trades (junior dev), fare with this new technology/toy that we've got?

And how much productivity gain would there be going from non-expert -> junior dev, and likewise from junior -> senior dev, in this particular context?

[0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo

