The real problem this thread exposes is that we're duct-taping browser automation (Playwright, CDP, MCP wrappers) onto an interface designed for humans — the DOM. Every approach discussed here is fighting the same battle: too many tokens to represent page state, flaky selectors, hallucinated DOM structures, massive context cost.
What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one. Something like robots.txt but for agent capabilities — declaring available actions, their parameters, authentication requirements, and response schemas. Not scraping the DOM and hoping the AI figures out which button to click.
The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces. A lightweight standard — call it agents.json or whatever — where sites declare "here are the actions you can take, here are the endpoints, here's the auth flow" would eliminate 90% of the token waste, security concerns, and fragility discussed in this thread.
Until that exists, we'll keep building increasingly clever hacks on top of a 30-year-old document format that was never designed for machine consumption.
I feel like the fact tha HTML is end result is exactly why the Web is so successful. Yes, structured APIs sound great, until you realize the API owners will never give you the data you actually want via their APIs. This is why HTML has done so well. Why extensions exist. And why it's better for browser automation.
They’re trying to solve it by making it easier to get Markdown versions of websites.
For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
Fully agree. Will take some time though as immediate incentive not clear for consumer facing companies to do extra work to help ppl bypass website layer. But I think consumers will begin to demand it, once they experience it through their agent. Eg pizza company A exposes an api alongside website and pizza company B doesn’t, and consumer notices their agent is 10x+ faster interacting with company A and begins to question why.
Is this just a well-documented API?
> interface designed for humans — the DOM.
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely that not be that way totally on purpose, attempting to maximize human retention/engagement and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
> expose a machine-readable interaction layer alongside the human one
Which is called ARIA and has been a thing forever.
> What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There's remnants of it today with RSS feeds, which is either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
The ultimate conflict of interest here is that the sites people want to crawl the most are the ones that want to be crawled by machines the least (e.g. Youtube). So people will end up emulating genuine human users one way or another.