Title: Show HN: PageAgent, A GUI agent that lives inside your web app
Hi HN,
I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.
I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.
Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which works perfectly for SPAs.
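To make the "inside-out" idea concrete, here is a minimal sketch of one way an in-page agent could describe the page to a model: walk the element tree, number the interactive elements, and emit a compact text listing the model can refer to by index. This is an illustration only, not PageAgent's actual implementation. `NodeLike`, `indexInteractive`, `toPrompt`, and the tag list are hypothetical names, and a plain object tree stands in for the live DOM so the snippet runs outside a browser.

```typescript
// Hypothetical stand-in for a DOM element, so the sketch runs without a browser.
interface NodeLike {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: NodeLike[];
}

interface IndexedElement {
  index: number;
  tag: string;
  label: string;
}

// Assumption: only these tags count as "interactive". A real page needs more
// signals (ARIA roles, event listeners, visibility).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Collect all text beneath a node, used as a fallback label.
function collectText(node: NodeLike): string {
  const own = node.text ?? "";
  const nested = (node.children ?? []).map(collectText).join(" ");
  return `${own} ${nested}`.replace(/\s+/g, " ").trim();
}

// Depth-first walk that assigns an index to each interactive element.
function indexInteractive(root: NodeLike): IndexedElement[] {
  const found: IndexedElement[] = [];
  const visit = (node: NodeLike): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const label =
        node.attrs?.["aria-label"] ??
        node.attrs?.["placeholder"] ??
        collectText(node);
      found.push({ index: found.length, tag: node.tag, label });
    }
    (node.children ?? []).forEach(visit);
  };
  visit(root);
  return found;
}

// Render the index as text that could be placed in a model prompt.
function toPrompt(elements: IndexedElement[]): string {
  return elements.map((e) => `[${e.index}] <${e.tag}> ${e.label}`).join("\n");
}

const page: NodeLike = {
  tag: "form",
  children: [
    { tag: "input", attrs: { placeholder: "Email" } },
    { tag: "div", children: [{ tag: "button", children: [{ tag: "span", text: "Sign in" }] }] },
  ],
};

console.log(toPrompt(indexInteractive(page)));
// [0] <input> Email
// [1] <button> Sign in
```

With a listing like this, a model can answer "click element 1" instead of inventing a selector, and the page-side code maps the index back to the element it recorded.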
To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.
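As a rough sketch of what "explicit user authorization" could mean for such a bridge, the snippet below denies every request by default and lets one through only after a matching grant has been recorded for the requesting origin. The message shape, `BridgeGate`, and its methods are hypothetical and are not taken from the actual extension. The permission prompt itself is not modeled.

```typescript
// Hypothetical request a page-level agent might send to an extension bridge.
interface BridgeRequest {
  origin: string; // origin of the page hosting the agent
  action: "navigate" | "readTab";
  url?: string;
}

type BridgeResponse =
  | { ok: true; action: string }
  | { ok: false; reason: string };

// Tracks which actions the user has explicitly allowed for each origin.
class BridgeGate {
  private readonly grants = new Map<string, Set<string>>();

  // Meant to be called only after the user approves a prompt (not modeled here).
  grant(origin: string, action: BridgeRequest["action"]): void {
    const actions = this.grants.get(origin) ?? new Set<string>();
    actions.add(action);
    this.grants.set(origin, actions);
  }

  // Remove every grant for an origin.
  revoke(origin: string): void {
    this.grants.delete(origin);
  }

  // Deny by default: a request passes only with a matching grant.
  handle(request: BridgeRequest): BridgeResponse {
    const allowed = this.grants.get(request.origin)?.has(request.action) ?? false;
    if (!allowed) {
      return { ok: false, reason: `"${request.action}" not authorized for ${request.origin}` };
    }
    if (request.action === "navigate" && !request.url) {
      return { ok: false, reason: "navigate requires a url" };
    }
    return { ok: true, action: request.action };
  }
}

const gate = new BridgeGate();
const navigateRequest: BridgeRequest = {
  origin: "https://app.example",
  action: "navigate",
  url: "https://example.org",
};

const beforeGrant = gate.handle(navigateRequest); // denied: nothing granted yet
gate.grant("https://app.example", "navigate");
const afterGrant = gate.handle(navigateRequest); // allowed after the explicit grant

console.log(beforeGrant.ok, afterGrant.ok); // false true
```

Keying grants by origin and action keeps the default closed, so a page that embeds the agent cannot drive other tabs until the user has opted in for that specific capability.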
I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!
This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:
- GitHub: https://github.com/alibaba/page-agent
- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)
- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...
I'd be really interested in feedback on the security model of giving client-side agents access to the extension bridge, and I'm happy to take questions on the implementation!
> Data processed via servers in Mainland China
Appreciate the transparency, but maybe you could add some (preferably European) alternatives?
I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?
The only thing I can think of is that you had the AI rewrite the entire build output to embed selectors, and then work with those?
Firefox support?
Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?
Very cool!
I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!
Curious - how does it perform with captchas and other "are you human" stuff on the web?
Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?
Confusing name because of the existence of Pageant, the PuTTY agent.
Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. A very nice browser extension:
> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.
https://github.com/PaulKinlan/NotebookLM-Chrome https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...
"Interesting architecture — embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this — curious if you've thought about that boundary."