Show HN: PageAgent, A GUI agent that lives inside your web app

62 points • by simon_luv_pho • today at 5:01 PM • 34 comments

Hi HN,

I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.

I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which makes it a natural fit for SPAs.
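
To make the "live DOM" idea concrete, here is a minimal TypeScript sketch of one common way a client-side agent can expose a page to a model: walk the element tree, give each interactive element a numeric index, and map the model's chosen index back to a node. It is not PageAgent's actual code. `SimpleNode`, `indexInteractive`, and `resolveIndex` are hypothetical names, and a plain object tree stands in for the real DOM so the example runs outside a browser.

```typescript
// Hypothetical, simplified stand-in for a DOM element so the sketch runs
// outside a browser; a real agent would walk document.body instead.
interface SimpleNode {
  tag: string;
  text?: string;
  children?: SimpleNode[];
}

// Tags treated as actionable in this sketch (an assumption, not exhaustive).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

interface IndexedPage {
  prompt: string;         // text listing handed to the model
  elements: SimpleNode[]; // index -> node, used to resolve the model's choice
}

// Depth-first walk that numbers each interactive element in document order.
function indexInteractive(root: SimpleNode): IndexedPage {
  const elements: SimpleNode[] = [];
  const lines: string[] = [];
  const visit = (node: SimpleNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      lines.push(`[${elements.length}] <${node.tag}> ${node.text ?? ""}`.trimEnd());
      elements.push(node);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return { prompt: lines.join("\n"), elements };
}

// The model replies with an index (e.g. "click 1"); map it back to a node,
// returning undefined for anything that is not a valid index.
function resolveIndex(page: IndexedPage, index: number): SimpleNode | undefined {
  return Number.isInteger(index) && index >= 0 ? page.elements[index] : undefined;
}

const demo = indexInteractive({
  tag: "div",
  children: [
    { tag: "h1", text: "Orders" },
    { tag: "input", text: "Search orders" },
    { tag: "button", text: "Export CSV" },
  ],
});
console.log(demo.prompt);
// [0] <input> Search orders
// [1] <button> Export CSV
```

In this sketch the model never needs hand-written selectors, because indices are regenerated from the live tree; the trade-off is that the listing has to be rebuilt after every action that changes the page.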

To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.
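
A bridge like this needs a gate on the extension side before any browser-level action runs. The sketch below assumes a hypothetical message format (`BridgeRequest`) and a per-origin grant list (`BridgeAuthorization`) standing in for the "explicit user authorization"; it is illustrative only and does not describe the real extension's protocol.

```typescript
// Hypothetical messages a page-side agent might send to an extension bridge.
type BridgeRequest =
  | { kind: "openTab"; url: string }
  | { kind: "closeTab"; tabId: number };

type BridgeResponse = { ok: true } | { ok: false; reason: string };

// Origins the user has explicitly approved (e.g. via an extension prompt).
class BridgeAuthorization {
  private granted = new Set<string>();
  grant(origin: string): void {
    this.granted.add(origin);
  }
  revoke(origin: string): void {
    this.granted.delete(origin);
  }
  isGranted(origin: string): boolean {
    return this.granted.has(origin);
  }
}

// Gatekeeper run inside the extension before acting on a page's request.
function handleBridgeRequest(
  auth: BridgeAuthorization,
  senderOrigin: string,
  req: BridgeRequest,
): BridgeResponse {
  if (!auth.isGranted(senderOrigin)) {
    return { ok: false, reason: `origin ${senderOrigin} is not authorized` };
  }
  if (req.kind === "openTab" && !/^https?:\/\//.test(req.url)) {
    return { ok: false, reason: "only http(s) URLs may be opened" };
  }
  // A real Chrome extension would call chrome.tabs.create / chrome.tabs.remove here.
  return { ok: true };
}
```

Keying the grant on the sender's origin means revoking one site's access does not affect others; how the real extension scopes and persists its authorization is not covered here.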

I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!

Comments

moehj • today at 10:49 PM

"Interesting architecture — embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this — curious if you've thought about that boundary."

simon_luv_pho • today at 5:07 PM

This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

- GitHub: https://github.com/alibaba/page-agent

- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

I'd be really interested in feedback on the security model of granting client-side agents access to the extension bridge, and I'm happy to take questions on the implementation!

mentalgear • today at 6:59 PM

> Data processed via servers in Mainland China

Appreciate the transparency, but maybe you could add some alternatives (preferably European)?

general_reveal • today at 7:18 PM

I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate into commands for an arbitrary UI?

The only thing I can think of is that you had the AI rewrite the entire build file to embed selectors, and then work with that?

jadbox • today at 10:21 PM

Firefox support?

dzink • today at 6:53 PM

Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?

pscanf • today at 5:59 PM

Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!
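
For readers who haven't used them either: a bookmarklet is a bookmark whose URL starts with `javascript:`, so clicking it runs that code in the context of the current page. Below is a minimal sketch of generating one that injects a script tag; `makeBookmarklet` is a hypothetical helper, the script URL is a placeholder, and this is not the exact bookmarklet the demo page provides.

```typescript
// Build a bookmarklet: a javascript: URL that, when clicked from the bookmarks
// bar, appends a <script> tag loading scriptUrl into the current page.
function makeBookmarklet(scriptUrl: string): string {
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    `s.src=${JSON.stringify(scriptUrl)};` + // JSON.stringify safely quotes the URL
    "document.body.appendChild(s);" +
    "})();";
  // Percent-encode so the code survives being stored as a bookmark URL.
  return "javascript:" + encodeURIComponent(body);
}

// Placeholder URL; a real bookmarklet would point at the library's hosted bundle.
console.log(makeBookmarklet("https://example.com/agent.js"));
```

One caveat: pages with a strict Content-Security-Policy may block the injected script, so the trick will not work on every site.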

Mnexium • today at 7:21 PM

Curious - how does it perform with captchas and other "are you human" stuff on the web?

coreylane • today at 6:52 PM

Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?

MeteorMarc • today at 6:41 PM

Confusing name because of the existence of Pageant, the PuTTY agent.

popalchemist • today at 7:32 PM

Does it support long-click / click-and-drag?

jauntywundrkind • today at 5:43 PM

Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. It's a very nice browser extension:

> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

https://github.com/PaulKinlan/NotebookLM-Chrome
https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...
