
A2UI: A Protocol for Agent-Driven Interfaces

150 points by makeramen last Tuesday at 9:16 AM | 67 comments | view on HN

Comments

codethief last Tuesday at 11:04 AM

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

(emphasis mine)

Sounds like agents are suddenly able to do what developers have failed at for decades: writing platform-independent UIs. Maybe this works for simple use cases, but beyond that I'm skeptical.

awei last Tuesday at 3:53 PM

I see how useful a universal UI language that works across platforms would be, but when I look at some of the examples from this protocol, I get the feeling it will eventually converge on what we already have: HTML. Instead of making all platforms support a new universal markup language, why not make them support HTML, which some already do, and which LLMs are already trained on?

Some examples from the documentation:

{
  "id": "settings-tabs",
  "component": {
    "Tabs": {
      "tabItems": [
        {"title": {"literalString": "General"}, "child": "general-settings"},
        {"title": {"literalString": "Privacy"}, "child": "privacy-settings"},
        {"title": {"literalString": "Advanced"}, "child": "advanced-settings"}
      ]
    }
  }
}

{ "id": "email-input", "component": { "TextField": { "label": {"literalString": "Email Address"}, "text": {"path": "/user/email"}, "textFieldType": "shortText" } } }

skybrian yesterday at 7:41 AM

It seems like latency will be poor if you have to wait for a server-side round trip to an LLM to update the UI whenever you press a button?

In a context where you're chatting with an LLM, I suppose the user would expect some lag, but it would be unwelcome in regular apps.

This also means that a lot of other UI performance issues don't matter - form submission is going to be slow anyway, so just be transparent about the delay.

mbossie last Tuesday at 10:40 AM

So there's MCP-UI, OpenAI's ChatKit widgets and now Google's A2UI, that I know of. And probably some more...

How many more variants are we going to introduce to solve the same problem? Sounds like a lot of wasted man-hours to me.

pedroziegl last Tuesday at 12:04 PM

We’ve had variations of “JSON describes the screen, clients render it” for years; the hard parts weren’t the wire format, they were versioning components, debugging state when something breaks on a specific client, and not painting yourself into a corner with a too-clever layout DSL.

The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.
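
The catalog boundary is at least easy to sketch. A rough illustration in TypeScript, with component and prop names borrowed from the examples quoted elsewhere in the thread and everything else assumed:

// Client-side allow-list: anything the agent sends that isn't in the
// catalog (unknown component type or prop) gets rejected before rendering.
const CATALOG: Record<string, Set<string>> = {
  TextField: new Set(["label", "text", "textFieldType"]),
  Tabs: new Set(["tabItems"]),
};

interface AgentComponent {
  id: string;
  component: Record<string, Record<string, unknown>>;
}

// Returns null if the message passes vetting, otherwise a reason string.
function validate(msg: AgentComponent): string | null {
  const entries = Object.entries(msg.component);
  if (entries.length !== 1) return "exactly one component type expected";

  const [type, props] = entries[0];
  const allowed = CATALOG[type];
  if (!allowed) return `unknown component type: ${type}`;

  for (const key of Object.keys(props)) {
    if (!allowed.has(key)) return `unknown prop ${key} on ${type}`;
  }
  return null;
}

The hard part, as ever, is keeping that catalog versioned across clients, not writing the validator.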

wongarsu last Tuesday at 11:36 AM

I wouldn't want this anywhere near production, but for rapid prototyping this seems great. People famously can't articulate what they want until they get to play around with it. This lets you skip right to the part where you realize they want something completely different from what was first described, without having to build the first iteration by hand.

jy14898 last Tuesday at 10:23 AM

I never want to unknowingly use an app that's driven this way.

However, I'm happy it's happening because you don't need an LLM to use the protocol.

tasoeur last Tuesday at 10:19 AM

In an ideal world, people would implement UI/UX accessibility in the first place, and a lot of these problems would already be solved. But one can also hope that the motivation to get agents running on these things will actually bring a lot of accessibility features to newer apps.

qsort last Tuesday at 10:22 AM

This is very interesting if used judiciously; I can see many use cases where I'd want interfaces to be drawn dynamically (e.g. charts for business intelligence).

What scares me is that even without arbitrary code generation, there's the potential for hallucinations and prompt injection to hit hard if a solution like this isn't sandboxed properly. An automatically generated "confirm purchase" button like the one in the example shown is... probably not something I'd make entirely unsupervised just yet.
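
One way to keep a button like that supervised is a client-side policy that never executes sensitive agent-proposed actions without an explicit human confirmation. A rough sketch, with all names invented:

// Agent-proposed actions are matched against a local policy; anything
// marked "confirm" needs the user to approve it, unknown actions are denied.
type ActionPolicy = "auto" | "confirm" | "deny";

const POLICY: Record<string, ActionPolicy> = {
  "show-chart": "auto",
  "confirm-purchase": "confirm", // never fire purchases unsupervised
};

async function handleAgentAction(
  actionId: string,
  execute: () => Promise<void>,
  askUser: (question: string) => Promise<boolean>,
): Promise<void> {
  const policy = POLICY[actionId] ?? "deny";
  if (policy === "deny") return;
  if (policy === "confirm" && !(await askUser(`Allow "${actionId}"?`))) return;
  await execute();
}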

jadelcastillo last Tuesday at 5:46 PM

I think this is a good and pragmatic way to approach the use of LLM systems: translate to an intermediate language, then process it further symbolically. But you can probably still be prompt-injected if you expose sensitive "tools" to the LLM.

ceuk last Tuesday at 1:52 PM

A few days ago I was predicting to some colleagues a revival of ideas around "server-driven UI" (which never really seemed to catch on) in order to facilitate agentic UIs.

Feels good to have been on the money, but I'm also glad I didn't start a project only to be harpooned by Google straight away.

uptownhr last Tuesday at 3:36 PM

My approach/prototype uses XState with websockets from an MCP server: https://github.com/uptownhr/mcp-agentic-ui

iristenteije last Tuesday at 12:53 PM

I think GenUI can ultimately be integrated into apps more seamlessly, but even if today it mostly lives in the context of chat interfaces with prompts, it's clear that a wall of text isn't always the best UX/output, so this is already a win.

oddrationale last Tuesday at 3:46 PM

Seems similar to [Adaptive Cards](https://adaptivecards.io/). Both have a JSON-based UI builder system.

barbazoo last Tuesday at 3:19 PM

This sounds like a way to have the LLM client render dynamic UI. Is this for use during the chat session or yet another way to build actual applications?

_pdp_ last Tuesday at 11:45 AM

I am a fan of using markdown to describe the UI.

It is simple, effective, and feels more native to me than some rigid data structure designed for very specific use cases that may not fit your own problem well.

Honestly, we should think of Emacs when working with LLMs and try to apply the same philosophy. I am not a fan of Emacs per se, but the parallels are there: everything is a file, and everything is text in a buffer. The text can be rendered in various ways depending on the consumer.

This is also the philosophy we use in our own product, and it works remarkably well for a diverse set of customers. I have not encountered anything that cannot be modelled this way. It is simple, effective, and it allows for a great degree of flexibility when things are not going as well as planned. It works well with streaming too (streaming parsers are not so difficult to write for simple text structures, and we have been doing this for ages), and LLMs are trained very well to produce this kind of output, versus anything custom that has not yet been seen or adopted by anyone.

Besides, given that LLMs are getting good at coding and the browser can render iframes in seamless mode, a better and more flexible approach would be to use HTML, CSS and JavaScript, rather than what Slack has been doing for ages with their Block Kit API, which we know is very rigid and frustrating to work with. I get why you might want data structures for UI in order to cover CLI tools as well, but at the end of the day browsers and CLIs are completely different things, and I do not believe you can meaningfully make it work for both unless you are also prepared to dumb it down and target only the lowest common denominator.
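
As a rough illustration of the streaming point (the block kinds and names here are made up, not from any product): consume the streamed text line by line and emit coarse UI blocks as soon as each line completes.

type Block =
  | { kind: "heading"; text: string }
  | { kind: "bullet"; text: string }
  | { kind: "paragraph"; text: string };

// Incrementally turn streamed markdown-ish chunks into renderable blocks.
function* parseStream(chunks: Iterable<string>): Generator<Block> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let newline: number;
    // Emit a block for every completed line; keep the partial tail buffered.
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trimEnd();
      buffer = buffer.slice(newline + 1);
      if (line.startsWith("# ")) yield { kind: "heading", text: line.slice(2) };
      else if (line.startsWith("- ")) yield { kind: "bullet", text: line.slice(2) };
      else if (line.length > 0) yield { kind: "paragraph", text: line };
    }
  }
}

// "# Settings" and "- General" render as soon as their lines complete,
// while the still-incomplete "- Priv" stays buffered.
for (const block of parseStream(["# Settings\n- Gen", "eral\n- Priv"])) {
  console.log(block.kind, block.text);
}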

raybb last Tuesday at 10:45 AM

Is there a standard protocol for the way things like Cline sometimes give you multiple choice buttons to click on? Or how does that compare to something like this?

evalstate last Tuesday at 11:05 AM

I quite like the look of this one - seems to fit somewhere between the rigid structure of MCP Elicitations and the freeform nature of MCP-UI/Skybridge.

zwarag last Tuesday at 3:30 PM

Could this be the link that allows designers to design a UI in Figma and let an agent build it via A2UI?

mentalgear last Tuesday at 12:29 PM

The way to do this would be to come together and design a common W3C-like standard.

verdverm last Tuesday at 7:13 PM

Am I reading (7) of the data flow correctly?

1. Establish SSE connection

... user event

7. send updates over origin SSE connection

So the client is required to maintain an SSE capable connection for the entire chat session? What if my network drops or I switch to another agent?

Maintaining a connection for the lifetime of a session seems an onerous requirement, given that sessions can span days (as some people have told us they have done with agents).
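
To be fair, a browser client at least gets reconnection for free from EventSource, and if the server tags events with ids, Last-Event-ID lets it resume after a drop. A hedged sketch (the endpoint path is my assumption, not from the spec):

// Subscribe to UI updates for a session; EventSource retries on its own.
function subscribeToUiUpdates(sessionId: string, onUpdate: (patch: unknown) => void) {
  const source = new EventSource(`/a2ui/sessions/${sessionId}/events`);

  source.onmessage = (event: MessageEvent<string>) => {
    onUpdate(JSON.parse(event.data));
  };

  source.onerror = () => {
    // Surface a permanently closed stream so the app can fall back to
    // polling or re-establishing the session instead of silently stalling.
    if (source.readyState === EventSource.CLOSED) {
      console.warn("SSE stream closed; session may need to be re-opened");
    }
  };

  return () => source.close();
}

That still doesn't answer what happens to a days-long session once the server side gives up on the stream.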

empath75 last Tuesday at 2:30 PM

I couldn't get this to work with the default model because it's overloaded, but I tried flash-lite, which at least gave me a response. Even then, it only presented an actual UI about a third of the time I tried the suggested questions in the demo; otherwise it attempted to ask me a question, which didn't present a UI at all or even do anything in the app -- I had to look at the logs to see what it was trying to do.

nsonha last Tuesday at 12:24 PM

What's agent/AI-specific about this? It seems like just backend-driven UI.

lowsong last Tuesday at 11:22 AM

> A2UI lets agents send declarative component descriptions that clients render using their own native widgets. It's like having agents speak a universal UI language.

Why the hell would anyone want this? Why on earth would you trust an LLM to output a UI? You're just asking for security bugs, UI impersonation attacks, terrible usability, and more. This is a nightmare.

mannanj last Tuesday at 4:58 PM

Instead of being told "here's what I think you want to see, now look at it", I want to be asked "what do you want to see?" and then be shown that.

Yes, yes, we claim the user doesn't know what they want. I think that's largely used as an excuse to avoid rethinking how things should meet the user's needs, and to keep the status quo where people are made to rely on systems and walled gardens. The point of this article is that UIs should work better for the user. What better way than to let users imagine what they want (or even nudge them with example actions, buttons, and text to click that render specific views) right in the UI! I've been wanting to build something where I just ask in English for things from the options I know I have, or otherwise play around and hit the edges to discover what is and isn't possible.

Anyone else thinking along this direction or think I’m missing something obvious here?

alexgotoi last Tuesday at 4:58 PM

So we're reinventing SOAP but for AI agents. Not saying that's bad - sometimes you need to remake old mistakes before you figure out what actually works.

The real question: do UIs even make sense for agents? Like the whole point of a UI is to expose functionality to humans with constraints (screens, mice, attention). Agents don't have those constraints. They can read JSON, call APIs directly, parse docs. Why are we building them middleware to click buttons?

I think this makes sense as a transition layer while we figure out what agent-native architecture looks like. But long-term it's probably training wheels.

Will include this in my https://hackernewsai.com/ newsletter.
