logoalt Hacker News

hackgicianyesterday at 9:24 PM2 repliesview on HN

accessibility (a11y) trees are super helpful for LLMs; we use them extensively in stagehand! the context is nice for browsers, since you have existing frameworks like selenium/playwright/puppeteer for actually acting on nodes in the a11y tree.

what does that analog look like in more traditional computer use?


Replies

ctothyesterday at 9:34 PM

There are a variety of accessibility frameworks from MSAA (old, windows-only) IA2, JAB, UIA (newer). NVDA from NV Access has an abstraction over these APIs to standardize gathering roles and other information from the matrix of a11y providers, though note the GPL license depending on how you want to use it.

show 1 reply
eduntemanyesterday at 11:39 PM

another currently unanswered question in Muscle Mem is how to more cleanly express the targeting of named entities.

Currently, a user would have to explicitly have their @engine.tool call take an element ID as an argument to a click_element_by_name(id) in order for it to be reused. This works, but for Muscle Mem would lead to the codebase getting littered with hyper-specific functions that are there just to differentiate tools for Muscle Mem, which goes against the agent agnostic thesis of the project.

Still figuring out how to do this.