Something I'm desperately keen to see is AI-assisted accessibility testing.
I'm not convinced at all by most of the heuristic-driven ARIA scanning tools. I don't want to know if my app appears to have the right ARIA attributes set - I want to know if my features work for screenreader users.
What I really want is for a Claude Code style agent to be able to drive my application in an automated fashion via a screenreader and record audio for me of successful or failed attempts to achieve goals.
Think Playwright browser tests but for popular screenreaders instead.
Every now and then I check to see if this is a solved problem yet.
I think we are close. https://www.guidepup.dev/ looks extremely promising - though I think it only supports VoiceOver on macOS or NVDA on Windows, which is a shame since asynchronous coding agent tools like Codex CLI and Claude Code for web only run on Linux.
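For anyone who hasn't looked at it: driving a real screen reader through Guidepup is roughly this (a sketch from memory of their docs - method names may not be exact, and it needs a macOS runner with VoiceOver enabled, or a Windows runner for NVDA):

    // Sketch only: drives the real VoiceOver process on macOS via Guidepup.
    // Method names are from memory of the Guidepup docs and may be approximate.
    import { voiceOver } from "@guidepup/guidepup";

    async function walkPage() {
      await voiceOver.start();                         // launch VoiceOver
      await voiceOver.next();                          // move to the next item on the page
      console.log(await voiceOver.lastSpokenPhrase()); // what was just announced, as text
      console.log(await voiceOver.spokenPhraseLog());  // the full narration so far
      await voiceOver.stop();
    }

    walkPage().catch(console.error);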
What I haven't seen yet is someone closing the loop on ensuring agentic tools like Claude Code can successfully drive these mechanisms.
There are thousands of blind people on the net. Can't you hire one of them to test for you? Please?
I'm doing a PoC at work with Workback.ai, which is essentially what you're asking about. So far it's early, but it seems OK at first blush. We have a firm we pay for traditional accessibility assessments, remediation, and VPATs, and my expectation is that the AI tooling does not replace them, due to how business needs and product design interact with accessibility.
I.e. ChatGPT and Cursor can probably remediate a CAPTCHA by adding screen reader support for the blind, but do you really want to do that? There's likely a better design for the blind.
Either way, I agree. This is a big area where there can be real impact in the industry. So far we've gotten scans back in record time compared to human-in-the-loop scans.
A more viable path might actually be agentic testing via agents that simply use a browser or screen reader that can work off high level test scenarios.
I've done some UI testing via the agent mode in ChatGPT and got some pretty decent feedback out of that. I've been trying to do more of it.
Accessibility testing might require a bit more tooling than comes with ChatGPT by default. But otherwise, this could work.
Can screen readers emit their narration as text instead of / in addition to audio?
Guidepup also includes a Virtual Screenreader[1].
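That should answer the narration-as-text question: Guidepup's drivers expose spoken phrases as strings, and the Virtual Screen Reader runs against the DOM in plain Node/jsdom with no OS screen reader at all, so it should work on Linux CI. A rough sketch, with method names from memory of the Guidepup docs:

    // Sketch: Guidepup's Virtual Screen Reader narrates the DOM as plain text.
    // No real screen reader or OS dependency needed; runs in Node/jsdom.
    // Method names are approximate - check the Guidepup docs.
    import { virtual } from "@guidepup/virtual-screen-reader";

    async function narrate() {
      document.body.innerHTML = `<button>Save draft</button>`;

      await virtual.start({ container: document.body });
      await virtual.next();                            // step through the page
      console.log(await virtual.lastSpokenPhrase());   // latest announcement as a string
      console.log(await virtual.spokenPhraseLog());    // full narration so far
      await virtual.stop();
    }

    narrate().catch(console.error);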
Rather than improving testing for fallible accessibility assists, why not leverage AI to eliminate the need for them? An agent on your device can interpret the same page a sighted or otherwise unimpaired person would, giving you, as a disabled user, the same experience they would have. Why would that not be preferable? It also puts you in control of how you want that agent to interpret pages.
There might be a great use case here, but the economics make me nervous. Don't the same problems apply here for why we don't have great accessibility? Who is paying for it? How do I justify the investment (AI or not) to management?
... I just saw Guidepup has an official GitHub Actions setup action, so that's great news! https://github.com/guidepup/setup-action
On macOS it can record audio too.
The agent-driving-a-screenreader footgun is that it's quite easy to build UI that creates a reasonable screenreader UX, but unintentionally creates accessibility barriers for other assistive technologies, like voice control.
Ex: a search control is built as <div aria-label="Search"><input type="search"/></div>. An agent driving a screenreader is trying to accomplish a goal that requires search. Perhaps it tries using Tab to explore, in which case it will hear "Search, edit blank" when the input is focused. Great! It moves on.
But voice control users can't say "click Search". That only works if "Search" is in the control's accessible name, which is still blank; the outer div's aria-label has no effect on the components it wraps. Would an agent catch that nuance? Would you?
You could realign the goal to "I want to know if my features work for screenreader, voice control, switch, keyboard, mobile keyboard [...] users", but you can imagine the inverse, where an improvement to the voice control UX unintentionally degrades the screenreader UX. Accessibility is full of these tensions, and I worry a multi-agent approach would result in either the agents or you getting bogged down by them.
I think a solution needs to incorporate some heuristics, if you consider WCAG a heuristic. For all its faults, a lot of thought went into rules that balance the tensions reasonably well. I used to be more of the "forget WCAG, just show me the UX" attitude, but over the years I've come to appreciate the baseline it sets. To the example above, 2.5.3 Label in Name clearly guides you towards setting an accessible name (not description) on the search itself, patching up support for screenreaders and voice control.
Not that WCAG concerns itself with the details of ARIA (it packs all that complexity into the vague "accessibility supported"[1]). We do need more seamless ways of quickly evaluating whether ARIA or whatever markup pattern has the intended rendering in screen readers, voice control, etc, but at a level that's already constrained. In the example, WCAG should apply its constraints first. Only then should we start running real screen readers and analyzing their audio, and to avoid the footguns that analysis should be at a low level ("does the audio rendering of this control reasonably convey the expected name, state, value, etc?"), not a high level ("does the audio rendering contain the info necessary to move to the next step?").
Unfortunately both agents and heuristic-driven accessibility scanning tools struggle to apply WCAG today. Agents can go deeper than scanners, but they're inconsistent and in my experience really have trouble keeping 55+ high level rules in mind all the time. In the example, an agent would need to use the accessibility tree to accomplish its goal and need to reject a node with label "Search" containing a role=textbox as an option for searching (or at least flag it), which is made trickier by the fact that sometimes it _is_ ok to use container labels to understand context.
I think the answer might be to bundle a lot of those concerns into an E2E test framework, have the agent write a test it thinks can accomplish the goal, and enter a debug loop to shake out issues progressively. Ex: if the agent needs to select the search control for the task, and the only allowed selector syntax requires specifying the control's accessible name (i.e. Playwright's getByRole()), will the agent correctly get blocked? And say the control does have an accessible name, but has some ARIA misconfigurations — can the test framework automatically run against some real screenreaders and report an issue that things aren't working as expected? We've done some explorations on the framework side you might be interested in [2].
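To make the "blocked by the accessible name" idea concrete, here's a minimal Playwright sketch against the made-up search markup from the example above (illustrative only, not any real app):

    // Sketch: Playwright's getByRole() requires the accessible name, so the
    // broken markup is unreachable and the agent (or test author) gets blocked.
    import { test, expect } from "@playwright/test";

    test("search control is reachable by its accessible name", async ({ page }) => {
      // Broken: aria-label on the wrapping div does nothing for the input,
      // so the searchbox has no accessible name and the locator matches nothing.
      await page.setContent(`<div aria-label="Search"><input type="search" /></div>`);
      await expect(page.getByRole("searchbox", { name: "Search" })).toHaveCount(0);

      // Fixed per the 2.5.3 guidance above: the name is on the control itself,
      // so screen readers announce it and voice control users can say "click Search".
      await page.setContent(`<input type="search" aria-label="Search" />`);
      await expect(page.getByRole("searchbox", { name: "Search" })).toHaveCount(1);
    });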
[1] https://www.w3.org/TR/WCAG22/#dfn-accessibility-supported
[2] https://assistivlabs.com/articles/automating-screen-readers-...
What about just AI assisted accessibility? Like stop requiring apps to do anything at all. The AI visually parses the app UI for the user, explains it, and interacts.
Accessibility is a nice-to-have at best for the vast majority of software. This would open a lot more software to blind users than is currently available.
can we just hire disabled people as testers please
Not a joke. If you truly want a properly functioning website for blind/low-vision users, step 1 would probably be to put on a blindfold and go through your website with a screenreader (Cmd+F5 on a Mac).