I'm building something that fixes this exact problem[1].
The landing page doesn't advertise it yet, but essentially, I give agents a small set of tools to explore apps' surfaces, and then an API over common macOS functions, especially those related to accessibility.
The agent explores the app, then writes a repeatable workflow for it. Then it can run that workflow through the CLI: `invoke chrome pinTab`
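To give a rough idea of the shape, here is a minimal sketch of "explore once, save a named workflow, replay it from the command line". This is not Invoke's actual implementation. The registry, the step format, and the `register`/`run` helpers are all hypothetical names made up for illustration, and `perform` stands in for whatever actually drives the app.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: (app, workflow name) -> recorded steps.
WORKFLOWS: Dict[Tuple[str, str], List[dict]] = {}


def register(app: str, name: str, steps: List[dict]) -> None:
    """Save the steps an agent discovered while exploring an app."""
    WORKFLOWS[(app, name)] = steps


def run(argv: List[str], perform: Callable[[dict], None]) -> int:
    """Replay a saved workflow, e.g. argv = ["chrome", "pinTab"].

    `perform` is an assumed callback that executes one step.
    Returns the number of steps replayed.
    """
    app, name = argv
    if (app, name) not in WORKFLOWS:
        raise KeyError(f"no workflow {name!r} saved for {app!r}")
    steps = WORKFLOWS[(app, name)]
    for step in steps:
        perform(step)
    return len(steps)


# Illustrative steps only; the action names are assumptions.
register("chrome", "pinTab", [
    {"action": "focus_app", "target": "Google Chrome"},
    {"action": "press_menu_item", "target": "Tab > Pin Tab"},
])
```

The point of the split is that the expensive part (exploration) happens once, and replaying the saved steps needs no model in the loop.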
Why accessibility? Well, turns out that it's just a good DOM in general. It's structure for apps. Not all apps implement it perfectly, but enough do to make it wildly useful.
[1] https://getinvoke.com - note that the landing page is targeted towards creatives right now and doesn't talk about this use case yet
If agents are what it finally takes to get good a11y, I'll take it. I'll bitch about it, but I'll take it.
This is a good solution: instead of everyone blowing tokens on repeating the same computer-use task, come up with a way to share the workflows. I think you'd need to make sure no shared workflows extract user information (e.g. passwords).
Does https://github.com/webmachinelearning/webmcp overlap?
Isn't that basically what Browserbase does? I've found the hardest part of browser use to be stealth first, then client change management, then browser comprehension (which gets better with every new model).
If you're on macOS and interested in this space, I highly recommend you open up the system-provided Accessibility Inspector.app and play around with apps and browsers. See how the green cells might guide an LLM to only need to read/OCR specific parts of a screen, how much text is already natively available to the accessibility engine, and how this could lead to really effective hybrid systems - not just MCPs, but code generators that can build and run their own scripts to crawl your accessibility hierarchy for your workflow!
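As a sketch of that hybrid idea: walk the accessibility tree, keep whatever text is already exposed, and only flag the regions that expose none as candidates for OCR. The tree below is a hand-built mock, not a live query against the macOS accessibility API, and `AXNode`, its fields, and `crawl` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Frame = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class AXNode:
    """Mock accessibility element (assumed shape, not a real API type)."""
    role: str
    text: Optional[str] = None
    frame: Frame = (0, 0, 0, 0)
    children: List["AXNode"] = field(default_factory=list)


def crawl(root: AXNode) -> Tuple[List[Tuple[str, str]], List[Frame]]:
    """Depth-first walk that splits the tree into two buckets.

    readable:  (role, text) pairs already available without vision
    needs_ocr: frames of leaf elements that expose no text
    """
    readable: List[Tuple[str, str]] = []
    needs_ocr: List[Frame] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:
            readable.append((node.role, node.text))
        elif not node.children:
            # Leaf with no text: only here would a screenshot crop help.
            needs_ocr.append(node.frame)
        # Reverse so children are visited in document order.
        stack.extend(reversed(node.children))
    return readable, needs_ocr


# Hand-built example tree standing in for a real window.
window = AXNode("AXWindow", children=[
    AXNode("AXButton", text="Pin Tab", frame=(10, 10, 80, 24)),
    AXNode("AXImage", frame=(0, 40, 640, 480)),
    AXNode("AXGroup", children=[
        AXNode("AXStaticText", text="Hello", frame=(10, 530, 200, 20)),
    ]),
])

readable, needs_ocr = crawl(window)
```

In this mock, only the image's frame ends up in `needs_ocr`, so a vision model would look at one crop instead of the full screen.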
I think this is very fertile ground - big labs need to use approaches that can work on multiple platforms and arbitrary workflows, and full-page vision is the lowest common denominator. Platform-specific approaches are a really exciting open space!