If I'm reading this correctly, it's limited to browser use, not general computer use (e.g., you won't be able to orchestrate KiCad workflows with it). Not disparaging, just noticing the limitation.
I've been playing with the Qwen3-VL-30B model using Playwright to automate some common things I do in browsers, and the LLM does "reasonably well", in that it accelerates finding the right ways to wrangle a page with Playwright, but then you want to capture that in code anyway for repeated use.
I wonder how this compares -- supposedly purpose-made for the task, but also significantly smaller.
Well, you could emulate things and run them in a browser via WASM. I think it's more of a security limitation than a model limitation. In the browser they get to lean on the sandboxing model.
This is in my area of interest. Can you recommend any related tools/resources? Did you publish any code?
Correct, this only works in the browser w/ Playwright as far as I can tell from a quick test.
> but then you want to capture that in code anyway for repeated use.
Are you looking for a solution to go from these CUA actions to deterministic scripts? Check out https://docs.stagehand.dev/v3/best-practices/caching
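For what it's worth, here is a rough sketch of the general idea of caching model-resolved actions so repeat runs are deterministic. This is not Stagehand's API; `ActionCache`, `ask_model`, and the `{"selector", "method"}` action shape are all names made up for illustration, and the model call is just a callback you would supply yourself.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical action shape: {"selector": "#login", "method": "click"}
Action = Dict[str, str]


class ActionCache:
    """Stores model-resolved actions on disk, keyed by (url, instruction)."""

    def __init__(self, path: Path):
        self.path = path
        # Load earlier results if the cache file already exists.
        self.entries: Dict[str, Action] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    @staticmethod
    def key(url: str, instruction: str) -> str:
        # Stable key, so the same instruction on the same page hits the cache.
        return hashlib.sha256(f"{url}\n{instruction}".encode()).hexdigest()

    def resolve(
        self,
        url: str,
        instruction: str,
        ask_model: Callable[[str, str], Action],
    ) -> Action:
        k = self.key(url, instruction)
        if k not in self.entries:
            # Cache miss: pay for one model call, then persist the answer.
            self.entries[k] = ask_model(url, instruction)
            self.path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[k]
```

Replaying a cached action with Playwright could then be something like `getattr(page.locator(action["selector"]), action["method"])()`. In practice you would probably also want to drop a cache entry when the replay fails, since the page may have changed, and fall back to the model again.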