
visarga, yesterday at 4:07 PM

Interesting, a computer-use environment. I made a CUA benchmark too: 200 web tasks with internal code-based evaluation. You can integrate them if you want.

https://github.com/UiPath/uipath_enterprise_benchmark

https://arxiv.org/abs/2511.17131


Replies

frabonacci, yesterday at 4:39 PM

Hey visarga - I'm the founder of Cua; we might have met at the CUA ICML workshop? The OS-agnostic VNC approach of your benchmark is smart and would make integration easy. We're open to collaborating - want to shoot me an email at [email protected]?