for agent agents we have ACP [0] surely their time would be better spent builing this sort of abstraction for computer use then simple teaching an AI to use a mouse?
The computer UI is the way it is because that is optimal for humans, if your plan is to replace humans why not just replace the whole stack os and all to something these models already know how to use?