Isn't "computer use" just interaction with a shell-like environment, which is routine for current agents?
> Almost every organization has software it can’t easily automate: specialized systems and tools built before modern interfaces like APIs existed. [...]
> hundreds of tasks across real software (Chrome, LibreOffice, VS Code, and more) running on a simulated computer. There are no special APIs or purpose-built connectors; the model sees the computer and interacts with it in much the same way a person would: clicking a (virtual) mouse and typing on a (virtual) keyboard.
Interesting question! In this context, "computer use" means the model is manipulating a full graphical interface, using a virtual mouse and keyboard to interact with applications (like Chrome or LibreOffice), rather than simply operating in a shell environment.
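For a rough sense of what that looks like mechanically, here's a minimal sketch using the pyautogui library. This is purely illustrative (the coordinates and text are invented, and Anthropic hasn't published its action layer); it just shows the "virtual mouse and keyboard" primitives being discussed:

```python
# Illustrative GUI-level actions of the kind described above.
# Coordinates and text are made up for the example.
import pyautogui

frame = pyautogui.screenshot()   # capture what's on the (virtual) display
pyautogui.click(x=412, y=230)    # click a button the model located in the frame
pyautogui.write("quarterly report", interval=0.05)  # type like a person would
pyautogui.press("enter")
```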
No, their definition of "computer use" now means:
> where the model interacts with the GUI (graphical user interface) directly.
This is being downvoted but it shouldn't be.
If the ultimate goal is having an LLM control a computer, round-tripping through a UX designed for bipedal bags of meat with weird jelly-filled optical sensors is wildly inefficient.
Just stay in the computer! You're already there! Vision-driven computer use is a dead end.
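To make the efficiency argument concrete: LibreOffice already ships a real headless CLI, so a task like converting a document to PDF is one shell call instead of a screenshot-click-type loop (the filename here is hypothetical):

```python
# The "just stay in the computer" path: drive LibreOffice through its
# headless interface rather than its GUI. No vision, no mouse.
import subprocess

subprocess.run(
    ["soffice", "--headless", "--convert-to", "pdf", "report.odt"],
    check=True,
)
```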
No.
Computer use (to Anthropic, as in the article) means an LLM watching a video feed of the display and controlling the machine with the mouse and keyboard.
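In other words, the loop looks roughly like the sketch below. The propose_action() function is a hypothetical stand-in for the model call, not Anthropic's actual API; the point is that both observation (screenshots) and control (mouse/keyboard) happen at the GUI layer:

```python
# Hedged sketch of the observe/act loop described above.
import pyautogui

def propose_action(frame):
    """Hypothetical: send the frame to the model and get the next action
    back as a dict like {"type": "click", "x": ..., "y": ...}, or None
    when the model judges the task complete."""
    return None  # placeholder so the sketch runs and terminates

while True:
    frame = pyautogui.screenshot()      # the "video feed of the display"
    action = propose_action(frame)      # model decides the next step
    if action is None:
        break
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])
    elif action["type"] == "type":
        pyautogui.write(action["text"])
```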