Hacker News

zozbot234 (yesterday at 9:18 PM)

Isn't "computer use" just interaction with a shell-like environment, which is routine for current agents?


Replies

vineyardmike (yesterday at 9:29 PM)

No.

Computer use (to Anthropic, as in the article) means an LLM controlling a computer by watching a video feed of the display and operating it with the mouse and keyboard.
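A toy sketch of that observe-act loop, under loudly stated assumptions: each step the "model" receives a frame of the screen, picks one mouse or keyboard action, and the harness executes it. `Widget`, `SimulatedDesktop`, `toy_policy`, and `run_agent` are hypothetical names; the frame is a list of widget boxes standing in for real pixels, and `toy_policy` is a hard-coded rule standing in for the LLM. This is not Anthropic's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Widget:
    name: str
    x: int
    y: int
    w: int
    h: int
    contents: str = ""

    def contains(self, px: int, py: int) -> bool:
        # True if the point (px, py) falls inside this widget's bounding box.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


@dataclass
class SimulatedDesktop:
    """Hypothetical stand-in for a real display plus virtual mouse and keyboard."""
    widgets: list
    focused: Optional[str] = None
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real computer-use model gets pixels; this toy "frame" lists what a
        # vision model would have to recover from those pixels.
        return {
            "widgets": [vars(w).copy() for w in self.widgets],
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def click(self, px: int, py: int) -> None:
        # Virtual mouse: focus whatever widget is under the pointer.
        for w in self.widgets:
            if w.contains(px, py):
                self.focused = w.name
                if w.name == "submit_button":
                    self.submitted = True
                return

    def type_text(self, text: str) -> None:
        # Virtual keyboard: append text to the focused widget.
        for w in self.widgets:
            if w.name == self.focused:
                w.contents += text


def center(box: dict) -> tuple:
    return box["x"] + box["w"] // 2, box["y"] + box["h"] // 2


def toy_policy(frame: dict, goal: str) -> dict:
    """Hypothetical rule standing in for the LLM: one frame in, one action out."""
    if frame["submitted"]:
        return {"type": "done"}
    by_name = {w["name"]: w for w in frame["widgets"]}
    box, button = by_name["search_box"], by_name["submit_button"]
    if box["contents"] != goal:
        if frame["focused"] != "search_box":
            x, y = center(box)
            return {"type": "click", "x": x, "y": y}
        return {"type": "type", "text": goal}
    x, y = center(button)
    return {"type": "click", "x": x, "y": y}


def run_agent(desktop: SimulatedDesktop, goal: str, max_steps: int = 10) -> list:
    actions = []
    for _ in range(max_steps):
        action = toy_policy(desktop.screenshot(), goal)  # observe, then decide
        actions.append(action["type"])
        if action["type"] == "done":
            break
        if action["type"] == "click":
            desktop.click(action["x"], action["y"])
        elif action["type"] == "type":
            desktop.type_text(action["text"])
    return actions


desktop = SimulatedDesktop(widgets=[
    Widget("search_box", x=10, y=10, w=200, h=20),
    Widget("submit_button", x=220, y=10, w=60, h=20),
])
print(run_agent(desktop, "quarterly report"))  # prints ['click', 'type', 'click', 'done']
```

Even this trivial task takes a screenshot per step and requires turning a frame into pixel coordinates before anything happens, which is the round trip the rest of the thread argues about.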

michaelt (yesterday at 9:30 PM)

> Almost every organization has software it can’t easily automate: specialized systems and tools built before modern interfaces like APIs existed. [...]

> [...] hundreds of tasks across real software (Chrome, LibreOffice, VS Code, and more) running on a simulated computer. There are no special APIs or purpose-built connectors; the model sees the computer and interacts with it in much the same way a person would: clicking a (virtual) mouse and typing on a (virtual) keyboard.

https://www.anthropic.com/news/claude-sonnet-4-6

jpalepu (yesterday at 9:34 PM)

Interesting question! In this context, "computer use" means the model is manipulating a full graphical interface, using a virtual mouse and keyboard to interact with applications (like Chrome or LibreOffice), rather than simply operating in a shell environment.

zmmmmm (yesterday at 9:30 PM)

No, their definition of "computer use" now means:

> where the model interacts with the GUI (graphical user interface) directly.

lukev (yesterday at 10:08 PM)

This is being downvoted but it shouldn't be.

If the ultimate goal is having an LLM control a computer, round-tripping through a UX designed for bipedal bags of meat with weird jelly-filled optical sensors is wildly inefficient.

Just stay in the computer! You're already there! Vision-driven computer use is a dead end.
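For contrast, a minimal sketch of the "stay in the computer" style: the agent's action is a text command and its observation is text output, with no pixels or coordinates involved. `shell_step` is a hypothetical helper built on the standard library's `subprocess.run`; a real agent harness would also need sandboxing and permission checks, which are omitted here.

```python
import subprocess
import sys


def shell_step(argv: list, timeout_s: float = 10.0) -> dict:
    """Run one agent-issued command and return a plain-text observation."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return {"exit_code": result.returncode, "stdout": result.stdout, "stderr": result.stderr}


# The "action" is just a command line; the "observation" is its text output.
obs = shell_step([sys.executable, "-c", "print(sum(range(10)))"])
print(obs["exit_code"], obs["stdout"].strip())  # prints "0 45"
```

The catch, per the quoted article, is the legacy software that exposes no API or command line at all, which is the case GUI-level computer use targets.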
