Nextgrid • yesterday at 10:47 PM • 2 replies

LLMs don't need browser automation though. Multimodal models with vision input can operate a real computer with "real" user inputs over USB, where the computer itself returns a real, plausible browser fingerprint because it is a real browser being operated by something that behaves humanly.

Replies

Ekaros • today at 7:22 AM

But will they behave like the same user did in the past? I would guess there is a lot of difference between how a bot accesses pages and how a real user has historically accessed them: opening multiple tabs at once, how long it takes to go through the next set, how they navigate, and so on.

There might be a lot of modelling that could be done simply based on the words used in searches and the pattern of opening pages, all trivially tracked to the user's logged-in session.

drum55 • today at 12:41 AM

Sure, but the cost of that goes way up, especially if it has to emulate real-world inputs like a mouse, type in a way that’s plausible, and browse a website in a way that’s not always the direct happy path.
