In my project I rigged up an in-browser emulator and fed captured screenshots directly to local multimodal models.
So the model looks right at what's going on, writes a description that it then refines, and uses all of that to create and manage goals, write to a scratchpad, and submit input. The scaffolding is deliberately minimal because I wanted to see what the raw models are capable of. It's a benchmark of sorts.
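A minimal sketch of that loop, assuming an Ollama-style local endpoint serving a vision model; `capture_screen` and `send_input` are hypothetical hooks into the emulator, and the JSON prompt/state shape is just one way to do it:

```python
import base64
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed local server
MODEL = "llava"  # any local multimodal model

def capture_screen() -> bytes:
    """Hypothetical: grab the emulator's framebuffer as PNG bytes."""
    raise NotImplementedError

def send_input(button: str) -> None:
    """Hypothetical: forward a button press to the emulator."""
    raise NotImplementedError

goals: list[str] = []
scratchpad = ""

def step() -> None:
    """One agent tick: look at the screen, update state, press a button."""
    global scratchpad
    frame = base64.b64encode(capture_screen()).decode()
    prompt = (
        "Describe the screen, then update your goals and scratchpad, "
        "and pick one button to press.\n"
        f"Goals: {goals}\nScratchpad: {scratchpad}\n"
        'Reply as JSON: {"description": "...", "goals": ["..."], '
        '"scratchpad": "...", "button": "..."}'
    )
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt, "images": [frame]}],
        "format": "json",  # ask the server to constrain output to JSON
        "stream": False,
    })
    out = json.loads(resp.json()["message"]["content"])
    goals[:] = out.get("goals", goals)          # model manages its own goals
    scratchpad = out.get("scratchpad", scratchpad)
    send_input(out["button"])                    # act on the chosen input

while True:
    step()
```

The description/goals/scratchpad all ride along in the prompt each tick, so the model's only memory between frames is what it chose to write down.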