withinrafael • yesterday at 9:39 PM

I've had lots of success with generating coordinates and answering questions using the UI-TARS model https://github.com/bytedance/UI-TARS.

theturtletalks • today at 12:09 AM

I’d also check out Midscene. You can set which model it uses: UI-TARS works, and Qwen vision models work too.
