It's just Qwen2.5-VL with a sticker on it. The Chinese labs are leading now!
Buried the lede - new benchmark for web tasks: https://huggingface.co/datasets/microsoft/WebTailBench
Why does Microsoft keep releasing models trained on synthetic data? Is it possible their contract with OpenAI won't let them do anything else?
I would think Microsoft, of all companies, would want to be working on their own LLM behind the scenes, even if they're relying on OpenAI for the bulk of their work.
Meta seems to be the only US company releasing big 'open source' models, while Chinese companies continue to release many completely open source LLMs.
If I'm reading this correctly, it's limited to browser use, not general computer use (e.g., you won't be able to orchestrate KiCAD workflows with it). Not disparaging, just noting the limitation.
I've been playing with the Qwen3-VL-30B model using Playwright to automate some common things I do in browsers, and the LLM does "reasonably well", in that it accelerates finding the right ways to wrangle a page with Playwright, but then you want to capture that in code anyway for repeated use.
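The "capture it in code for repeated use" loop above can be sketched roughly like this: have the model emit structured actions instead of free text, then translate those into deterministic Playwright calls you can replay without the LLM. The JSON action schema here is hypothetical (not Fara's or Qwen's actual output format), and the real LLM call and `page.click()`/`page.goto()` dispatch are elided:

```python
import json

# Hypothetical action schema an LLM might emit for browser automation.
# In a real setup you'd feed the page's accessibility tree or a screenshot
# to the model, then dispatch each step to Playwright, e.g.
# page.goto(url) or page.click(selector).
def parse_actions(llm_reply: str) -> list[tuple[str, str]]:
    """Turn the model's JSON reply into (verb, target) steps."""
    steps = []
    for item in json.loads(llm_reply):
        verb = item["action"]  # e.g. "goto", "click", "fill"
        target = item.get("selector", item.get("url", ""))
        steps.append((verb, target))
    return steps

reply = '[{"action": "goto", "url": "https://example.com"}, {"action": "click", "selector": "#login"}]'
print(parse_actions(reply))
```

Once the step list works, you check it into a script and the model is out of the loop entirely, which matches my experience that the LLM is most useful for discovering the right selectors, not for running them every time.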
I wonder how this compares -- supposedly purpose made for the task, but also significantly smaller.
I don't understand the use case here. We've had this kind of automation for years without needing a heavy GPU and without the risk of it going rogue. The worst that might happen is an interface changes once every year or two and you need to update your scripts.
Microsoft is so hell-bent on throwing all of their AI-sh*t at the wall and seeing what sticks.
Looking at the table, I'll admit I don't get most of the use cases (maybe with the exception of comparison shopping / gathering info), but are people really "outsourcing" shopping? Am I really that far outside what "normal" consumers do these days?
| Task Segment | Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI Comp-Use | UI-TARS-1.5 | Fara-7B |
|---|---|---|---|---|---|---|---|---|
| **Single-Site Tasks** | | | | | | | | |
| Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4 |
| Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9 |
| Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8 |
| Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4 |
| Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3 |
| Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6 |
| Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6 |
| Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0 |
| **Multi-Step Tasks** | | | | | | | | |
| Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0 |
| Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7 |
| Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0 |
| **Overall** | | | | | | | | |
Are there any agentic models like this that would work for controlling input in arbitrary video games? I've been wanting to have an AI play Kerbal Space Program because I think it would just be pretty hilarious.
I find it kind of hilarious that a 7 billion parameter AI model is necessary to automate the clicking of webpages. I mean, how broken is the software stack if we can't script things? We jumped the shark, clearly.
How much VRAM would this require, if I would want to run this locally?
I bought a 12GB Nvidia card a year ago. In general I'm having a hard time finding the actual hardware requirements for any self-hosted AI model. Any tips/suggestions/recommended resources for that?
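A rough rule of thumb for the weights alone: parameter count times bytes per parameter. This back-of-envelope sketch ignores the KV cache and activation overhead (which add a few more GB depending on context length), so treat it as a lower bound, not a spec:

```python
# Weights-only VRAM estimate: N billion params * bytes/param ~= GB.
# Ignores KV cache and runtime overhead, so real usage is higher.
def vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

for name, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{vram_gb(7, bpp):.1f} GB for weights")
```

So for a 7B model: roughly 14 GB at fp16 (doesn't fit your 12GB card), ~7 GB at 8-bit, ~3.5 GB at 4-bit, plus overhead. A 4-bit quant of a 7B model should run comfortably on 12GB.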
Seems like SoM GPT-4o is the one to beat. Also, the table and the plot don't seem to agree.
It's great to see how we went from the first iteration of Claude Computer Use, to now being able to run it locally with just 7B params.
It is not working on my Mac Mini
Forgive me if I can't keep up with the latest AI bubble mania buzzwords, but what is "agentic" even supposed to mean? As far as I can tell it doesn't have a precise definition, and doesn't even sound like proper English.
Buried the lede. Microsoft fine-tuned Qwen2.5-VL-7B. That's the big conversation starter here. Have any of the big providers done this before?
“The model is based on Qwen2.5-VL-7B and trained with supervised fine-tuning.”