logoalt Hacker News

antvesyesterday at 5:40 PM0 repliesview on HN

I think one main point is that not all "computer use" is the same, the harness and agentic experience matters a lot. A poorly designed API experience can actually be _less_ efficient than a well designed browser or computer use experience

In particular, the vision-based approach used in the evaluation has clear limitations with regard to efficiency due to its nature (small observation window, heterogeneous modality)

At Smooth we use an hybrid DOM/vision approach and we index very strongly on small models. An interesting fact is that UIs are generally designed to minimize ambiguity and supply all and only the necessary context as token-efficient as possible, and the UX is cabled up to abstract the APIs in well-understood interface patterns e.g. dropdowns or autocompletes. This makes navigation easier and that's why small models can do it, which is another dimension that must be considered

We typically recommend using APIs/MCP where available and well designed, but it's genuinely surprising how token-efficient agentic browser navigation can actually be