Hacker News

A4ET8a8uTh0_v2 today at 12:16 AM (4 replies)

Looking at the table, I will admit that I don't get most of the use cases (maybe with the exception of comparison shopping, i.e. gathering info), but are people really 'outsourcing' shopping? Am I really that far outside what 'normal' consumers do these days?

  Task Segment             Tasks  SoM GPT-4o-0513  SoM o3-mini  SoM GPT-4o  GLM-4.1V-9B  OAI Comp-Use  UI-TARS-1.5  Fara-7B
  Single-Site Tasks
  Shopping                 56     62.5             71.4         38.1        31.0         42.3          41.1         52.4
  Flights                  51     60.1             39.2         11.1        10.5         17.6          10.5         37.9
  Hotels                   52     68.6             56.4         31.4        19.9         26.9          35.3         53.8
  Restaurants              52     67.9             59.6         47.4        32.1         35.9          22.4         47.4
  Activities               80     70.4             62.9         41.7        26.3         30.4          9.6          36.3
  Ticketing                57     58.5             56.7         37.4        35.7         49.7          30.4         38.6
  Real Estate              48     34.0             17.4         20.1        16.0         9.0           9.7          23.6
  Jobs/Careers             50     49.3             44.0         32.7        22.7         20.7          20.7         28.0
  Multi-Step Tasks
  Shopping List (2 items)  51     66.0             62.7         17.0        7.8          34.0          20.9         49.0
  Comparison Shopping      57     67.3             59.1         27.5        22.8         1.2           8.8          32.7
  Compositional Tasks      55     51.5             39.4         26.7        17.0         10.3          9.1          23.0
  Overall


Replies

tyre today at 5:43 AM

Not necessarily consumers. Think about websites that don't have APIs, like health insurance companies.

PunchyHamster today at 12:56 PM

An LLM pulling a bunch of products from a category and generating a summary for me seems like a pretty useful task.

doug_durham today at 1:03 AM

I can't imagine having an AI agent book or purchase anything, in the same way that I wouldn't have someone I don't know personally do that for me. It should do the research and take me to the place where I need to take over.

m00x today at 4:38 AM

I use AI to shop for wine at my local stores.