Hacker News

int32_64 | yesterday at 6:52 PM

Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.


Replies

xbmcuser | yesterday at 7:38 PM

Google's biggest advantage over time will be cost. They have their own hardware, which they can and will optimize for their LLMs. And Google has experience gaining market share over time by offering better results, performance, or storage space, e.g., Gmail vs. Hotmail/Yahoo, or Chrome vs. IE/Firefox. So don't discount them: if their quality is better, they will pull ahead over time.

crazygringo | yesterday at 7:12 PM

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLMs to speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters, the differences very much matter. And benchmarks just confirm your personal experience anyway, since the differences between models become extremely apparent when you're working in a niche sub-subfield and one model makes glaring factual or logical errors while another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

rfw300 | yesterday at 7:12 PM

That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but once they are adopted for more mainstream tasks, people will migrate to whichever tools actually work.

fullstick | yesterday at 7:47 PM

I doubt anyone I know who uses LLMs outside of work knows that there are benchmarks for these models.

holler | yesterday at 6:57 PM

This. I don't know any non-tech people who use anything other than ChatGPT. On a similar note, I've wondered why Amazon doesn't make a ChatGPT-like app as part of its latest Alexa+ makeover; it seems like a missed opportunity. The Alexa app has a feature for talking to the LLM in chat mode, but the app as a whole is geared toward managing devices.

jay_kyburz | yesterday at 7:09 PM

This is why both Google and Microsoft are pushing Gemini and Copilot in everyone's face.