Hacker News

crazygringo yesterday at 7:12 PM

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLMs to help speed up academic literature reviews, or coding, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyway, as the differences between models become extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)


Replies

smashed yesterday at 7:32 PM

I haven't seen any LLM tech shine "where every detail matters".

In fact, so far they consistently fail in exactly these scenarios, glossing over random important details, as you find whenever you double-check results in depth.

You might have found models, prompts or workflows that work for you, though. If so, I'm interested.

bitpush yesterday at 7:27 PM

> OpenAI's brand recognition shines.

We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and dominated the format for years. Then it ran out of time.

Now very few people use Snapchat, and it has been reduced to a footnote in history.

If you think I'm exaggerating, that just proves my point.
