Hacker News

sothatsit · today at 12:57 AM

I tend to think that the reason people over-index on complex use cases for LLMs is actually reliability, not a lack of interest in boring projects.

If an LLM can solve a complex problem 50% of the time, that is still very valuable. But if you are building a system of small LLMs doing small tasks, even a 1% error rate per step can compound into a highly unreliable system when the steps are stacked together.
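To put rough numbers on that compounding, here is a back-of-the-envelope sketch (the pipeline_reliability helper is just illustrative), assuming each step fails independently at a fixed rate:

    # Probability that a pipeline of independent steps all succeed,
    # given a fixed per-step success rate.
    def pipeline_reliability(per_step_success: float, num_steps: int) -> float:
        return per_step_success ** num_steps

    # A 1% per-step error rate looks small in isolation...
    print(pipeline_reliability(0.99, 10))   # ~0.904
    print(pipeline_reliability(0.99, 50))   # ~0.605
    print(pipeline_reliability(0.99, 100))  # ~0.366

So at 99% per-step reliability, a 100-step chain succeeds end-to-end only about a third of the time.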

The cost of LLMs occasionally giving you wrong answers is worth paying for hard tasks in a way that it is not for small ones. For those smaller tasks, you can usually get much closer to 100% reliability, and more importantly much greater predictability, with hand-engineered code. That makes it much harder to find areas where small LLMs can add value on small, boring tasks. Better auto-complete is the only real-world example I can think of.


Replies

a_bonobo · today at 1:40 AM

>If an LLM can solve a complex problem 50% of the time, then that is still very valuable

I'd adjust that statement: if an LLM can solve a complex problem 50% of the time and I can evaluate the correctness of its output, then that is still very valuable. I've seen too many people blindly pass along LLM output. For a short while it was a trend in the scientific literature to have LLMs evaluate the output of other LLMs; who knows how correct that was. Luckily that has ended.

raincole · today at 6:58 AM

Yeah. Is it even proven that LLMs don't hallucinate on smaller tasks? The author seems to imply they don't, and I fail to see how that could be true.