Hacker News

gcgbarbosa today at 7:08 AM

"the intelligence is clearly there"

I wonder if I am using the same models as everyone else. To me, LLMs still give good answers 80% of the time, but the other 20% of the time they fail so miserably that it's obvious the "intelligence" is not there.


Replies

kzrdude today at 11:00 AM

That's a better score than I'd give my own thinking.

coldtea today at 7:35 AM

It might be a demand for rigor that we don't apply equally to humans. One could argue that other coders on our teams, or even we ourselves, often fail in "a miserable way", say about 20% of the time. But we block this out, or treat it as "regular functioning", or as a one-off caused by something we got wrong, "just a try" that we redo, etc.

But when an LLM does it in an area we know, we notice, and suddenly it's too much.

21asdffdsa12 today at 7:18 AM

It really depends on your field, the tasks you set, and how much of that was in the training set. A web developer will find it succeeding at all tasks, while a developer of some exotic C++ physics simulation will find it lacking.

A "works for me" tells you more about the reviewer's field than about the LLM.

hodgehog11 today at 8:58 AM

I get about the same success rate with my problems (scientific computing usually), but they're often _much_ easier to check than to write, so an 80% success rate becomes game-changing.

scotty79 today at 10:24 AM

GPT-5.5: 100% so far on all of my problems that actually have an answer.

weird-eye-issue today at 10:39 AM

In my experience of hiring and managing people, I would have been very happy if they gave good answers or produced good results 80% of the time.