
skydhash today at 3:01 PM

> The same kind of checks that we use for people are needed for them

Those checks work for people because humans, like most living beings, respond well to reward/punishment mechanisms. It's the whole basis of society.

> not the same kind of checks we use for software.

We do have systems that are non-deterministic (computer vision, various forecasting models…). We judge those by their accuracy and by the likelihood of false positives or false negatives (when it's a classifier). Why not use those metrics?
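To make the suggestion concrete, here is a minimal sketch (with made-up predictions and labels, purely for illustration) of the metrics the comment proposes: accuracy plus the false-positive and false-negative rates from a confusion matrix.

```python
# Hypothetical classifier outputs vs. ground truth (illustrative data only).
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 0, 1, 1, 0]

# Tally the four confusion-matrix cells.
tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))  # true positives
tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))  # true negatives
fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))  # false positives
fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))  # false negatives

accuracy = (tp + tn) / len(actual)
false_positive_rate = fp / (fp + tn)  # how often a negative is wrongly flagged
false_negative_rate = fn / (fn + tp)  # how often a positive is wrongly missed

print(accuracy, false_positive_rate, false_negative_rate)  # 0.75 0.25 0.25
```

The same scorecard works for any non-deterministic system with checkable outputs; the open question in the thread is what the ground-truth labels should be for something like code completion.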


Replies

wizzwizz4 today at 3:48 PM

Because by those metrics, LLMs aren't very good.

LLM code completion compares unfavourably to the (heuristic, nigh-instant) picklist implementations we used to use, both at the low level (how often does it autocomplete the right thing?) and at the high level (despite many believing they're more effective, the average programmer is less effective when using AI tools). We need reasons to believe that LLMs are great and do all things, so we look for measurements that paint them in a good light (e.g. lines of code written, time to first working prototype, inclination to output Doom source code verbatim).

The reason we're all using (or pretending to use) LLMs now is not because they're good. It's almost entirely unrelated.