Hacker News

swingboy today at 11:36 AM

Do these scores actually mean anything? Isn’t the LLM just making something up? If you ran the exact same prompt 10 times, would you get the same scores every single time?


Replies

grey-area today at 11:54 AM

Yes, I'd be interested in that answer too. These scores are most likely generated in an arbitrary way, given how LLMs work: in generating the text, the model didn't actually keep a running score and add to it each time it found a plus point in the skill, as a human evaluator might.

At this point I'd discount most advice given by people using LLMs, because most of them (like the OP here) don't recognise the inadequacies and failure modes of these machines, and just assume that because the output is superficially convincing it is correct and based on something.

Do these skills meaningfully improve performance? Should we even need them when interacting with LLMs?

crustycoder today at 12:05 PM

No, of course you wouldn't, because LLMs are nondeterministic. But the scores would likely be in the same ballpark. The scores I posted are the result of a much more detailed analysis done by the LLM, which was far too long to post. I eyeballed it, and most of the points seemed fair, so I asked it to summarise them and convert them into scores.
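
For anyone who wants to test the "same ballpark" claim rather than argue about it, a rough sketch in Python: call the scorer repeatedly and look at the spread. This assumes you wrap your own LLM call in a function that returns one numeric score. `score_spread` and `fake_llm_score` are hypothetical names, and the fake scorer only returns seeded random noise so the snippet runs without any API; it says nothing about how a real model behaves.

```python
import random
import statistics
from typing import Callable, Dict, List


def score_spread(score_once: Callable[[], float], runs: int = 10) -> Dict[str, float]:
    """Call a scoring function `runs` times and summarise how much it varies."""
    scores: List[float] = [score_once() for _ in range(runs)]
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


# Hypothetical stand-in for a real LLM call: simulated noise, not real results.
_rng = random.Random(0)


def fake_llm_score() -> float:
    return 7.0 + _rng.uniform(-1.0, 1.0)


if __name__ == "__main__":
    # Swap fake_llm_score for a function that sends the same prompt each time.
    print(score_spread(fake_llm_score, runs=10))
```

A small stdev relative to the scoring scale would support "same ballpark"; a large one would suggest the numbers are mostly noise. Neither tells you whether the scores are actually valid, only whether they are repeatable.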