Hacker News

MattRogish, yesterday at 4:36 PM

I'm not saying they are not trying - I'm saying we're inventing new problems faster than any Lab can:

1) Identify the gaps

2) Determine how to fix them

3) Implement a fix (especially if that fix is: identify and find experts)

4) And judge the result

How do they know [person] is an expert in [some field]? How do they find that person? How many experts are necessary to give the right information? How do we evaluate the results, especially if it's novel?

You can find a lot of people who disagree on many topics, and those turtles go all the way down.

I'm not disagreeing that your work will help reduce hallucinations and improve model performance! It will.

I predict (I hope I'm wrong!) that we're going to hit some asymptote that is not at 0% hallucinations (and I would even put a substantial nonzero probability that "overall" hallucination rate bottoms out at some minimum and then slowly grows because we just can't keep up with the new garbage we throw at it).


Replies

sroussey, yesterday at 5:48 PM

> How do they know [person] is an expert in [some field]? How do they find that person?

You just stumbled upon billion-dollar businesses: Mercor, micro1, Scale AI, Surge AI, etc.

jmalicki, yesterday at 4:53 PM

> How do they know [person] is an expert in [some field]? How do they find that person?

They have a PhD from a top school, they are a licensed attorney, they are a licensed physician, a board certified cardiologist, etc.

They are constantly recruiting from these populations with well-paying side gigs.

> 4) And judge the result

That's what they pay the experts for, and they have experts peer-review the other experts.

> You can find a lot of people who disagree on many topics, and those turtles go all the way down.

Which is why everything has to be well-calibrated and not just a hot take - a well-reasoned opinion any expert would find fair.

No one really cares about hallucinations on point facts these days, though; it is much more about complex reasoning tasks. Can they move the bar on the complexity of software LLMs can build on their own? Can they get to a point where LLMs can begin to replace physicians? Financial advisors? Actuaries? etc.
