Hacker News

ExtremisAndy · yesterday at 12:11 PM

See, this, to me, seems obvious, but I’m sure it’s more challenging/complex than I can imagine (I am NOT an expert on AI in any way imaginable). But there has to be a solution. Just yesterday I was asking Gemini to tell me about a certain college professor, and it gave me a list of facts about them. And it was perfect. Then, out of curiosity, I followed up with “tell me more about him!” and it spit out several more bits of information about this person that were entirely hallucinated (e.g., gave them credit for writing papers they didn’t write, said they won awards that actually someone else won). I know this is all complex and certainly beyond my limited skill set, but goodness, we’ve got to get this figured out with so many people depending on and trusting these things nowadays. It’s quite scary.


Replies

embedding-shape · yesterday at 12:42 PM

I bet most of these issues are essentially system prompt/harness issues.

If your example had used "Validate any details before sharing them with the user, with multiple sources" as the system prompt, with a model that is strong at following system prompts precisely and has access to some basic tools, it would have spent maybe a few minutes more, but the answer would have been far more accurate.
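A minimal sketch of the "validate with multiple sources" step as a harness check rather than a prompt, assuming a hypothetical search tool has already attached the sources that support each claim the model wants to make. `Claim`, `corroborated`, `answer`, and the two-source threshold are illustrative assumptions, not part of any real Gemini or Google API.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # Hypothetical: source identifiers (e.g. URLs) that a search tool
    # reported as supporting this claim. Empty means no support found.
    sources: set[str] = field(default_factory=set)


def corroborated(claims: list[Claim], min_sources: int = 2) -> list[Claim]:
    """Keep only claims backed by at least `min_sources` distinct sources."""
    return [c for c in claims if len(c.sources) >= min_sources]


def answer(claims: list[Claim], min_sources: int = 2) -> str:
    """Build the reply from corroborated claims and say how many were dropped."""
    kept = corroborated(claims, min_sources)
    dropped = len(claims) - len(kept)
    lines = [c.text for c in kept]
    if dropped:
        lines.append(f"({dropped} unverified detail(s) omitted)")
    return "\n".join(lines)


# Placeholder claims about a hypothetical professor.
claims = [
    Claim("Teaches in the chemistry department",
          {"university.example/faculty", "dept.example/people"}),
    Claim("Won the (hypothetical) Example Prize", set()),
]
print(answer(claims))
```

The point of doing this in the harness is that an unsupported detail gets dropped (and the user is told so) instead of relying on the model to obey the prompt; the cost is the extra tool calls, which is the slowdown mentioned above.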

But no, Google wants "the new search results" (LLM hallucinations) on top, so we end up with "sounds plausible" answers instead of a "collection of evidence from reliable/semi-reliable" sources or similar, which sucks. We could have quality, but it's too expensive/slow, so we get slop instead, just to optimize for speed and convenience.
