Hacker News

johngossman · yesterday at 5:54 PM · 3 replies

This verification problem is general.

As an experiment, I had Claude Cowork write a history book. I chose as my subject a biography of Paolo Sarpi, a Venetian thinker most active in the early 17th century. I chose him because I know something about him but am far from an expert, because many of the sources are in Italian, in which I am a beginner, and because many of the sources are behind paywalls, which does not mean the AIs haven't been trained on them.

I prompted it to cite and footnote all sources and to avoid plagiarism and AI-style writing. After 5 hours it was finished (amusingly, it generated JavaScript and emitted a DOCX). And then I read the book. There was still a lingering jauntiness and breathlessness ("Paolo Sarpi was a pivotal figure in European history!"), but various online checkers detected neither AI writing nor plagiarism. I spot-checked the footnotes and dates. But clearly checking everything was a huge job, especially since I couldn't see behind the paywalls (if I worked for a university I probably could).

Finally, I used Gemini Deep Research to confirm the historical facts and that all the cited sources exist. Gemini thought it was all good.

But how do I know Gemini didn't hallucinate the same things Claude did?

Definitely an incredible research tool. If I were actually writing such a book, this would be a big start. But verification would still be a huge effort.
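One narrow, mechanical slice of that verification can be scripted: diffing the bibliographies two models produce independently. Agreement is only weak evidence (both could hallucinate the same work), but disagreement is a cheap signal for where to spend manual effort first. A minimal sketch, with illustrative citation lists standing in for real model output:

```python
# Sketch: compare citation lists from two independently run models.
# The lists below are illustrative examples, not actual Claude/Gemini output.

def normalize(cite: str) -> str:
    """Crude normalization so trivial formatting differences don't count
    as disagreement (case, commas, extra whitespace)."""
    return " ".join(cite.lower().replace(",", " ").split())

claude_cites = [
    "Sarpi, Istoria del Concilio Tridentino (1619)",
    "Wootton, Paolo Sarpi: Between Renaissance and Enlightenment (1983)",
]
gemini_cites = [
    "Sarpi, Istoria del Concilio Tridentino (1619)",
    "Bouwsma, Venice and the Defense of Republican Liberty (1968)",
]

a = {normalize(c): c for c in claude_cites}
b = {normalize(c): c for c in gemini_cites}

shared = sorted(a.keys() & b.keys())   # both models cite these
only_a = sorted(a.keys() - b.keys())   # only the first model cites these
only_b = sorted(b.keys() - a.keys())   # only the second model cites these

print("check these first (only one model cites them):")
for key in only_a + only_b:
    print(" ", (a | b)[key])
```

This doesn't touch the harder problem, whether a real source actually supports the claim it is footnoted for, but it at least ranks the footnotes by how much scrutiny they deserve.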


Replies

apical_dendrite · yesterday at 6:20 PM

I used Gemini to look up a relative with a connection to a famous event. The relative himself is obscure, but I have some of his writings and I've heard his story from other relatives. Gemini fabricated a completely false narrative about my relative that was much more exciting than what actually happened. I spent a bunch of time looking at the sources Gemini supplied, trying to verify things; although the sources were real, the story Gemini came up with was completely made up.

hirvi74 · yesterday at 7:00 PM

I believe that, on a fundamental level, the principle of 'trust, but verify' can be followed to its logical endpoint, as covered in Ken Thompson's lecture, 'Reflections on Trusting Trust' [1]. At some point, one simply has to trust that something is correct, unless one has the capability to verify every step of a long chain of indirection.

So, in regard to your book: Claude may or may not have hallucinated the information from its cited sources. Gemini as well. But say you had access to the cited information behind a paywall. How would you go about verifying that the information in those sources was itself correct?

In the four years or so since LLMs were released, I have noticed a trend where people are (rightfully) hesitant to trust their output. But if the knowledge is in a book or comes from some other man-made source, it's somehow infallible? Such thinking reminds me of my primary-school days, when teachers would not let us use Wikipedia as a source because "anyone can edit anything." Yet it's not as though one can't write whatever one wants in a book, be it true or false.

How many scientific researchers have p-hacked their research, falsified data, or used other methods of deceit? I do not believe it's truly an issue on a grand scale, nor does it make vast amounts of science illegitimate. When caught, offenders are usually punished seriously, but there's no telling how much falsified research was never caught.

I do believe any and all information provided by LLMs should be verified and not blindly trusted; however, I extend that same policy to the works of my fellow humans. Of course, no one has the time to verify every single detail of every bit of information one comes across. Hence, at some point, we all must settle for trusting in trust. Knowledge that we cannot verify is not knowledge; it is faith.

[1] https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

gowld · yesterday at 6:46 PM

Before AI, the smartest human still had to pass the paywall to access paywalled content.

AI has exacerbated the Internet's "content must be free or it does not exist" trend.

It's just not interesting to challenge an AI to write professional research content without giving it access to research content. Without access, it will just paraphrase what's already freely available.