They don't verify; they just ship an LLM app, and the user suffers when the information is wrong. Most of the time it is correct, but sometimes it isn't. One way to verify correctness is to ask a bigger model like OpenAI o3.
I’m not sure how the solution to “LLMs lie” is “more LLM”. I’ve personally had o3 tell me things like “I ran this query against version 0.10 of DuckDB and verified that it works” when the query contains functions that don’t exist in DuckDB, or “this version gets better PageSpeed Insights results” when I know it can’t check those. It happens surprisingly often, and it’s super obvious. Either way, it’s made me seriously wary of any information it gives that I can’t verify purely through logic.
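For the DuckDB case specifically, the cheap fix isn’t a bigger model, it’s actually executing the candidate query. A minimal sketch, assuming the `duckdb` Python package is installed; the query string here is a hypothetical stand-in for whatever the model produced:

```python
import duckdb

# Whatever SQL the model claimed it "ran and verified" -- placeholder example.
candidate_sql = "SELECT list_aggregate([1, 2, 3], 'sum')"

con = duckdb.connect()  # in-memory database, no setup needed
try:
    result = con.execute(candidate_sql).fetchall()
    print("query ran, result:", result)
except duckdb.Error as e:
    # If the model invented a function, DuckDB rejects it here,
    # instead of you finding out in production.
    print("query failed:", e)
```

Ten seconds of actually running the thing beats any amount of the model asserting it did.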