Hacker News

JohnLeitch · today at 6:44 PM · 1 reply

The problem is hallucinations. It's incredibly frustrating to have an LLM describe an API or piece of functionality that fulfills all requirements perfectly, only to find it was a hallucination. They are impressive sometimes, though. Recently I had an issue with a regression in some of our test capabilities after a pivot to Microsoft Orleans. After trying everything I could think of, I asked Sonnet 4.5, and it came up with a solution to a problem I could not even find described on the internet, let alone solved. That was quite impressive, but I almost gave up on it because it hallucinated wildly before and after producing the workable solution.

The same thing happens when summarizing documentation. In that regard, I would say that, at best, modern LLMs are only good for finding an entry point into the docs.


Replies

MrDarcy · today at 7:14 PM

While my reply was snarky, I am prepared to take a reasonable bet with a reasonable test case. And pay out.

Why I think I’d win the bet: I’m proficient with tcpdump and Wireshark, and I’m reasonably confident that going to a frontier model and dealing with any hallucinations is faster and more efficient than recalling the incantations and parsing the output myself.
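
For context, the kind of incantation in question might look something like the following. This is a hypothetical sketch of typical tcpdump/tshark usage, not commands from the thread; the interface name, host address, and file names are placeholders.

  # hypothetical: capture HTTPS traffic to/from one host (eth0 and 10.0.0.5 are placeholders)
  tcpdump -i eth0 -nn -s0 -w capture.pcap 'tcp port 443 and host 10.0.0.5'

  # then list retransmissions from the capture with tshark (Wireshark's CLI)
  tshark -r capture.pcap -Y 'tcp.analysis.retransmission' -T fields -e frame.time -e ip.src -e ip.dst

Recalling filter syntax and output flags like these on demand is the sort of memory work the bet is about offloading to a model.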