Hacker News

lionkor | last Friday at 1:08 PM | 2 replies

I feel like using an LLM for this is not a good fit, because it's super difficult to verify whether the knowledge it found is true or made up. LLMs are much quicker to settle on a conclusion where a human wouldn't be sure at all, and that uncertainty seems really important here.


Replies

yencabulator | last Friday at 5:34 PM

In this case, you verify whether the knowledge was made up by comparing the virtual waiter behaviour to the actual waiter. Having a strong test suite like that is actually the ideal scenario for agentic development.

(It's still incredibly hard to pull off for real, because of complex stateful protocols and edge cases around timing and transfer sizes. Samba did take 12 years to develop, so even with LLM help you'd probably still be looking at several years.)
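
To make the comparison idea concrete, here is a minimal sketch of a differential test in Python. It assumes both sides can be wrapped as a function from request bytes to response bytes. The names `diff_test`, `reference_handler` and `candidate_handler` are hypothetical stand-ins, and the sketch ignores the stateful and timing-related cases mentioned above.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: each side can be wrapped as "request bytes in, response bytes out".
Handler = Callable[[bytes], bytes]


def diff_test(
    reference: Handler, candidate: Handler, requests: Iterable[bytes]
) -> List[Tuple[bytes, bytes, bytes]]:
    """Return (request, expected, actual) for every request where the two disagree."""
    mismatches = []
    for req in requests:
        expected = reference(req)
        actual = candidate(req)
        if expected != actual:
            mismatches.append((req, expected, actual))
    return mismatches


# Hypothetical stand-ins. Real handlers would talk to the actual
# implementation and to the reimplementation under test.
def reference_handler(req: bytes) -> bytes:
    return req.upper()


def candidate_handler(req: bytes) -> bytes:
    # Deliberate bug for illustration: empty requests get a different answer.
    return req.upper() if req else b"ERR"


mismatches = diff_test(reference_handler, candidate_handler, [b"ping", b"", b"list"])
for req, expected, actual in mismatches:
    print(f"request={req!r} expected={expected!r} actual={actual!r}")
```

Each mismatch is a concrete case where the reimplementation's behaviour was made up rather than observed, which is the kind of signal an agent or a human can act on.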

doodlesdev | last Friday at 1:14 PM

I guess the LLM doesn't need to verify whether what it found is true or made up, but rather just save the request and answer for later, so it can be reviewed by a developer and documented.
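
A rough sketch of that "save it for later" idea in Python, assuming each exchange is a pair of raw byte strings appended to a JSON Lines log. The helper names and the `reviewed` field are made up for illustration, and an in-memory buffer stands in for a real log file.

```python
import io
import json
from typing import List, TextIO


def record_exchange(log: TextIO, request: bytes, response: bytes) -> None:
    """Append one request/response pair as a JSON line, hex-encoding the raw bytes."""
    entry = {"request": request.hex(), "response": response.hex(), "reviewed": False}
    log.write(json.dumps(entry) + "\n")


def load_unreviewed(log: TextIO) -> List[dict]:
    """Read the log back and return the entries no developer has signed off on yet."""
    log.seek(0)
    return [entry for entry in map(json.loads, log) if not entry["reviewed"]]


# Assumption: an in-memory buffer stands in for a file opened in "a+" mode.
log = io.StringIO()
record_exchange(log, b"\x01hello", b"\x02ok")
pending = load_unreviewed(log)
print(len(pending), bytes.fromhex(pending[0]["request"]))
```

Nothing here judges whether a recorded answer is right. It only keeps the evidence around so a developer can review and document it.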