How would you evaluate it if the agent were not a fuzzy logic machine?
The issue isn't the LLM; it's that verification is actually the hard part. In any case, this is typically called "evals", and you can probably craft a test harness to evaluate these if you think about it hard enough.
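Roughly something like this (a minimal sketch; `run_agent` is a stand-in for whatever your agent actually is, and the case here is made up for illustration):

```python
# Minimal eval harness sketch: each case pairs a prompt with a verifier
# function, since exact string matching rarely works for LLM output.
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_agent(prompt: str) -> str:
    # Stand-in for the real agent call (LLM API, tool loop, etc.);
    # replace with your own implementation.
    return "The total is $100."


def run_evals(cases: list[EvalCase], trials: int = 3) -> None:
    # Run each case several times because agent output is nondeterministic,
    # then report a pass rate per case instead of a single pass/fail.
    for case in cases:
        passes = sum(case.check(run_agent(case.prompt)) for _ in range(trials))
        print(f"{case.name}: {passes}/{trials} passed")


cases = [
    EvalCase(
        name="extracts_total",
        prompt="What is the invoice total in: 'Subtotal $90, tax $10'?",
        check=lambda out: "100" in out,
    ),
]

if __name__ == "__main__":
    run_evals(cases)
```

The hard part is writing the `check` functions, i.e. the verification itself, which is kind of the point.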