suttontom • yesterday at 3:23 PM • 0 replies • view on HN

Some people also call evaluations "tests". New models come with unexpected behavior: a model in a workflow you'd set up might suddenly start calling a tool and never stop, or decide to no longer call a particular tool. Running your existing evaluations to catch regressions like this, and potentially updating the prompts, is considered "testing" your prompts and harnesses.
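
A rough sketch of what one such regression check might look like, assuming your harness can record the tool names a model called during a run as a plain list. The function name `check_tool_calls`, the trace format, and the tool names `search` and `summarize` are all made up for illustration and don't come from any particular framework:

```python
from collections import Counter


def check_tool_calls(trace, required_tools, max_calls_per_tool=5):
    """Return a list of problems found in one recorded run.

    `trace` is a hypothetical list of tool names in call order.
    `required_tools` lists tools the workflow is expected to call at least once.
    `max_calls_per_tool` is an arbitrary cap used to flag runaway loops.
    """
    counts = Counter(trace)
    problems = []

    # Regression 1: the model stopped calling a tool it used to call.
    for tool in required_tools:
        if counts[tool] == 0:
            problems.append(f"{tool}: expected at least one call, got none")

    # Regression 2: the model calls a tool over and over without stopping.
    for tool, n in counts.items():
        if n > max_calls_per_tool:
            problems.append(f"{tool}: called {n} times (limit {max_calls_per_tool})")

    return problems


# A run from the old model passes; a run from the new model fails both checks.
old_trace = ["search", "summarize"]
new_trace = ["search"] * 12

print(check_tool_calls(old_trace, ["search", "summarize"]))  # []
print(check_tool_calls(new_trace, ["search", "summarize"]))
```

Running something like this over saved traces each time you swap models would flag both failure modes before they reach a live workflow. The cap of 5 is a placeholder you'd tune per tool.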
