Hacker News

ssk42 yesterday at 11:12 PM

Fun to see you not on Tildes.

Setting up a clean room is one of the only ways to do evals on agentic harnesses. That's especially true with Windsurf, which doesn't have an easy way to start from the CLI.

So how? The easiest answer, when allowed, is Docker. Literally a new image per prompt. There are also flags for Claude to not use memory, and from there you can use -p to have it behave like a normal CLI tool. Windsurf requires the manual effort of starting it up in a new directory.
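For what it's worth, here's a rough sketch of the Docker route in Python. It uses a fresh container and a fresh scratch directory per prompt, which is a cheaper cousin of building a new image each time. The image name `agent-eval:latest` is hypothetical, and I'm assuming the `claude` CLI is installed in it with auth already handled. The memory-related flags aren't shown because I haven't named them here. It only builds and prints the commands; the actual run is left commented out.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical image name: assumed to have the `claude` CLI installed.
IMAGE = "agent-eval:latest"


def clean_room_command(prompt: str, workdir: Path, image: str = IMAGE) -> list[str]:
    """Build a `docker run` command that runs one prompt in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                      # delete the container when the run ends
        "-v", f"{workdir}:/work",    # mount an empty scratch dir as the project
        "-w", "/work",               # start the agent inside that dir
        image,
        "claude", "-p", prompt,      # -p: non-interactive, print the result and exit
    ]


def plan_runs(prompts: list[str]) -> list[tuple[Path, list[str]]]:
    """Give every prompt its own new directory and its own container command."""
    runs = []
    for prompt in prompts:
        workdir = Path(tempfile.mkdtemp(prefix="eval-"))
        runs.append((workdir, clean_room_command(prompt, workdir)))
    return runs


if __name__ == "__main__":
    for workdir, cmd in plan_runs(["Add a unit test for parse()", "Fix the failing build"]):
        print(shlex.join(cmd))
        # To actually execute (needs Docker and the image above):
        # import subprocess
        # subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Nothing carries over between prompts because each one gets a new directory and a container that is removed on exit. You'd then grade whatever ends up in each scratch directory.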


Replies

skybrian today at 12:57 AM

Sounds interesting, but I'm not quite getting the relevance for people writing code with an agent. Should I be doing evals?
