yks • today at 12:31 AM • 1 reply

The problem is that it’s so rarely A/B tested, and definitely not at scale. An engineer who writes all these my-workflow-but-for-agents skills gets a good outcome, sees affirmations that the agent followed the prescribed process, and counts that as a victory. In reality, the outcome could’ve been just as good if they had fed Claude a spec plus acceptance criteria, or even a basic prompt for the simpler tasks.

Replies

AndyNemmity • today at 1:41 AM

Yeah, I blind A/B test everything, and a lot.

But I don't expect anyone to ever use my stuff. It's complicated as hell. But it's for me, and it works without me having to remotely think about the complexity.

I love that.
