The problem is it’s so rarely A/B tested, definitely not at scale. An engineer, who writes all these my-workflow-but-for-agents skills, proceeds to get the good outcome, while also seeing affirmations that the agent did follow the prescribed processes - that is considered a victory. In reality the outcome could’ve been just as good if they fed Claude a spec + acceptance criteria, or even a basic prompt for the simpler tasks.
Yeah, I Blind A/B test everything, and a lot.
But I don't expect anyone to every use my stuff. It's complicated as hell. But it's for me, and it works without me having to remotely think about the complexity.
I love that.