I did it by building a huge database of allowlisted bash commands and having hooks check each command against the list. The hook parses the input recursively into a tree, so it can handle gnarly blocks of bash. It then tells the agent which parts failed and to break the command up next time. In the agent instructions, I also impress on it strongly to use composable bash tools rather than writing python/ruby/perl scripts.
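To make the idea concrete, here is a minimal sketch of the checking step, not the actual tool. It assumes some parser has already turned the bash into a tree. The `Command` and `Compound` node types, the tiny `ALLOWLIST`, and the response shape returned by `hook_response` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: program -> allowed subcommands (None means any arguments).
ALLOWLIST = {
    "ls": None,
    "cat": None,
    "grep": None,
    "git": {"status", "diff", "log"},
}


@dataclass
class Command:
    """A simple command, e.g. ["git", "status"]. Hypothetical node type."""
    argv: list


@dataclass
class Compound:
    """A pipeline, list, subshell, or substitution wrapping other nodes."""
    kind: str
    children: list = field(default_factory=list)


def is_allowed(argv):
    """Return True if the program (and subcommand, if restricted) is allowlisted."""
    if not argv or argv[0] not in ALLOWLIST:
        return False
    subcommands = ALLOWLIST[argv[0]]
    if subcommands is None:
        return True
    return len(argv) > 1 and argv[1] in subcommands


def find_rejected(node):
    """Recursively collect every simple command that is not on the allowlist."""
    if isinstance(node, Command):
        return [] if is_allowed(node.argv) else [" ".join(node.argv)]
    rejected = []
    for child in node.children:
        rejected.extend(find_rejected(child))
    return rejected


def hook_response(tree):
    """Auto-approve only if every nested command passes; otherwise explain why."""
    rejected = find_rejected(tree)
    if not rejected:
        return {"approved": True, "message": ""}
    message = (
        "Not auto-approved: " + "; ".join(rejected)
        + ". Break the command into smaller steps next time."
    )
    return {"approved": False, "message": message}


# Roughly: cat log.txt | grep error && (git push)
tree = Compound("and_list", [
    Compound("pipeline", [Command(["cat", "log.txt"]), Command(["grep", "error"])]),
    Compound("subshell", [Command(["git", "push"])]),
])
result = hook_response(tree)
```

Here the pipeline passes but the nested `git push` does not, so the whole block falls through to a manual prompt, and the message names only the part that failed.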
It was a bit of work, admittedly, but it's picked up a few users and I learned a lot from designing the research process and parsing the syntax trees.
I actually want to be alerted about everything that's not auto-approved, though. With safe commands auto-approved, it's much less noisy. I think it's important to read your code as it develops, not just at the end, and to understand what agents are doing.
This sounds like an interesting path. Wish I had time (instead of reading endless prompts and getting fatigued).