The answer isn't to sandbox everything; it's knowing which steps need AI judgment and which should be deterministic code. I lean towards the latter as much as possible.
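A rough sketch of what that split can look like, using a made-up refund-routing example. Everything here (`Ticket`, `route_refund`, the `judge` callable, `auto_limit_cents`) is hypothetical, and the model call is stubbed as a plain function that takes a prompt and returns text. Validation and policy limits stay as ordinary code. Only the fuzzy question ("is this reason legitimate?") goes to the model, and its answer is checked against a closed set before anything acts on it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM call: prompt in, text out.
Judge = Callable[[str], str]


@dataclass
class Ticket:
    text: str           # free-text reason from the customer
    amount_cents: int   # requested refund amount


def route_refund(ticket: Ticket, judge: Judge, auto_limit_cents: int = 5_000) -> str:
    # Deterministic: input validation is plain code, never delegated to the model.
    if ticket.amount_cents <= 0:
        return "reject: invalid amount"
    # Deterministic: the policy threshold is a fixed rule.
    if ticket.amount_cents > auto_limit_cents:
        return "escalate: over auto-approval limit"
    # Judgment step: only the ambiguous question reaches the model.
    prompt = "Is this refund reason legitimate? Answer yes or no.\n" + ticket.text
    label = judge(prompt).strip().lower()
    # Deterministic guard: anything outside the expected answers is escalated.
    if label not in {"yes", "no"}:
        return "escalate: unparseable model output"
    return "approve" if label == "yes" else "escalate: model flagged reason"


if __name__ == "__main__":
    fake_judge: Judge = lambda prompt: "yes"  # stub in place of a real model
    print(route_refund(Ticket("arrived broken", 1_200), fake_judge))
```

The point of the shape is that the model is called at most once, on the one step that actually needs judgment, and the cheap deterministic checks run first so bad input or out-of-policy requests never reach it at all.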