You could have a multi-agent harness that constrains each agent role to only the capabilities it needs. If an agent reads untrusted input, it can only run read-only tools and report back to the user. Or have all code execution happen in a sandbox, and then, only if needed, the user makes the important decision of affecting the real world. Rough sketch of what I mean below.
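Something like this toy harness, where the roles, tool names, and the read-only/side-effecting split are all made up for illustration:

```python
# Hypothetical sketch of a role-constrained harness: each role gets only the
# tools it needs, and a role that ingests untrusted input is limited to
# read-only tools plus reporting back to the user.
from dataclasses import dataclass, field

READ_ONLY_TOOLS = {"read_file", "search_docs", "summarize"}
SIDE_EFFECT_TOOLS = {"write_file", "send_email", "run_code"}

@dataclass
class AgentRole:
    name: str
    allowed_tools: set = field(default_factory=set)
    reads_untrusted_input: bool = False

    def permitted(self, tool: str) -> bool:
        # Roles exposed to untrusted input never get side-effecting tools.
        if self.reads_untrusted_input and tool in SIDE_EFFECT_TOOLS:
            return False
        return tool in self.allowed_tools

researcher = AgentRole("researcher", READ_ONLY_TOOLS | {"fetch_url"},
                       reads_untrusted_input=True)
executor = AgentRole("executor", set(SIDE_EFFECT_TOOLS))  # acts only on user-approved plans

def call_tool(role: AgentRole, tool: str, *args):
    if not role.permitted(tool):
        raise PermissionError(f"{role.name} may not call {tool}")
    ...  # dispatch to a sandboxed tool implementation
```

The point is just that the permission check lives in the harness, not in the prompt, so a tainted role physically cannot reach the side-effecting tools.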
Yes, agree with the general idea: permissions should be fine-grained and adapt based on what the agent has already done.
IFC + object-capabilities are the natural generalization of exactly what you're describing.
A system that tracks the integrity of each agent and knows as soon as it has been tainted seems like the right approach.
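A minimal sketch of that taint-tracking idea (toy integrity labels and hypothetical capability names, not any particular framework):

```python
# Toy integrity-tracking sketch: an agent starts trusted, is marked tainted the
# moment it observes untrusted content, and the harness checks the label before
# granting any capability that can affect the outside world.
from enum import Enum

class Integrity(Enum):
    TRUSTED = "trusted"
    TAINTED = "tainted"

class TrackedAgent:
    def __init__(self, name: str):
        self.name = name
        self.integrity = Integrity.TRUSTED

    def observe(self, content: str, source_trusted: bool) -> str:
        # Any untrusted observation permanently lowers integrity (no relabeling up).
        if not source_trusted:
            self.integrity = Integrity.TAINTED
        return content

    def can_use(self, capability: str) -> bool:
        # Hypothetical split: tainted agents keep read-only capabilities only.
        read_only = {"read", "search"}
        return capability in read_only or self.integrity is Integrity.TRUSTED

agent = TrackedAgent("browser-agent")
agent.observe("<html>untrusted page</html>", source_trusted=False)
assert not agent.can_use("send_email")  # capability revoked as soon as it's tainted
assert agent.can_use("search")
```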
With forking of LLM state you can maintain multiple states with different levels of trust and choose which branch gets pruned depending on what task needs to be accomplished. I see it like a tree: always maintain an untainted "trunk" that shoots off branches to do operations. Tainted branches are constrained to strict output schemas, focused actions, and limited tool sets.
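Roughly how I picture it, as a sketch (the schema check and tool list are placeholders; a real harness might validate against JSON Schema):

```python
# Rough sketch of the trunk/branch idea: fork the conversation state for each
# operation, mark the fork tainted if it will touch untrusted data, and only
# merge a schema-validated result back into the untainted trunk.
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Branch:
    messages: list = field(default_factory=list)
    tainted: bool = False
    allowed_tools: tuple = ("read_file", "search")  # tainted branches stay limited

def fork(trunk_messages: list, tainted: bool) -> Branch:
    return Branch(messages=deepcopy(trunk_messages), tainted=tainted)

def merge_result(trunk_messages: list, branch: Branch, result: dict) -> None:
    # Tainted branches may only return data matching a strict schema
    # (here just a required-keys check for illustration).
    if branch.tainted:
        required = {"summary", "citations"}
        if set(result) != required:
            raise ValueError("tainted branch output violates schema")
    trunk_messages.append({"role": "tool", "content": result})

trunk: list = [{"role": "system", "content": "You are the planner."}]
web_branch = fork(trunk, tainted=True)  # goes off to read untrusted pages
merge_result(trunk, web_branch, {"summary": "...", "citations": []})
# The branch itself is discarded; only the validated result reaches the trunk.
```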