Here's OpenAI's docs page on how they sandbox Codex: https://developers.openai.com/codex/security/
Here's the macOS kernel-enforced sandbox profile that gets applied to processes spawned by the LLM: https://github.com/openai/codex/blob/main/codex-rs/core/src/...
I think skepticism is healthy here, but there's no need to just guess.
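For a concrete sense of what a kernel-enforced policy like that amounts to, here's a rough Rust sketch of the general mechanism: a Seatbelt (SBPL) profile handed to /usr/bin/sandbox-exec, denying writes and network for the spawned command except inside a working directory. This is illustrative only (the profile text and paths are made up, not the actual Codex policy):

    use std::process::Command;

    fn main() -> std::io::Result<()> {
        // Toy Seatbelt policy: allow everything by default, then cut off
        // network and restrict writes to a single working directory.
        // Later rules win, so the subpath allow overrides the blanket deny.
        let profile = r#"
    (version 1)
    (allow default)
    (deny network*)
    (deny file-write*)
    (allow file-write* (subpath (param "WORKDIR")))
    "#;

        // sandbox-exec applies the profile to the child it launches;
        // -D passes the WORKDIR parameter referenced by (param "WORKDIR").
        let status = Command::new("/usr/bin/sandbox-exec")
            .args(["-p", profile, "-D", "WORKDIR=/tmp/agent-workdir"])
            .args(["/bin/sh", "-c", "echo hello from inside the sandbox"])
            .status()?;
        println!("sandboxed child exited with {status}");
        Ok(())
    }

The real profile is more involved, but the shape is the same: the kernel, not the agent, decides what the spawned process can touch.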
If I'm following this, it means you still need to audit all the code the LLM writes, though, since anything you run from another terminal window runs as you, with full permissions.
That still doesn't seem ideal. Run the LLM itself in a kernel-enforced sandbox, lest it find ways to exploit vulnerabilities in its own code.
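Something along these lines, i.e. wrapping the agent process itself rather than just the commands it spawns (profile and paths are hypothetical, just to illustrate the idea; network is left open so the agent can still reach its API):

    use std::process::Command;

    fn main() -> std::io::Result<()> {
        // Confine the agent binary itself: writes only under its state dir.
        let profile = r#"
    (version 1)
    (allow default)
    (deny file-write*)
    (allow file-write* (subpath (param "STATEDIR")))
    "#;

        Command::new("/usr/bin/sandbox-exec")
            .args(["-p", profile, "-D", "STATEDIR=/Users/me/.agent-state"])
            .arg("/usr/local/bin/codex") // hypothetical install path
            .status()?;
        Ok(())
    }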