In my org the teams doing agent engineering at scale are all on Codex using gpt-5.5. By scale I mean fully agent authored code workflows with long running / multi hour plans.