> Somehow Codex for me is always way worse than the base models.
I feel the same. CodexTheModel (why have two things named the same way?!) is a good deal faster than the other models, and it probably sits at a different point on the speed/accuracy trade-off, but I want most of my code to be as high quality as possible, and the base models do seem better at that than CodexTheModel.
Have you tried adjusting the reasoning effort?