It's the most overrated model there is. I primarily do Elixir development, and the model sucks balls compared to Gemini and GPT-5x. But the Claude fanboys will swear by it and will attack you if you say anything even remotely negative about their "God-sent" model. It fails miserably even in basic chat and research contexts and constantly goes off track. I wired it up to fire off some tasks, and it kept hallucinating, swearing it had run them when it hadn't even attempted to. It was so unreliable I had to revert to Gemini.
It might simply be that it wasn't trained as much in Elixir RL environments as Gemini and GPT were. I use it for both TypeScript and Python, and there it's certainly better than Gemini. Compared to Codex, it depends on the task.