You should repeat this experiment but with progressively more detail in the initial prompt. Claude's secret sauce is taking weakly specified prompts and making passable things from them, but as the degrees of freedom in the prompt go down Claude starts to disobey while other models close in on the intent.
That is a great suggestion that I am definitely going to look into, thanks!