This is fascinating to me. I completely believe you and I will not bother you with all the common "but did you try to tell it this or that" responses, but this is such a different experience from mine. I did the exact same task with claude in the Julia language last week, and everything worked perfectly. I am now in the habit of adding "keep it simple, use only public interfaces, do not use internals, be elegant and extremely minimal in your changes" to all my requests or SKILL.md or AGENTS.md files (because of the occasional failure like the one you described). But generally speaking, such complete failures have been so very rare for me, that it is amazing to see that others have had such a completely different experience.
You say it doesn't fail but you also mention all these work around you know and try...sounds like it fails a lot but your tolerance is different.
It's almost like.... LLMs are non-deterministic and hallucinating... Oh wait?!
My experience with working with AI agents is that they can be verbose and do things that are too over complicated by default. Unless directed explicitly. Which may be the reason for this discrepancy.