I think the skepticism here is that without tests or a _lot_ of manual QA how would you know that it did it correctly?
Maybe you did one or the other , but “nearly one-shotted” doesn’t tend to mean that.
Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for.
Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.