With C++ formatting is optional. A better test case for LLMs is Python where indention specifies code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct - now the actual code back then was often hilariously wrong.
I can't even get Github Copilot's plugin to avoid randomly trashing files with a Zero No width break space at the beginning, let alone follow formatting rules consistently...
A quick search finds many COBOL checkers. I’d be very surprised if a modern model was not able to fix its own mistakes if connected to a checker tool. Yes, it may not be able to one shot it perfectly, but if it can quickly call a tool once and it “works”, does it really matter much in the end? (Maybe it matters from a cost perspective, but I’m just referring to it solving the problem you asked it to solve.)
Clearly it isn’t just “broken” for everyone, “Claude Code modernizes a legacy COBOL codebase”, from Anthropic:
https://youtu.be/OwMu0pyYZBc