Where would they get the training data?
Source code generation is possible because of the large training set and the effort put into reinforcing better outcomes.
I suspect debugging is not that straightforward to LLM'ize.
It's a non-sequential interaction: when something happens, it didn't necessarily cause the problem, and the timeline may be shuffled. An LLM would need tons of examples where something happens in a debugger or in logs, and it would have to associate that with another abstraction.
I was debugging something in gdb recently and it was a pretty challenging bug. Out of interest I tried ChatGPT, and it was hopeless: try this, add this print, etc. That's not how you debug multi-threaded and async code. When I found the root cause, I analyzed how I did it and where I learned that specific combination of techniques, each individually well documented but never in combination: it came from other people and my own experience.
Have you tried running gdb from a Claude Code or Codex CLI session?
LLMs are okay at bisecting programs and identifying bugs in my experience. Sometimes they require guidance, but often enough I can describe the symptom and they identify the code causing the issue (and recommend a fix). They're fairly methodical, and often ask me to run diagnostic code (or do it themselves).
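For what it's worth, that kind of methodical narrowing is essentially a binary search over history. A minimal sketch (the `is_bad` predicate and the git-bisect-style "all good revisions precede all bad ones" invariant are my assumptions, not anything the commenter described):

```python
def bisect_first_bad(is_bad, lo, hi):
    """Return the index of the first 'bad' revision, assuming every
    good revision precedes every bad one (the git-bisect invariant).
    `is_bad` is a hypothetical predicate: 'does the symptom
    reproduce at revision i?'."""
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid        # symptom present: first bad is mid or earlier
        else:
            lo = mid + 1    # symptom absent: first bad is later
    return lo

# e.g. a toy history where the bug landed at revision 5:
history = [False] * 5 + [True] * 3
assert bisect_first_bad(lambda i: history[i], 0, len(history) - 1) == 5
```

Each probe halves the search space, which is why the back-and-forth ("run this, tell me the result") converges quickly even on long histories.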
> I suspect debugging is not that straightforward to LLM'ize.
Debugging is not easy, but there should be a large training corpus for "bug fixing" from all the commits that have ever existed.
Debugging has been excellent for me with Opus 4.5 and Claude Code.
> Where would they get the training data?
They generated it, and had a compiler compile it, and then had it examine the output. Rinse, repeat.
How long ago was this? I've had outstandingly impressive results asking Copilot Chat with Sonnet 4.5 or ChatGPT to debug difficult multithreaded C++.