Hacker News

oxag3n, last Tuesday at 10:35 PM, 6 replies

Where would they get training data?

Source code generation is possible because of a large training set and the effort put into reinforcing better outcomes.

I suspect debugging is not that straightforward to LLM'ize.

It's a non-sequential interaction: when something happens, it didn't necessarily cause the problem, and the timeline may be shuffled. An LLM would need tons of examples where something happens in a debugger or in logs, and would have to associate it with another abstraction.

I was debugging something in gdb recently and it was a pretty challenging bug. Out of interest I tried ChatGPT, and it was hopeless: try this, add this print, etc. That's not how you debug multi-threaded and async code. When I found the root cause, I analyzed how I did it and where I had learned that specific combination of techniques, each individually well documented but never in combination. It came from learning from other people and from my own experience.


Replies

jimmaswell, last Tuesday at 10:37 PM

How long ago was this? I've had outstandingly impressive results asking Copilot Chat with Sonnet 4.5 or ChatGPT to debug difficult multithreaded C++.

simonw, last Tuesday at 10:38 PM

Have you tried running gdb from a Claude Code or Codex CLI session?

RA_Fisher, yesterday at 12:26 AM

LLMs are okay at bisecting programs and identifying bugs in my experience. Sometimes they require guidance but often enough I can describe the symptom and they identify the code causing the issue (and recommend a fix). They’re fairly methodical, and often ask me to run diagnostic code (or do it themselves).
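
As a rough illustration of the bisecting mentioned above, here is a minimal Python sketch of the idea. The names `first_bad` and `is_bad` are hypothetical, and it assumes the history flips from good to bad exactly once (every revision after the first bad one is also bad), which real histories don't always satisfy.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad every later one is bad too.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return hi


# Hypothetical history: the bug was introduced in revision "r5".
history = [f"r{i}" for i in range(10)]
culprit = first_bad(history, lambda rev: int(rev[1:]) >= 5)
print(history[culprit])
```

In practice `is_bad` would be "check out this revision and see whether the symptom reproduces", which is the diagnostic step the model asks for or runs itself.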

anon-3988, last Tuesday at 11:47 PM

> I suspect debugging is not that straightforward to LLM'ize.

Debugging is not easy, but there should be a large training corpus for "bug fixing" in all the commits that have ever existed.

christophilus, last Tuesday at 10:46 PM

Debugging has been excellent for me with Opus 4.5 and Claude Code.

fragmede, last Tuesday at 11:36 PM

> Where would they get training data?

They generated it, had a compiler compile it, and then had the model examine the output. Rinse, repeat.
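
A toy sketch of that generate, compile, examine loop, in Python. This is only an illustration of the shape of the idea, not a description of how any vendor actually built its training data. Assumptions: Python's built-in `compile()` stands in for the compiler, a fixed list of strings stands in for model output, and `run_candidate` is a hypothetical helper name.

```python
import contextlib
import io
from typing import Tuple


def run_candidate(source: str) -> Tuple[bool, str]:
    """Compile and execute one candidate program, returning (ok, feedback)."""
    try:
        code = compile(source, "<candidate>", "exec")  # the "compiler" step
    except SyntaxError as exc:
        return False, f"compile error: {exc.msg}"
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Fresh namespace per run; a real pipeline would need a sandbox.
            exec(code, {})
    except Exception as exc:  # runtime failures are feedback too
        return False, f"runtime error: {type(exc).__name__}"
    return True, buffer.getvalue()


# Stand-in for generated programs (hypothetical examples).
candidates = [
    "print(1 +)",            # does not compile
    "print(1 / 0)",          # compiles, fails at run time
    "print(sum(range(5)))",  # compiles and runs
]

# Each (program, ok, feedback) triple is one labeled example.
examples = [(src, *run_candidate(src)) for src in candidates]
for src, ok, feedback in examples:
    print(ok, repr(feedback))
```

The point of the sketch is that the compiler and runtime supply the labels for free: every candidate comes back tagged with whether it built, whether it ran, and what it printed.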