Where would they get the training data?
Source code generation is possible because of the large training set and the effort put into reinforcing better outcomes.
I suspect debugging is not that straightforward to LLM'ize.
It's a non-sequential interaction: when something happens, it didn't necessarily cause the problem, and the timeline may be shuffled. An LLM would need tons of examples where something happens in a debugger or in logs, and it would have to associate that with another abstraction.
I was debugging something in gdb recently and it was a pretty challenging bug. Out of interest I tried ChatGPT, and it was hopeless: try this, add this print, etc. That's not how you debug multi-threaded and async code. When I found the root cause, I analyzed how I did it and where I learned that specific combination of techniques, each individually well documented but never in combination: it came from other people and my own experience.
Have you tried running gdb from a Claude Code or Codex CLI session?
LLMs are okay at bisecting programs and identifying bugs in my experience. Sometimes they require guidance, but often enough I can describe the symptom and they identify the code causing the issue (and recommend a fix). They're fairly methodical, and often ask me to run diagnostic code (or do it themselves).
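For what it's worth, that kind of methodical narrowing is essentially a binary search over history. A minimal sketch (the `is_bad` predicate and the git-bisect-style "all good revisions precede all bad ones" invariant are my assumptions, not anything the commenter described):

```python
def bisect_first_bad(is_bad, lo, hi):
    """Return the index of the first 'bad' revision, assuming every
    good revision precedes every bad one (the git-bisect invariant).
    `is_bad` is a hypothetical predicate: 'does the symptom
    reproduce at revision i?'."""
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid        # symptom present: first bad is mid or earlier
        else:
            lo = mid + 1    # symptom absent: first bad is later
    return lo

# e.g. a toy history where the bug landed at revision 5:
history = [False] * 5 + [True] * 3
assert bisect_first_bad(lambda i: history[i], 0, len(history) - 1) == 5
```

Each probe halves the search space, which is why the back-and-forth ("run this, tell me the result") converges quickly even on long histories.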
> I suspect debugging is not that straightforward to LLM'ize.
Debugging is not easy, but there should be a large training corpus for "bug fixing" from all the commits that have ever existed.
Debugging has been excellent for me with Opus 4.5 and Claude Code.
> Where would they get the training data?
They generated it, and had a compiler compile it, and then had it examine the output. Rinse, repeat.
How long ago was this? I've had outstandingly impressive results asking Copilot Chat with Sonnet 4.5 or ChatGPT to debug difficult multithreaded C++.