Hacker News

notepad0x90 | yesterday at 1:44 PM | 1 reply

> This does not preclude reasoning.

It does not imply it either. To claim reasoning, you need evidence. It needs to reliably NOT hallucinate results for simple conversations, for example (if it has basic reasoning).

> Because I see them solve real debugging problems talking through the impact of code changes or lines all the time to find non-obvious errors with ordering and timing conditions on code they’ve never seen before.

Programming languages and how programs work are extensively documented, and solutions to problems and ways to approach them have been written up all over the internet. It takes all of that data and completes the right text by taking the most correct pathway based on your input. It does not actually take your code and debug it. It is the sheer volume of data it uses and the computational resources behind it that make it hard to wrap your head around the difference between guessing and understanding. You too can look at enough Stack Overflow and (poorly) guess answers to questions without understanding anything about the topic, and if you guess enough you'll get some right. LLMs are just optimized to make the number of correct responses high.


Replies

IanCal | yesterday at 8:14 PM

> It does not imply it either.

Right, it's irrelevant to the question of whether they can reason.

> to claim reasoning you need evidence

Frankly, I have no idea what most people are talking about when they use the term and say these models can't do it. It seems to be as hand-wavy an exercise as when people talk about thinking or understanding.

> it needs to reliably NOT hallucinate results for simple conversations for example (if it has basic reasoning).

That's not something I commonly see in frontier models.

Again, this doesn't seem related to reasoning. What we call hallucinations would also be seen in something that could reason but had a fallible memory. I remember things incorrectly, and I can still reason.

> it does not actually take your code and debug it

It talks through the code (which it has not seen before) and the process step by step. It can choose to add logging, run the code, go through the logs, revise what it thinks is happening, and repeat. It can do this until it explains what is happening, creates test cases that show the problem and what triggers it, fixes it, and shows that the tests pass.

If that's not debugging the code, I really don't know what to call it.