logoalt Hacker News

chmod775today at 6:29 AM1 replyview on HN

You're anthropomorphizing.

LLMs can't lie nor can they tell the truth. These concepts just don't apply to them.

They also cannot tell you what they were "thinking" when they wrote a piece of code. If you "ask" them what they were thinking, you just get a plausible response, not the "intention" that may or may not have existed in some abstract form in some layer when the system selected tokens*. That information is gone at that point and the LLM has no means to turn that information into something a human could understand anyways. They simply do not have what in a human might be called metacognition. For now. There's lots of ongoing experimental research in this direction though.

Chances are that when you ask an LLM about their output, you'll get the response of either someone who now recognized an issue with their work, or the likeness of someone who believes they did great work and is now defending it. Obviously this is based on the work itself being fed back through the context window, which will inform the response, and thus it may not be entirely useless, but... this is all very far removed from what a conscious being might explain about their thoughts.

The closest you can currently get to this is reading the "reasoning" tokens, though even those are just some selected system output that is then fed back to inform later output. There's nothing stopping the system from "reasoning" that it should say A, but then outputting B. Example: https://i.imgur.com/e8PX84Z.png

* One might say that the LLM itself always considers every possible token and assigns weights to them, so there wouldn't even be a single chain of thought in the first place. More like... every possible "thought" at the same time at varying intensities.


Replies

pipestoday at 7:15 AM

Or ask another model to tell you what the changes do.

show 2 replies