
phire · yesterday at 11:49 PM

The NLA also hallucinates, so it's still not revealing the model's actual "thoughts". The paper also points out that since the NLA is a full LLM, it can make inferences that aren't actually in the activations.

But it's a useful approximation for auditing.