jebarker • yesterday at 4:43 PM

This is an interesting paper, and I like this kind of mechanistic interpretability work, but I cannot figure out how the paper title "Transformers know more than they can tell" relates to the actual content. In this case, what is it that they know and can't tell?

Replies

godelski • yesterday at 9:23 PM

I believe it's a reference to the paper "Language Models (Mostly) Know What They Know".

There's definitely some link, but I'd need to give this paper a good read and refresh on the other to see how strong it is. But I think your final sentence strengthens my suspicion.

https://arxiv.org/abs/2207.05221
