logoalt Hacker News

jebarkeryesterday at 4:43 PM1 replyview on HN

This is an interesting paper and I like this kind of mechanistic interpretability work - but I cannot figure out how the paper title "Transformers know more than they can tell" relates to the actual content. In this case what is it that they know and can't tell?


Replies

godelskiyesterday at 9:23 PM

I believe it's a reference to the paper "Language Models (Mostly) Know What They Know".

There's definitely some link but I'd need to give this paper a good read and refresh on the other to see how strong. But I think your final sentence strengthens my suspicion

https://arxiv.org/abs/2207.05221