Hacker News

Transformers know more than they can tell: Learning the Collatz sequence

124 points | by Xcelerate | 12/03/2025 | 44 comments | view on HN

Comments

jebarker · yesterday at 4:43 PM

This is an interesting paper, and I like this kind of mechanistic interpretability work - but I can't figure out how the paper title "Transformers know more than they can tell" relates to the actual content. In this case, what is it that they know but can't tell?

rikimaru0345 · yesterday at 2:40 PM

OK, I've read the paper, and now I wonder: why did they stop at the most interesting part?

They did all that work to figure out that learning "base conversion" is the difficult thing for transformers. Great! But then why not take the last remaining step and investigate why that specifically is hard for transformers, and how the transformer architecture could be modified so that this becomes easier / more natural / more "intuitive" for the network to learn?
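For anyone who wants the intuition behind "base conversion" here, a minimal sketch (my own illustration, not code from the paper; the helper names collatz_step and to_base are mine): halving an even number is just a digit shift in base 2 (drop the trailing 0), while 3n+1 is just a digit append in base 3 (write "1" after the digits of n), so iterating the Collatz map forces a model to juggle both representations at once.

    # One step of the Collatz map: n/2 if n is even, else 3n + 1.
    def collatz_step(n: int) -> int:
        return n // 2 if n % 2 == 0 else 3 * n + 1

    # Render n as a digit string in the given base (2 <= base <= 10).
    def to_base(n: int, base: int) -> str:
        digits = []
        while n:
            digits.append(str(n % base))
            n //= base
        return "".join(reversed(digits)) or "0"

    # Trace a few steps: halving shortens the base-2 string by one digit,
    # while 3n+1 extends the base-3 string by one digit.
    n = 27
    for _ in range(5):
        print(f"n={n:4d}  base2={to_base(n, 2):>10}  base3={to_base(n, 3):>8}")
        n = collatz_step(n)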

niek_pas · yesterday at 1:41 PM

Can someone ELI5 this for a non-mathematician?

Onavo · yesterday at 2:43 PM

Interesting. What about the old proof that neural networks can't model arbitrary-length sine waves?
