
kirubakaran · yesterday at 5:20 PM

That proof only applies to fixed-architecture, feed-forward multilayer perceptrons with no recurrence, iirc. Transformers are not that.
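
For context, and assuming "that proof" refers to the classical universal approximation theorem (Cybenko 1989, extended by Hornik et al.), the function class the theorem actually covers is a single-hidden-layer feed-forward sum, roughly:

```latex
% Classical universal approximation theorem (Cybenko 1989, sigmoidal case):
% for any continuous f on [0,1]^n and any epsilon > 0, there exist a width N,
% coefficients alpha_i, biases b_i in R, and weights w_i in R^n such that
\[
  \sup_{x \in [0,1]^n}
  \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i \,
    \sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon ,
\]
% where sigma is a fixed sigmoidal activation. The approximating network is
% one affine map, one pointwise nonlinearity, and one linear readout.
```

That is, a fixed one-hidden-layer shape with no weight sharing, no attention, and no recurrence, which is the distinction the comment is drawing against transformers.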