
kirubakaran · yesterday at 5:20 PM

That proof only applies to fixed-architecture, feed-forward multilayer perceptrons with no recurrence, iirc. Transformers are not that.
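
For context, and assuming "that proof" refers to the classical universal approximation theorem (Cybenko 1989, extended by Hornik et al.), the function class the theorem actually covers is a single-hidden-layer feed-forward sum, roughly:

```latex
% Classical universal approximation theorem (Cybenko 1989, sigmoidal case):
% for any continuous f on [0,1]^n and any epsilon > 0, there exist a width N,
% coefficients alpha_i, biases b_i in R, and weights w_i in R^n such that
\[
  \sup_{x \in [0,1]^n}
  \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i \,
    \sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon ,
\]
% where sigma is a fixed sigmoidal activation. The approximating network is
% one affine map, one pointwise nonlinearity, and one linear readout.
```

That is, a fixed one-hidden-layer shape with no weight sharing, no attention, and no recurrence, which is the distinction the comment is drawing against transformers.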