> I don't think there's any fundamental difference in the principle of their operation

red75prime • last Sunday at 10:27 PM

Yeah, they seem to be subject to the universal approximation theorem (it needs to be checked more thoroughly, but I think we can build a transformer that is equivalent to any given fully connected multilayer network).

That is, at a sufficient size they can do anything a human can do at a given point in their life (that is, with no additional training), regardless of whether humans have world models and what those models look like at the neuronal level.

But there are additional nuances related to their architectures and training regimes, as well as practical questions about the required size.
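The parenthetical claim that a transformer can reproduce a fully connected network can be sketched for a single layer, under loudly simplified assumptions that are mine, not the commenter's: the attention sublayer contributes nothing (for example, its output projection is zero), there is no layer norm, biases are omitted, and input and output widths are equal. The only obstacle left is the residual connection, which can be cancelled with the identity x = relu(x) - relu(-x). All function names here are hypothetical and for illustration only.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, v):
    # W is a list of rows; returns W @ v
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def mlp_layer(W1, W2, x):
    """Target: one layer of a plain fully connected net, f(x) = W2 relu(W1 x)."""
    return matvec(W2, relu(matvec(W1, x)))


def transformer_ffn_block(W1, W2, x):
    """Hypothetical simplified transformer block (per token): attention adds
    nothing and there is no layer norm, so it computes x + W2 relu(W1 x)."""
    h = matvec(W2, relu(matvec(W1, x)))
    return [a + b for a, b in zip(x, h)]


def embed_mlp_in_block(W1, W2):
    """Build block weights with x + W2' relu(W1' x) == W2 relu(W1 x).
    Assumes W2 maps back to the input width d (square layer)."""
    d = len(W1[0])
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    neg_eye = [[-a for a in row] for row in eye]
    # Hidden units: the original ones, then relu(x), then relu(-x)
    W1p = W1 + eye + neg_eye
    # Read-out: original output minus relu(x) plus relu(-x), i.e. f(x) - x,
    # so the residual "+ x" brings the total back to f(x)
    W2p = [row + [-e for e in erow] + erow[:] for row, erow in zip(W2, eye)]
    return W1p, W2p


# Usage: random weights, compare the two computations on a random input
random.seed(0)
d, hidden = 4, 6
W1 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
W2 = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(d)]
W1p, W2p = embed_mlp_in_block(W1, W2)
x = [random.uniform(-2, 2) for _ in range(d)]
diff = max(abs(a - b) for a, b in
           zip(mlp_layer(W1, W2, x), transformer_ffn_block(W1p, W2p, x)))
print(diff)  # floating-point noise only
```

Stacking one such block per layer would extend this to a multilayer network, presumably with zero-padding where layer widths differ. Real transformers add layer norm and nonzero attention, which is where the "needs to be checked more thoroughly" caveat applies.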
