I'm sorry, but the input to a model is a sequence of tokens and the output is a probability distribution over possible next tokens. It's a very, very, very fancy next-token predictor, but that is fundamentally what it is. I'm making the argument that this paradigm might not give rise to general intelligence no matter how much you scale it.
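To make the claim concrete, here is a minimal sketch of that loop: tokens in, a probability distribution out, pick a token, repeat. The "model" below is a toy stand-in (a hand-written scoring function over a tiny made-up vocabulary, not a real network); only the softmax-then-select structure reflects how actual language models decode.

```python
import math

# Tiny illustrative vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(tokens):
    # Placeholder scorer: assigns one raw score (logit) per vocabulary entry.
    # A real LM computes these logits with a neural network; here we just
    # favor the entry after the last seen token, to keep the demo deterministic.
    last = VOCAB.index(tokens[-1]) if tokens else -1
    return [3.0 if i == (last + 1) % len(VOCAB) else 0.5
            for i in range(len(VOCAB))]

def softmax(logits):
    # Turn raw scores into a probability distribution over the vocabulary.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(tokens):
    probs = softmax(toy_model(tokens))
    # Greedy decoding: take the single most likely next token.
    # Real systems often sample from the distribution instead.
    return VOCAB[max(range(len(VOCAB)), key=lambda i: probs[i])]

tokens = ["the"]
for _ in range(4):
    tokens.append(predict_next(tokens))
print(" ".join(tokens))
```

Everything interesting in a real model lives inside the logit computation that `toy_model` stubs out; the outer predict-one-token-at-a-time loop is exactly the paradigm being debated.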
It's a very very very fancy next token predictor
Yes, and unless you are prepared to rebut the argument with evidence of the supernatural, that's all there is, period. That's all we are.
So tired of the thought-terminating "stochastic parrot" argument.