In the paper 'Hopfield Networks is All You Need', they calculate the number of patterns that can be 'stored' in an attention layer, and it grows exponentially in the dimension of the pattern space. So essentially, you can store more 'ideas' in an LLM than there are particles in the universe. I think we'll be good.
From a technical perspective, this comes from the softmax update rule (equivalently, the log-sum-exp energy function), which keeps the stored patterns sharply separated so that each memory can be retrieved in a single update step.
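Here's a minimal sketch of that retrieval rule in NumPy, assuming patterns stored as rows of a matrix X and the update xi_new = X^T softmax(beta * X xi); the names (beta, hopfield_update, etc.) are illustrative, not from the paper's code release:

```python
import numpy as np

def hopfield_update(query, stored_patterns, beta=8.0):
    """One retrieval step of the modern (continuous) Hopfield network.

    stored_patterns: (N, d) matrix X, one stored pattern per row.
    query:           (d,) possibly corrupted pattern to retrieve.
    beta:            inverse temperature; larger beta -> sharper softmax,
                     i.e. stronger separation between memories.
    """
    scores = beta * stored_patterns @ query     # similarity to each memory
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over stored patterns
    return stored_patterns.T @ weights          # convex combination of memories

rng = np.random.default_rng(0)
d, n = 64, 1000                                 # 1000 random patterns in 64 dims
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

noisy = X[0] + 0.3 * rng.standard_normal(d)     # corrupt pattern 0
retrieved = hopfield_update(noisy, X)
print(np.argmax(X @ retrieved))                 # -> 0: recovers the original
```

Even with far more patterns than dimensions, random patterns are nearly orthogonal, so the softmax puts almost all of its weight on the correct memory in one step; that sharp, one-shot separation is what the exponential-capacity result formalizes.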