It's not possible with today's LLM models, but we are not wedded to the current architecture.
Realistically, we are.
This is not some arbitrary design choice, it's the core compromise to make LLMs viable to train at all.
Realistically, we are.
This is not some arbitrary design choice, it's the core compromise to make LLMs viable to train at all.