LLMs also don't work by generating a probability distribution over the next word. Your explanation can't account for how they generate words, let alone sentences.
That is exactly how they work.
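For anyone following along, here is a rough sketch of what "generating a probability distribution over the next word" looks like as a loop. It is a toy, not a real model: the `TOY_LOGITS` table, the tiny vocabulary, and the function names are all made up for illustration, and it only looks at the previous token, whereas an actual LLM computes its scores with a neural network conditioned on the whole preceding context. The point is only that repeatedly sampling from a next-token distribution and feeding the result back in is enough to produce a sentence.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical hand-written scores standing in for a trained model.
# Keys are the previous token; values map candidate next tokens to scores.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 2.0, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "</s>": 0.5},
    "dog": {"sat": 1.5, "</s>": 0.5},
    "sat": {"</s>": 3.0},
}

def next_token_distribution(prev):
    """Return candidate tokens and their probabilities given the previous token."""
    options = TOY_LOGITS[prev]
    tokens = list(options)
    probs = softmax([options[t] for t in tokens])
    return tokens, probs

def generate(rng, max_tokens=10):
    """Sample one token at a time until the end marker or the length limit."""
    out = []
    prev = "<s>"
    for _ in range(max_tokens):
        tokens, probs = next_token_distribution(prev)
        prev = rng.choices(tokens, weights=probs, k=1)[0]
        if prev == "</s>":
            break
        out.append(prev)
    return out

if __name__ == "__main__":
    print(" ".join(generate(random.Random(0))))
```

Each pass through the loop produces one distribution and one sampled word; the sentence is just the sequence of those samples.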