Absolute insanity to see a coherent text block that takes at least 2 minutes to read generated in a fraction of a second. Crazy stuff...
Not at all if you consider the internet pre-LLM. That is the standard expectation when you load a website.
The slow word-by-word typing is something we only started getting used to with LLMs.
If these techniques become widespread, we may grow accustomed to the "old" speed again, where content loads ~instantly.
Imagine a content forest like Wikipedia instantly generated like a Minecraft world...
Yes, but the quality of the output leaves something to be desired. I just asked about some sports history and got a mix of correct information and totally made up nonsense. Not unexpected for an 8k model, but it raises the question of what the use case is for such small models.
Accelerating the end of the usable text-based internet one chip at a time.