The speed is ridonkulous. No doubt.
The quantization looks pretty severe, which could make the comparison chart misleading. But I tried a trick question suggested by Claude and got nearly identical results from regular ollama and from the chatbot. And quantizing to 3 or 4 bits still would not get you that HOLY CRAP WTF speed on other hardware!
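For anyone wondering what "severe" quantization means concretely: it rounds the model's weights onto a coarse grid, which saves memory but adds error. A toy round-to-nearest sketch (illustrative only; real 3/4-bit schemes use per-block scales, grouping, etc., and this is not any particular vendor's method):

```python
import numpy as np

def quantize(w, bits):
    """Symmetric round-to-nearest quantization, one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # map largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantize back to float

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
for bits in (8, 4, 3):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

The point: the error grows as the bit width shrinks, which is why aggressive quantization can degrade answers, but it doesn't by itself explain a huge speed difference on the same hardware.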
This is a very impressive proof of concept. If they can deliver that medium-sized model they're talking about... if they can mass produce these... I notice you can't order one, so far.
I doubt many of us will be able to order one for a long while. There are a significant number of existing datacentre and enterprise use cases that will pay a premium for this.
Additionally, LLMs have been tested and found valuable in benchmarks across a large number of domains, but never deployed there because of speed and cost limitations. Those spaces will eat up these chips very quickly.