The speed is ridonkulous. No doubt.
The quantization looks pretty severe, which could make the comparison chart misleading. But I tried a trick question suggested by Claude and got nearly identical results from regular ollama and from the chatbot. And quantizing to 3 or 4 bits still would not get you that HOLY CRAP WTF speed on other hardware!
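For anyone wondering what "severe" quantization means concretely: it rounds the model's weights onto a coarse grid, which saves memory but adds error. A toy round-to-nearest sketch (illustrative only; real 3/4-bit schemes use per-block scales, grouping, etc., and this is not any particular vendor's method):

```python
import numpy as np

def quantize(w, bits):
    """Symmetric round-to-nearest quantization, one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # map largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantize back to float

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
for bits in (8, 4, 3):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

The point: the error grows as the bit width shrinks, which is why aggressive quantization can degrade answers, but it doesn't by itself explain a huge speed difference on the same hardware.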
This is a very impressive proof of concept. If they can deliver that medium-sized model they're talking about... if they can mass produce these... I notice you can't order one, so far.
I doubt many of us will be able to order one for a long while. There are a significant number of existing datacentre and enterprise use cases that will pay a premium for this.
Additionally, LLMs have been tested and found valuable in benchmarks across a large number of domains, but never deployed there because of speed and cost limitations. Those spaces will eat up these chips very quickly.