Hacker News

Kirby64 yesterday at 10:01 PM

But it’s irrelevant. 750 tokens/s on a full frontier model is useful. 15,000 poor-quality tokens/s is much less useful, no matter how much scaffolding you put around it.


Replies

Legend2440 yesterday at 10:36 PM

You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.

Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
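To make the "no throughput penalty" claim concrete, here is a rough sketch of an idealized layer-wise pipeline. Everything in it is an assumption for illustration: the function name `pipeline_stats`, the 1 ms per-stage time, and the chip counts are made up and are not measurements of any real hardware. It also assumes the pipeline is kept full with independent work items and that chip-to-chip transfer time is zero; a single autoregressive stream, where each token waits on the previous one, would not keep it full.

```python
def pipeline_stats(stage_times_s, num_items):
    """Latency and throughput of an idealized pipeline.

    Each entry in stage_times_s is the time one chip (holding a
    contiguous slice of layers) needs per work item. Hypothetical
    model: zero transfer cost, unlimited buffering between stages,
    and enough independent items to keep every stage busy.
    """
    if not stage_times_s or num_items <= 0:
        raise ValueError("need at least one stage and one item")

    # One item must pass through every stage in order.
    latency_s = sum(stage_times_s)

    # Once the pipeline is full, one item finishes every time the
    # slowest stage finishes, so the slowest stage sets the rate.
    bottleneck_s = max(stage_times_s)
    total_time_s = latency_s + (num_items - 1) * bottleneck_s

    throughput_per_s = num_items / total_time_s
    return latency_s, throughput_per_s


# Illustrative numbers only: same per-chip time, different chip counts.
two_chips = pipeline_stats([0.001] * 2, num_items=100_000)
eight_chips = pipeline_stats([0.001] * 8, num_items=100_000)
```

Under these assumptions, going from two chips to eight raises per-item latency (the sum of the stage times) while steady-state throughput stays pinned to the slowest stage, which is the sense in which a layer-wise split would not cost throughput. Unbalanced stages or real interconnect delays would change the picture.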

windexh8er yesterday at 10:11 PM

I think you missed the point and either don't understand or aren't considering the utility of SLMs.

huflungdung yesterday at 11:18 PM

[dead]