Kirby64 • yesterday at 10:01 PM • 3 replies

But it’s irrelevant. 750 tokens/s on a full frontier model is useful. 15,000 poor-quality tokens are much less useful, no matter how much scaffolding you put around them.

Replies

Legend2440 • yesterday at 10:36 PM

You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.

Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
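To make the "no throughput penalty" claim concrete, here is a rough back-of-the-envelope sketch of an idealized pipeline. It is not the vendor's design: the function name `pipeline_stats` and the per-chip stage times are made up for illustration, and it assumes the pipeline is kept full (for example, by interleaving independent sequences) and that chip-to-chip transfer time is negligible.

```python
def pipeline_stats(stage_times_s):
    """Return (throughput in tokens/s, latency in s) for an idealized pipeline.

    stage_times_s: hypothetical time each chip spends on one token.
    Assumes the pipeline is always full and transfers between chips are free.
    """
    if not stage_times_s:
        raise ValueError("need at least one stage")
    # A token must pass through every stage, so latency is the sum.
    latency = sum(stage_times_s)
    # In steady state one token leaves per slowest-stage interval.
    throughput = 1.0 / max(stage_times_s)
    return throughput, latency


# Hypothetical numbers: a small model on one chip vs. a model with
# 4x the layers split evenly across four chips.
one_chip = pipeline_stats([0.25])
four_chips = pipeline_stats([0.25, 0.25, 0.25, 0.25])

print(one_chip)    # same throughput ...
print(four_chips)  # ... but 4x the per-token latency
```

Under these assumptions, adding chips grows latency but not steady-state throughput. For a single autoregressive sequence the next token cannot start until the previous one finishes, so per-sequence speed is bounded by latency instead.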

windexh8er • yesterday at 10:11 PM

I think you missed the point and either don't understand or aren't considering the utility of SLMs (small language models).

huflungdung • yesterday at 11:18 PM

[dead]
