But it’s irrelevant. 750 tokens/s on a full frontier model is useful. 15,000 poor-quality tokens/s is much less useful, no matter how much scaffolding you put around it.
I think you missed the point, and you either don't understand or aren't considering the utility of SLMs.
You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.
Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
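To illustrate the layer-wise claim, here is a rough sketch of generic pipeline arithmetic, not their actual design. The function name, the stage times, and the `transfer_s` hand-off cost are all made-up placeholders. It assumes enough independent sequences are in flight to keep every chip busy, and it treats each chip-to-chip link as its own pipeline stage. Under those assumptions, steady-state throughput is set by the slowest stage, while per-token latency grows with the number of chips.

```python
def pipeline_stats(stage_times_s, transfer_s=0.0):
    """Steady-state throughput and per-token latency of a layer-wise pipeline.

    stage_times_s: hypothetical per-token compute time (seconds) for the
                   slice of layers held by each chip.
    transfer_s:    hypothetical time (seconds) to hand activations to the
                   next chip; each link is modeled as its own pipeline stage.
    """
    if not stage_times_s:
        raise ValueError("need at least one stage")

    hops = len(stage_times_s) - 1

    # Latency: one token must pass through every stage and every link.
    latency_s = sum(stage_times_s) + hops * transfer_s

    # Throughput: a full pipeline emits one token per interval of its
    # slowest stage (a compute stage or, if there are hops, a link).
    bottleneck_s = max(stage_times_s)
    if hops > 0:
        bottleneck_s = max(bottleneck_s, transfer_s)
    throughput_tok_s = 1.0 / bottleneck_s

    return throughput_tok_s, latency_s


# One chip vs. four chips, each holding an equally expensive slice of layers.
one_chip = pipeline_stats([0.001])
four_chips = pipeline_stats([0.001] * 4)
print("1 chip :", one_chip)
print("4 chips:", four_chips)
```

With equal per-chip stage times, the four-chip pipeline has the same throughput as the single chip and about four times the latency. A single autoregressive stream cannot fill the pipeline on its own, so the "no throughput penalty" result depends on the in-flight-sequences assumption.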