scosman • today at 4:07 PM • 4 replies

Cerebras is trialing Kimi K2.6 at 3000t/s (invite only). I'm excited for when the fast hardware gets more mainstream for frontier models. Models designed for speed on Nvidia are a nice addition that could bridge the gap.

Replies

adrian_b • today at 5:16 PM

TFA mentions that until now special, very expensive hardware like Cerebras was required to reach these kinds of speeds, and it emphasizes that what is novel in their results is that they obtained over 1000 tokens/s for a model with over 1T parameters using just standard hardware, i.e. one server with 8 GPUs.

btian • today at 5:18 PM

Source? Their website says 1000t/s https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flas...

michael-ax • today at 4:41 PM

now that's what i call a software development breakthrough/platform! thanks for the heads up!

lostmsu • today at 4:38 PM

Cerebras currently does not provide any discounts for prefix caching, making its use for agentic workloads sqr(n_turns) more expensive.
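
To make the arithmetic behind the prefix-caching point concrete, here is a rough sketch. It assumes every turn resends the full conversation history plus a fixed number of new tokens. The function name, the 50-turn / 2000-token figures, and the 0.1 cached-price ratio are all hypothetical illustration values, not Cerebras pricing.

```python
def billed_input_tokens(n_turns, tokens_per_turn, cached_price_ratio=1.0):
    """Input tokens billed over a conversation, in full-price token equivalents.

    Assumption: each turn appends `tokens_per_turn` new tokens and resends
    the entire history. `cached_price_ratio` is the (hypothetical) price of a
    cached prefix token relative to an uncached one; 1.0 means no discount.
    """
    total = 0.0
    for turn in range(1, n_turns + 1):
        cached = (turn - 1) * tokens_per_turn  # prefix already sent in earlier turns
        fresh = tokens_per_turn                # tokens new in this turn
        total += fresh + cached * cached_price_ratio
    return total


# Hypothetical agentic session: 50 turns, 2000 new tokens per turn.
no_discount = billed_input_tokens(50, 2000)
with_discount = billed_input_tokens(50, 2000, cached_price_ratio=0.1)
print(no_discount, with_discount, no_discount / with_discount)
```

Under these assumptions the undiscounted total is tokens_per_turn * n_turns * (n_turns + 1) / 2, so it grows with the square of the number of turns, while a full discount (ratio 0) would grow only linearly.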
