Hacker News

behnamoh · yesterday at 10:56 PM · 0 replies · view on HN

> That said, faster inference can't come soon enough.

Why is that? Technical limits? I know Cerebras struggles with compute, and they stopped their coding plan (sold out!). Their architecture also hasn't been used with large models like GPT-5.2. The largest model they support (unquantized) is GLM 4.7, which is <500B params.