This is <1 tok/s for the 40GB model.
Come on, "Run" is not the right word. "Crawl" is.
Headlines like that are misleading.
Could still be useful; maybe for overnight async workloads? Tell your agent research xyz at night and wake up to a report.
Yes, and with virtually zero context, which makes an enormous difference for TTFT on the MoE models.
Could still be useful; maybe for overnight async workloads? Tell your agent research xyz at night and wake up to a report.