What can it actually run? The fact that their benchmark plot only shows Llama 3.1 8B suggests to me that the stack is hand-implemented for that model and likely can't run newer or larger ones. Why else would you benchmark such an outdated model? Show me a benchmark for gpt-oss-120b or something comparable.
Looking at their blog, they in fact ran gpt-oss-120b: https://furiosa.ai/blog/serving-gpt-oss-120b-at-5-8-ms-tpot-...
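(For rough intuition, and assuming that headline figure applies per request stream, 5.8 ms per output token works out to about 170 tokens/s of decode throughput per stream; batch size and context length aren't stated here, so treat it as a back-of-envelope number.)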
I think the Llama 3 focus mostly reflects demand. It may be hard to believe, but many people aren't even aware that gpt-oss exists.
That so many people focus solely on massive LLMs is itself an oversight: they're narrowing in on a tiny (but very lucrative) subdomain of AI applications.