
wongarsu • today at 9:24 AM

GPT-4 was rumoured/leaked to be 1.8T parameters. Claude 3.5 Sonnet was supposedly 175B, so around 0.5T-1T seems reasonable for Opus 3.5. Maybe a step up to 1-3T for Opus 4.0.

Since then, inference pricing for new models has come down a lot, despite increasing pressure to be profitable. Opus 4.6 costs a third of what Opus 4.0 (and 3.5) cost, and GPT 5.4 a quarter of what o1 cost. You could take that as an indication that inference costs have also come down by at least that much.

My guess would have been that current frontier models like Opus are in the realm of 1T total params with 32B active.
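
To put rough numbers on that guess, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, and it treats the rumoured 175B figure as a dense model, which is my assumption and not something anyone has confirmed. It ignores attention cost, memory bandwidth, and batching, so take it as an order-of-magnitude illustration only.

```python
# Back-of-the-envelope comparison of a sparse (MoE) model against a dense one.
# All parameter counts are the rumoured/guessed figures from the comment,
# not confirmed numbers.

T = 1e12  # one trillion
B = 1e9   # one billion


def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: roughly 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


moe_total = 1 * T    # guessed total parameters
moe_active = 32 * B  # guessed active parameters per token
dense = 175 * B      # rumoured 175B, assumed dense here (hypothetical)

# Share of the model's weights that are used for any single token.
active_fraction = moe_active / moe_total

# Per-token compute of the MoE guess relative to the dense 175B model.
compute_ratio = forward_flops_per_token(moe_active) / forward_flops_per_token(dense)

print(f"active fraction: {active_fraction:.1%}")
print(f"compute per token vs dense 175B: {compute_ratio:.2f}x")
```

Under these assumptions only about 3% of the weights are touched per token, and per-token compute is under a fifth of a dense 175B model. That would at least be consistent with prices dropping by 3-4x while total parameter counts stay the same or grow.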
