afaik the economics are somewhat painful. Not sure of the real numbers, but back-of-napkin it's something like:
• 150-500B: Sonnet
• 0.9-2T: Opus
• 3-5T/10T: Fable / Mythos
So if a bigger model is "smarter" but you effectively wind up with a "shared hosting" model, where a coherent inference node (or nodes) costing $2m or something can run at most 10 customer workloads simultaneously ... not sure what that can be priced at. If it turns out a $10m/10x shared node can host even smarter models, then what?
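To put rough numbers on that, here's a minimal sketch of the break-even price per workload slot. Only the $2m / $10m node costs and the 10 concurrent workloads come from the guess above; the 3-year amortization and 50% utilization are my own assumptions, and it counts hardware cost only (no power, ops, or margin), so treat the output as a floor rather than a real price.

```python
def breakeven_per_slot_hour(node_cost_usd, concurrent_slots,
                            amortization_years=3.0, utilization=0.5):
    """Hardware-only break-even price for one workload slot per hour.

    amortization_years and utilization are assumed values, not known figures.
    """
    total_hours = amortization_years * 365 * 24
    # Slot-hours that are actually sold over the node's assumed lifetime.
    billable_slot_hours = total_hours * concurrent_slots * utilization
    return node_cost_usd / billable_slot_hours


if __name__ == "__main__":
    # Node costs and slot count are the back-of-napkin guesses from above.
    for node_cost in (2_000_000, 10_000_000):
        price = breakeven_per_slot_hour(node_cost, concurrent_slots=10)
        print(f"${node_cost:,} node, 10 slots: ${price:.2f} per slot-hour")
```

With the slot count fixed at 10, the break-even price scales linearly with node cost, so the $10m node needs 5x the per-slot price of the $2m one just to cover hardware.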
Fable/Mythos are based on the same model. Not totally clear whether they have identical weights (just different external guardrails) or whether there's also some slight finetuning difference.