Looking at "Decide when to use fast mode", it seems the future they want is:
- Long running autonomous agents and background tasks use regular processing.
- "Human in the loop" scenarios use fast mode.
That makes perfect sense, but the question is: does the billing also make sense?
The billing doesn't even make sense for Opus at the API prices; the sub is the killer.
It'll be a Cadillac offering for whales. People who care about value will just run stuff in parallel.