A useful feature would be a slow mode that gets low-cost compute at spot pricing.
I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.
OpenAI offers that, or at least used to: their Batch API lets you queue all your inference with a 24h completion window and get much lower prices.
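For the curious, the flow is: write your requests to a JSONL file, upload it, then create a batch against that file. A minimal sketch, assuming the openai Python SDK; the model id and prompts are placeholders:

    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # One JSON object per line; custom_id ties results back to requests.
    with open("batch_input.jsonl", "w") as f:
        for i, prompt in enumerate(["summarize ...", "classify ..."]):
            f.write(json.dumps({
                "custom_id": f"job-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",  # placeholder model id
                    "messages": [{"role": "user", "content": prompt}],
                },
            }) + "\n")

    # Upload the file, then queue the batch with a 24h window.
    input_file = client.files.create(
        file=open("batch_input.jsonl", "rb"), purpose="batch"
    )
    batch = client.batches.create(
        input_file_id=input_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    # Poll client.batches.retrieve(batch.id) until status == "completed",
    # then download the results file via batch.output_file_id.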
> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.
If it's not time sensitive, why not just run it on CPU/RAM rather than GPU?
https://platform.claude.com/docs/en/build-with-claude/batch-...
> The Batches API offers significant cost savings. All usage is charged at 50% of the standard API prices.
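Same idea on the Anthropic side. A minimal sketch using the anthropic Python SDK's Message Batches endpoints (model id and prompts are placeholders): submit everything at once, poll until the batch ends, then match results back by custom_id, since output order isn't guaranteed.

    import time

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Queue the non-urgent work as one batch; each request carries a
    # custom_id so results can be matched back to it later.
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": f"job-{i}",
                "params": {
                    "model": "claude-sonnet-4-5",  # placeholder model id
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            for i, prompt in enumerate(["summarize ...", "classify ..."])
        ]
    )

    # Walk away; poll until processing finishes (up to 24h).
    while client.messages.batches.retrieve(batch.id).processing_status != "ended":
        time.sleep(60)

    # Stream results back; filter for the ones that succeeded.
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content)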