The pay-per-use API sucks. If you end up on the $50/mo plan, it's better, with caveats:
1 million tokens per minute, 24 million tokens per day. BUT: cached tokens count in full against those limits, so if you have 100,000 tokens of context you can burn a whole minute's allowance in about 10 requests.
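To put numbers on that, here is a rough back-of-the-envelope sketch. It assumes every request resends the full 100,000-token context, that cached tokens count in full as described above, and it ignores output tokens; the function name is made up for illustration.

```python
TOKENS_PER_MINUTE = 1_000_000  # per-minute limit quoted above
TOKENS_PER_DAY = 24_000_000    # per-day limit quoted above


def requests_until_limit(limit_tokens: int, tokens_per_request: int) -> int:
    """Whole requests that fit under a token limit when every token counts in full."""
    return limit_tokens // tokens_per_request


# Assumption: each request carries the whole context and nothing else.
context_tokens = 100_000

per_minute = requests_until_limit(TOKENS_PER_MINUTE, context_tokens)
per_day = requests_until_limit(TOKENS_PER_DAY, context_tokens)

print(f"requests per minute: {per_minute}")
print(f"requests per day: {per_day}")
```

Note that the daily cap works out to only 24 minutes' worth of the per-minute limit, so with a big context the daily number is likely the one you hit first.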
It’s wild that cached tokens count in full - why would you care about caching at all then? Is the processing speed gain significant?