If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It’s easy with token based options because usually are a drop-in replacement for OpenAI API endpoints.
You have you rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it’s not unreasonable.
GCloud and Hyperbolic have been my go-to as well