> There are plenty of 3rd party and big cloud options to run these models by the hour or token.
Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.
Have you checked OpenRouter to see if any of their providers serve the model you need?
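Their API is OpenAI-compatible, so it's quick to poke around. A rough sketch with the OpenAI Python SDK (the model slug is just an example, check their catalog for what's actually listed):

```python
# Minimal sketch: OpenAI SDK pointed at OpenRouter's OpenAI-compatible endpoint.
# The model slug below is an example only; you need your own API key.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

# See what's currently being served (useful for hunting down base models)
for m in client.models.list():
    print(m.id)

# Plain text completion against a base model (not the chat endpoint)
resp = client.completions.create(
    model="meta-llama/llama-3.1-405b",  # example slug, verify in their catalog
    prompt="The old lighthouse keeper had one rule:",
    max_tokens=200,
)
print(resp.choices[0].text)
```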
If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
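If you go that route, the google-genai client is probably the lowest-friction way to call it once your project is set up. A minimal sketch, with the project ID, region, and model name as placeholders:

```python
# Rough sketch using the google-genai SDK against Vertex AI.
# Project ID, region, and model name are placeholders.
from google import genai

client = genai.Client(
    vertexai=True,
    project="your-gcp-project",
    location="us-central1",
)

resp = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize the plot of Moby-Dick in two sentences.",
)
print(resp.text)
```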
Lambda.ai used to offer per-token pricing, but they've moved upmarket. You can still rent a B200 instance for under $5/hr, which is reasonable for experimenting with models.
https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and per-token pricing for popular OSS models. The token-based options are easy because they're usually a drop-in replacement for the OpenAI API endpoints.
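Something like this, assuming I'm remembering their base URL and model naming right (worth double-checking against that models page):

```python
# Same OpenAI SDK, just pointed at Hyperbolic's endpoint.
# Base URL and model name are from memory of their docs -- verify before use.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",
    api_key="YOUR_HYPERBOLIC_KEY",
)

resp = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B",  # example: one of the base models they've listed
    prompt="The old lighthouse keeper had one rule:",
    max_tokens=200,
)
print(resp.choices[0].text)
```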
You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it's not unreasonable.
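If you do rent one, vLLM is the usual low-effort way to get something running. Rough sketch, with the model name as a placeholder for whatever fits the card's VRAM:

```python
# Rough sketch of running a model yourself on a rented GPU with vLLM.
# Model name is a placeholder -- pick something that fits your GPU's memory.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B")  # base model, not instruct

params = SamplingParams(max_tokens=200, temperature=0.8)
outputs = llm.generate(["The old lighthouse keeper had one rule:"], params)

for out in outputs:
    print(out.outputs[0].text)
```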