Very cool. I see that "Deploy your finetunes, custom LoRAs, or any open-source model on our fleet." is "Book a call". Any sense of what pricing will actually look like there? This seems like where your approach wins out: the ability to swap in a custom model more easily and cheaply.
Just curious how close we are to a world where I can fine-tune for my low-call-volume domain and then get it hosted. Right now that isn't practical anywhere I've seen at the volumes I'd be running, which are really hobby level.
We usually charge by the GPU hour for those finetunes, around $8-10 depending on GPU type and volume! This is similar to Modal, but since the engine is fully ours, you don't wait ~1 min for cold starts. Ideally, we will make onboarding super frictionless and self-serve, but we're onboarding people manually for now.
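For a rough feel of what that means at hobby volume, here's a back-of-the-envelope sketch. The $8-10 range is the one quoted above. The 5 GPU hours per month is a made-up number for illustration, and how hours actually get metered (idle time, minimum billing increments) isn't covered here, so treat this as arithmetic only, not a quote.

```python
def monthly_cost_range(gpu_hours, low_rate=8.0, high_rate=10.0):
    """Return (low, high) monthly cost in dollars for a number of billed GPU hours.

    low_rate and high_rate default to the ~$8-10 per GPU hour range mentioned
    above; the actual rate depends on GPU type and volume.
    """
    return gpu_hours * low_rate, gpu_hours * high_rate


# Hypothetical hobby workload: 5 billed GPU hours in a month (an assumption).
low, high = monthly_cost_range(5)
print(f"Estimated monthly cost: ${low:.0f}-${high:.0f}")
```

Swap in your own expected GPU hours to see where you'd land.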