We have a big dependency on AI, both for developers (they can survive without it; it's mostly habit) and for internal workflows (very hard to go without it). So we decided to unplug from cloud AI, rent our own GPUs, and run an open model for both scenarios. We have been very happy with it so far: roughly 60% cheaper and around 50% faster.
Why not an in-between scenario, like using a managed inference provider to host your own models?
Faster in what way? All the open models we have access to at work are so noticeably behind the frontier models that it's usually faster not to use them at all.