At scale (like comma.ai), it's probably cheaper. But until then it's a long term cost optimization with really high upfront capital expenditure and risk. Which means it doesn't make much sense for the majority of startup companies until they become late stage and their hosting cost actually becomes a big cost burden.
There are in-between solutions. Renting bare metal instead of virtual machines can be quite nice. I did that via Hetzner some years ago. You pay just about the same but get a lot more performance for the money. This is great if you actually need that performance.
People obsess about hardware but there's also the software side to consider. For smaller companies, operations/devops people are usually more expensive than the resources they manage. That staffing cost is the one to optimize. The hosting cost is usually a rounding error on the staffing cost. And on top of that, the number of responsibilities increases as soon as you own the hardware. You need to service it, monitor it, replace it when it fails, make sure those fans don't get jammed by dust puppies, deal with outages when they happen, etc. All the stuff that you pay cloud providers to do for you now becomes your problem. And it has a non-zero cost.
The right mindset for hosting cost is to think of it in FTEs (full time employee cost for a year). If it's below 1 (most startups until they are well into scale up territory), you are doing great. Most of the optimizations you are going to get are going to cost you in actual FTEs spent doing that work. 1 FTE pays for quite a bit of hosting. Think 10K per month in AWS cost. A good ops person/developer is more expensive than that. My company runs at about 1K per month (GCP and misc managed services). It would be the wrong thing to optimize for us. It's not worth spending any amount of time on for me. I literally have more valuable things to do.
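To make that rule of thumb concrete, here's a minimal sketch of the conversion. The 120K per year FTE cost is only what "10K per month" above implies, not a real salary figure, and the function name is made up for illustration. Plug in your own numbers.

```python
# Assumed fully loaded cost of one ops/dev FTE per year. The 120K figure is
# only what "10K per month" in the comment implies; substitute your own.
FTE_COST_PER_YEAR = 120_000


def hosting_in_ftes(monthly_hosting_cost, fte_cost_per_year=FTE_COST_PER_YEAR):
    """Express a monthly hosting bill as a fraction of one FTE-year."""
    return monthly_hosting_cost * 12 / fte_cost_per_year


if __name__ == "__main__":
    # Illustrative monthly bills, not data from any real company.
    for monthly in (1_000, 10_000, 50_000):
        print(f"{monthly:>6} per month = {hosting_in_ftes(monthly):.1f} FTE")
```

Under that assumption, anything well below 1.0 is not worth an engineer's time to optimize.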
This flips when you start getting into multiple FTEs in cost for just the hosting. At that point you probably have additional cost measured in 5-10 FTE in staffing anyway to babysit all of that. So now you can talk about trading off some hosting FTEs for a modest amount of extra staffing FTEs and make net gains.
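A rough way to sketch that trade-off, with everything measured in FTE-years. The savings fraction and the extra staffing are pure assumptions you would have to estimate yourself, and the function is hypothetical, not anyone's actual model.

```python
def net_gain_ftes(hosting_ftes, savings_fraction, extra_staff_ftes):
    """Net yearly gain, in FTEs, from moving hosting off the cloud.

    hosting_ftes:     current hosting bill expressed in FTE-years
    savings_fraction: share of that bill you expect to save (assumption)
    extra_staff_ftes: additional staff needed to run the hardware (assumption)
    """
    return hosting_ftes * savings_fraction - extra_staff_ftes


if __name__ == "__main__":
    # Small startup: 0.1 FTE of hosting, a quarter of a person to run hardware.
    print(net_gain_ftes(0.1, 0.5, 0.25))
    # Larger company: 5 FTEs of hosting, one extra person to run hardware.
    print(net_gain_ftes(5, 0.5, 1))
```

A negative result means the move costs more in people than it saves in hosting, which is the small-company case described above.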
> At scale (like comma.ai), it's probably cheaper. But until then it's a long term cost optimization with really high upfront capital expenditure and risk. Which means it doesn't make much sense for the majority of startup companies until they become late stage and their hosting cost actually becomes a big cost burden.
You rent datacenter space, which is OPEX not CAPEX, and you just lease the servers, which turns big CAPEX into a monthly OPEX bill.
Running your own DC is a "we have two dozen racks of servers" endeavour, but even just renting DC space and buying servers is much cheaper than getting the same level of performance from the cloud.
> This flips when you start getting into multiple FTEs in cost for just the hosting. At that point you probably have additional cost measured in 5-10 FTE in staffing anyway to babysit all of that. So now you can talk about trading off some hosting FTEs for a modest amount of extra staffing FTEs and make net gains.
YOU NEED THOSE PEOPLE TO MANAGE CLOUD TOO. That's what always gets ignored in calculations. People go "oh, but we really need like 2-3 ops people to cover the datacenter and have shifts on the on-call", but you need the same thing for cloud too; it is just dumped on the programmers/devops guys in the team rather than on separate staff.
We have a few racks, and the hardware-related part is a small part of the total workload. Most of it is the same as what we would do (and do, for a few cloud customers) in the cloud: writing manifests for automation.
To be fair, I think people vastly overestimate the work they would have and the power they would need. Yes, if you have to massively scale up, then it'll take some work, but most of it is one-time work. You do it, and once it runs, you only have a fraction of that work over the following months to maintain it. And by fraction, I mean below 5%. And keep in mind that >99% of startups that think "yeah we need this and that cloud, because we need to scale" will never scale. Instead they are happily locking themselves into a cloud service. And if they actually scale at some point, this service will be massively more expensive.
Your calculation assumes that an FTE is needed to maintain a few beefy servers.
Once they are up and running that employee is spending at most a few hours a month on them. Maybe even a few hours every six months.
OTOH you are specifically ignoring that you'll require mostly the same time from a cloud-trained person if you're all-in on AWS.
I expect the marginal cost of one employee over the other is zero.
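For a sense of scale, here's a quick sketch that turns "a few hours a month" into a fraction of one FTE. The 160 working hours per month (40 hours times 4 weeks) is an assumption, and the function name is made up for illustration.

```python
# Assumed working hours in a month (40 h/week * 4 weeks); adjust as needed.
WORK_HOURS_PER_MONTH = 160


def fte_fraction(hours_per_month, work_hours_per_month=WORK_HOURS_PER_MONTH):
    """Share of one full-time employee consumed by a monthly time budget."""
    return hours_per_month / work_hours_per_month


if __name__ == "__main__":
    # Illustrative maintenance budgets in hours per month.
    for hours in (2, 4, 8):
        print(f"{hours} h/month = {fte_fraction(hours):.1%} of an FTE")
```

The same function applies to the cloud side: whatever hours go into IAM, billing and manifests count against the FTE just the same.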
> it doesn't make much sense for the majority of startup companies until they become late stage
Here's what TFA says about this:
> Cloud companies generally make onboarding very easy, and offboarding very difficult. If you are not vigilant you will sleepwalk into a situation of high cloud costs and no way out.
and I think they're right. Be careful how you start because you may be stuck in the initial situation for a long time.
And not just any FTEs, probably a few senior / staff level engineers who would cost a lot more.
You should keep in mind that for a lot of things you can use a servicing contract, rather than hiring full-time employees.
It's typically going to cost significantly less; it can make a lot of sense for small companies, especially.
> But until then it's a long term cost optimization with really high upfront capital expenditure and risk.
The upfront capex does not need to be that high, unless you're running your own AI models. Besides leasing new servers, as a sibling comment stated, you can buy used. You can get a solid Dell 2U with a full service contract (3 years) for ~$5-10K depending on CPU / memory / storage configuration. Or if you don't mind going older - because honestly, most webapps aren't doing anything compute-heavy - you can drop that to < $1K/node. Replacement parts for those are cheap, so buy an extra of everything.
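A back-of-the-envelope sketch of what that adds up to. The node count, the single spare node standing in for "an extra of everything", and the 36-month lifetime (matching the 3-year service contract) are all assumptions. It also ignores colo rent, power and bandwidth, so treat it as a lower bound.

```python
def used_hardware_capex(nodes, price_per_node, spare_nodes=1):
    """Upfront cost of buying used servers plus spares (hardware only)."""
    return (nodes + spare_nodes) * price_per_node


def monthly_equivalent(capex, lifetime_months=36):
    """Spread the upfront cost evenly over an assumed hardware lifetime."""
    return capex / lifetime_months


if __name__ == "__main__":
    # Three older nodes at the "< $1K/node" end, plus one spare.
    cheap = used_hardware_capex(nodes=3, price_per_node=1_000)
    # Three nodes at the upper "~$5-10K with service contract" end, plus one spare.
    solid = used_hardware_capex(nodes=3, price_per_node=10_000)
    for label, capex in (("older", cheap), ("with contract", solid)):
        print(f"{label}: {capex} upfront, "
              f"{monthly_equivalent(capex):.0f} per month over 3 years")
```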