You'd have to build totally separate datacenters with totally different hardware than what they have today. You're not thinking about the complexity introduced by PCIe switches. For starters, you don't have enough bandwidth to saturate all GPUs concurrently; they're sharing PCIe root-complex bandwidth, which is a non-starter if you want to define any kind of reasonable SLA. You can't really enforce limits, either. Even if you're able to tolerate that and sell customers on it, the security side is worse: all customer GPU traffic would traverse a shared switch fabric, which means noisy, bursty neighbors, timing side channels, etc., etc.
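Rough numbers, as a sketch: assume (hypothetically) eight GPUs hanging off one switch whose uplink to the root complex is a single PCIe Gen4 x16 link, roughly 32 GB/s usable. When every tenant bursts at once, each GPU sees only a fraction of the link it thinks it has:

    # Back-of-the-envelope: GPUs behind a PCIe switch sharing one root-port uplink.
    # All numbers are illustrative assumptions, not measurements.

    GEN4_X16_GBPS = 32.0       # ~usable bandwidth of a PCIe Gen4 x16 link, GB/s
    gpus_behind_switch = 8     # hypothetical fan-out behind one switch/uplink

    per_gpu_link = GEN4_X16_GBPS                        # each GPU has its own x16 to the switch
    uplink_share = GEN4_X16_GBPS / gpus_behind_switch   # each GPU's share when all burst at once

    print(f"Per-GPU link bandwidth:         {per_gpu_link:.1f} GB/s")
    print(f"Per-GPU share of shared uplink: {uplink_share:.1f} GB/s "
          f"(~{uplink_share / per_gpu_link:.0%} of what an SLA would have to promise)")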
> You'd have to build totally separate datacenters with totally different hardware than what they have today.
No? You can reset GPUs with regular PCI-e commands.
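For what it's worth, on Linux that's just a sysfs write, assuming the device actually exposes a reset method (FLR or similar) and you have root; the BDF below is a placeholder, not a real device:

    # Minimal sketch: ask the kernel to reset a PCIe function (e.g. a GPU).
    # Requires root; the device must support a reset method (FLR, bus reset, ...).
    from pathlib import Path

    BDF = "0000:3b:00.0"  # hypothetical GPU bus/device/function

    reset_node = Path(f"/sys/bus/pci/devices/{BDF}/reset")
    if reset_node.exists():
        reset_node.write_text("1")   # kernel issues the reset on our behalf
        print(f"Reset issued to {BDF}")
    else:
        print(f"{BDF} does not expose a reset method")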
> You can't really enforce limits, either. Even if you're able to tolerate that and sell customers on it, the security side is worse
Welp. AWS is already totally insecure trash, it seems: https://aws.amazon.com/ec2/instance-types/g6e/ Good to know.
Not having GPUs on Fargate/Lambda is, at this point, just a sign of corporate impotence. They can't marshal internal teams to work together, so all they can ship is a wrapper/router for AI models that a student could vibe-code in a month.
We build AI models for aerial imagery analysis, so we need to train and host very custom code. Right now, we have to use third parties for that because AWS is way more expensive than the competition (e.g. https://lambda.ai/pricing ), _and_ it's harder to use. And yes, we spoke with the sales reps about private pricing offers.