Hacker News

cyberax · yesterday at 1:17 AM

> Frankly, this is strictly a positive signal to me.

How?

> The reason you can't run GPU workloads on top of fargate and lambda is because exposing physical 3rd-party hardware to untrusted customer code dramatically increases the startup and shutdown costs

This is BS. Both NVIDIA (vGPU) and AMD (MxGPU) offer virtualization extensions. And even without those, they can simply power-cycle the GPUs when switching tenants.
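To make the power-cycle point concrete: on Linux, a PCIe function-level reset can be triggered through the device's sysfs `reset` node. A minimal sketch, assuming a Linux host with root access; the device address `0000:3b:00.0` and helper names are illustrative:

```python
# Sketch: trigger a PCIe function-level reset (FLR) of a GPU between
# tenants via the Linux sysfs "reset" attribute. Requires root; the
# BDF address below is a placeholder, not a real deployment detail.
from pathlib import Path

def reset_path(bdf: str) -> Path:
    """Return the sysfs reset node for a PCI address like '0000:3b:00.0'."""
    return Path("/sys/bus/pci/devices") / bdf / "reset"

def reset_gpu(bdf: str, dry_run: bool = True) -> str:
    """Write '1' to the reset node to force an FLR (skipped in dry-run)."""
    node = reset_path(bdf)
    if not dry_run:
        node.write_text("1")  # kernel issues the reset; device state is wiped
    return str(node)
```

In practice a provider would pair this with driver teardown (e.g. `nvidia-smi --gpu-reset` on NVIDIA hardware) before handing the device to the next tenant.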

Moreover, Fargate is used for long-running tasks, and it definitely can run on a regular Nitro stack. They absolutely can provide GPUs for them, but it likely requires a lot of internal work across teams to make it happen. So it doesn't happen.

I worked at AWS, in a team responsible for EC2 instance launching. So I know how it all works internally :)


Replies

nickysielicki · yesterday at 2:14 AM

You'd have to build totally separate datacenters with totally different hardware than what they have today. You're not thinking about the complexity introduced by the use of PCIe switches.

For starters, you don't have enough bandwidth to saturate all GPUs concurrently: they share PCIe root-complex bandwidth, which is a non-starter if you want to define any kind of reasonable SLA, and you can't really enforce limits either. Even if you can tolerate that and sell customers on it, the security side is worse: all customer GPU transactions would traverse a shared switch fabric, which means noisy bursty neighbors, timing side channels, etc.
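The bandwidth objection is easy to quantify. A back-of-the-envelope sketch, with assumed numbers (8 GPUs behind one switch, roughly 32 GB/s of usable bandwidth on an x16 Gen4 uplink; real topologies vary):

```python
# Rough contention arithmetic for GPUs sharing one PCIe switch uplink.
# Both constants are assumptions for illustration, not measured figures.
UPLINK_GBPS = 32.0    # approx. usable bandwidth of one x16 PCIe Gen4 link
GPUS_PER_SWITCH = 8   # assumed number of GPUs behind the shared uplink

# If every GPU streams to/from host memory at once, each gets a fraction
# of the uplink rather than its full x16 link rate.
per_gpu_under_contention = UPLINK_GBPS / GPUS_PER_SWITCH      # 4.0 GB/s
fraction_of_native = per_gpu_under_contention / UPLINK_GBPS   # 0.125
```

Under these assumptions, a fully loaded switch leaves each tenant with an eighth of the bandwidth their GPU could use alone, and there's no per-tenant knob to guarantee more, which is the SLA problem in a nutshell.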
