logoalt Hacker News

covilast Wednesday at 6:04 PM0 repliesview on HN

To massively increase the reliability to get GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,

$ sky launch --gpus H100

will fall back across GCP regions, AWS, your clusters, etc. There are options to say try either H100 or H200 or A100 or <insert>.

Essentially the way you deal with it is to increase the infra search space.