
whiplash451 · 10/01/2024

Not the author, but their description implies that they are running more than one stream per GPU.
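
If those are CUDA streams (an assumption on my part; the comment doesn't say), a minimal PyTorch sketch of two requests sharing one GPU could look like the snippet below. The model, batch sizes, and shapes are made up purely for illustration.

    # Hypothetical sketch: two requests served concurrently on one GPU via CUDA streams.
    import torch

    device = torch.device("cuda:0")
    model = torch.nn.Linear(4096, 4096).to(device)   # stand-in for a real model

    streams = [torch.cuda.Stream(device=device) for _ in range(2)]
    inputs = [torch.randn(8, 4096, device=device) for _ in range(2)]
    outputs = [None, None]

    for i, (s, x) in enumerate(zip(streams, inputs)):
        with torch.cuda.stream(s):      # kernels launched here are queued on stream s
            outputs[i] = model(x)       # may overlap with work queued on the other stream

    torch.cuda.synchronize(device)      # wait for both streams to finish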

So you can basically spin up a few GPUs as a baseline, allocate streams to them, then boot up a new GPU when the existing ones get overwhelmed.
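
A rough sketch of that scheduling idea in plain Python, with a fixed number of stream "slots" per GPU and a placeholder provision_gpu() standing in for whatever cloud API actually boots the new instance (all names and the slot count are hypothetical):

    STREAMS_PER_GPU = 4

    class GpuPool:
        def __init__(self, baseline=2):
            self.load = {i: 0 for i in range(baseline)}   # gpu_id -> busy streams

        def provision_gpu(self):
            # Placeholder for a real boot-up call to the cloud provider.
            new_id = max(self.load) + 1
            self.load[new_id] = 0
            return new_id

        def acquire(self):
            # Pick the least-loaded GPU; if every stream slot everywhere is busy, add a GPU.
            gpu, busy = min(self.load.items(), key=lambda kv: kv[1])
            if busy >= STREAMS_PER_GPU:
                gpu = self.provision_gpu()
            self.load[gpu] += 1
            return gpu

        def release(self, gpu):
            self.load[gpu] -= 1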

It doesn't look very different from standard cloud compute management. I'm not saying it's easy, but it's definitely not rocket science either.