
whiplash451 · 10/01/2024

Not the author, but their description implies that they are running more than one stream per GPU.
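
If those are CUDA streams (an assumption on my part; the comment doesn't say), a minimal PyTorch sketch of two requests sharing one GPU could look like the snippet below. The model, batch sizes, and shapes are made up purely for illustration.

    # Hypothetical sketch: two requests served concurrently on one GPU via CUDA streams.
    import torch

    device = torch.device("cuda:0")
    model = torch.nn.Linear(4096, 4096).to(device)   # stand-in for a real model

    streams = [torch.cuda.Stream(device=device) for _ in range(2)]
    inputs = [torch.randn(8, 4096, device=device) for _ in range(2)]
    outputs = [None, None]

    for i, (s, x) in enumerate(zip(streams, inputs)):
        with torch.cuda.stream(s):      # kernels launched here are queued on stream s
            outputs[i] = model(x)       # may overlap with work queued on the other stream

    torch.cuda.synchronize(device)      # wait for both streams to finish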

So you can basically spin up a few GPUs as a baseline, allocate streams to them, then boot up a new GPU when the existing ones get overwhelmed.
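
A rough sketch of that scheduling idea in plain Python, with a fixed number of stream "slots" per GPU and a placeholder provision_gpu() standing in for whatever cloud API actually boots the new instance (all names and the slot count are hypothetical):

    STREAMS_PER_GPU = 4

    class GpuPool:
        def __init__(self, baseline=2):
            self.load = {i: 0 for i in range(baseline)}   # gpu_id -> busy streams

        def provision_gpu(self):
            # Placeholder for a real boot-up call to the cloud provider.
            new_id = max(self.load) + 1
            self.load[new_id] = 0
            return new_id

        def acquire(self):
            # Pick the least-loaded GPU; if every stream slot everywhere is busy, add a GPU.
            gpu, busy = min(self.load.items(), key=lambda kv: kv[1])
            if busy >= STREAMS_PER_GPU:
                gpu = self.provision_gpu()
            self.load[gpu] += 1
            return gpu

        def release(self, gpu):
            self.load[gpu] -= 1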

It doesn't look very different from standard cloud compute management. I'm not saying it's easy, but it's definitely not rocket science either.