Not the author, but their description implies that they are running more than one stream per GPU.
So you can basically spin up a few GPUs as a baseline, allocate streams to them, then boot up a new GPU when the existing ones get overwhelmed.
Doesn't look very different from standard cloud compute management. I'm not saying it's easy, but it's definitely not rocket science either.
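To make it concrete, the allocation scheme above could look something like this minimal sketch. Everything here is assumed: the class name, the fixed `streams_per_gpu` capacity, and the counting logic; a real system would measure actual GPU utilization (e.g. via NVML) rather than just counting streams.

```python
# Hypothetical sketch of stream-to-GPU allocation as described above.
# Streams are packed onto existing GPUs up to an assumed per-GPU limit;
# a new GPU is "booted" only once all current GPUs are at capacity.

class GpuPool:
    def __init__(self, baseline_gpus=2, streams_per_gpu=4):
        self.streams_per_gpu = streams_per_gpu
        # Map gpu_id -> number of active streams on that GPU.
        self.gpus = {i: 0 for i in range(baseline_gpus)}

    def allocate_stream(self):
        # Prefer the least-loaded existing GPU.
        gpu_id = min(self.gpus, key=self.gpus.get)
        if self.gpus[gpu_id] >= self.streams_per_gpu:
            # Every GPU is full: boot a new one.
            gpu_id = max(self.gpus) + 1
            self.gpus[gpu_id] = 0
        self.gpus[gpu_id] += 1
        return gpu_id

    def release_stream(self, gpu_id):
        # Free a slot; an idle-GPU scale-down policy could hook in here.
        self.gpus[gpu_id] -= 1
```

The scale-up trigger here is "all GPUs at their stream limit," which mirrors the "boot a new GPU when existing GPUs get overwhelmed" logic, just with a crude proxy for overwhelmed.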