If you were running a real business with these would the aim not be to overprovision and to setup auto scaling in such a way that you always have excess capacity?
That seems to be the gist of it. You cannot rely on serverless alone and you need one or many pre-warmed instances at all times. This distinction is rarely mentioned in serverless GPU spaces yet has been my experience in general.
That seems to be the gist of it. You cannot rely on serverless alone and you need one or many pre-warmed instances at all times. This distinction is rarely mentioned in serverless GPU spaces yet has been my experience in general.