Hacker News

omneity · last Wednesday at 2:15 PM · 1 reply

That seems to be the gist of it. You cannot rely on serverless alone; you need one or more pre-warmed instances at all times. This distinction is rarely mentioned in serverless GPU spaces, yet it has been my experience in general.


Replies

nullpointerexp · last Wednesday at 3:42 PM

When scaling from 0 to 1 instances, yes, you have to wait 19 seconds.

For scaling from N to N+1: if you configure the correct concurrency value (the number of parallel requests one instance can handle), Cloud Run will start additional instances once existing ones reach X% of that value (I think it's 70%). That happens before the instance is fully exhausted, so your users should not experience the 19-second cold start.
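
To make the arithmetic concrete, here is a rough sketch of that threshold logic. It is a simplified model, not Cloud Run's actual autoscaler: the function name `instances_needed`, the `min_instances` parameter, and the 70% default are all assumptions for illustration (the 70% is just the guess from above).

```python
def instances_needed(in_flight: int, concurrency: int,
                     target_percent: int = 70, min_instances: int = 0) -> int:
    """Toy model: how many instances keep each one at or below
    target_percent of its configured concurrency.

    Hypothetical helper; the real autoscaler also weighs other signals.
    """
    if in_flight <= 0:
        # No traffic: fall back to the pre-warmed floor (0 means scale to zero).
        return min_instances
    # Per-instance target load is concurrency * target_percent / 100.
    # Integer ceiling division avoids floating-point rounding surprises.
    per_instance_x100 = concurrency * target_percent
    needed = -(-(in_flight * 100) // per_instance_x100)
    return max(needed, min_instances)


# With concurrency=80 and a 70% target, one instance covers up to 56
# in-flight requests; the 57th already asks for a second instance,
# well before the first one is saturated at 80.
print(instances_needed(56, 80))
print(instances_needed(57, 80))
```

With `min_instances=1`, the model never drops to zero, which is the pre-warmed setup described in the parent comment: only the 0 to 1 transition pays the full cold start.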