google vp here: we appreciate the feedback! i generally agree that if you have a strong understanding of your static capacity needs, pre-provisioning VMs is likely to be more cost efficient with today's pricing. cloud run GPUs are ideal for more bursty workloads -- maybe a new AI app that doesn't yet have PMF, where you really need that scale-to-zero + fast start for more sparse traffic patterns.
Has this changed? When I looked pre-GA, the requirement was that you had to pay for the CPU 24x7 to attach a GPU, so that is not really scaling to zero unless this requirement has changed...
How does that compare to spinning up some EC2 instances with Amazon Trainium accelerators?
Appreciate the thoughtful response! I’m actually right in the ICP you described: I’ve run my own VMs in the past and recently switched to Cloud Run to simplify ops and take advantage of scale-to-zero. In my case, I was running a few inference jobs and expected a ~$100 bill. But because of the instance-based billing behavior, the instance stayed up the whole time, and I ended up with a $1,000 charge for relatively little usage.
I’m fairly experienced with GCP, but even then, the billing model here caught me off guard. When you’re dealing with machines that can run up to $64K/month, small missteps get expensive quickly. Predictability is key, and I’d love to see more safeguards or clearer cost-modeling tools for these types of workloads.
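For anyone who wants to sanity-check this kind of gap before deploying, here is a rough back-of-the-envelope sketch of the arithmetic. It is not an official pricing calculator: the function names, the 730-hour month, and the $1.00/hour rate are all assumptions for illustration, and real Cloud Run GPU pricing has separate CPU, memory, and GPU components that this ignores.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def billed_cost(hourly_rate: float, billed_hours: float) -> float:
    """Dollar cost for a given number of billed instance-hours."""
    if hourly_rate < 0 or billed_hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate * billed_hours


def compare_billing(hourly_rate: float, active_hours: float,
                    period_hours: float = HOURS_PER_MONTH) -> dict:
    """Compare paying only for active hours against an instance that never scales down."""
    if active_hours > period_hours:
        raise ValueError("active_hours cannot exceed period_hours")
    expected = billed_cost(hourly_rate, active_hours)
    worst_case = billed_cost(hourly_rate, period_hours)
    # Ratio of the always-on bill to the bill you planned for.
    factor = worst_case / expected if expected else float("inf")
    return {"expected": expected, "worst_case": worst_case, "overrun_factor": factor}


if __name__ == "__main__":
    # Hypothetical inputs: a $1.00/hour instance that does real work 73 hours a month.
    print(compare_billing(hourly_rate=1.0, active_hours=73))
```

With these made-up inputs, the always-on bill is 10x the expected one, the same order of blow-up as the $100 vs. $1,000 described above. Plugging in your own rate and expected active hours gives a worst-case number to set a budget alert against.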