All the cruft of a big cloud provider, AND the joy of uncapped yolo billing that has the potential to drain your credit card overnight. No thanks, I'll personally stick with Modal and vast.ai.
The pricing doesn't look that compelling. Here are the hourly rate comparisons against runpod.io and vast.ai:
1x L4 24GB: google: $0.71; runpod.io: $0.43, spot: $0.22
4x L4 24GB: google: $4.00; runpod.io: $1.72, spot: $0.88
1x A100 80GB: google: $5.07; runpod.io: $1.64, spot: $0.82; vast.ai: $0.880, spot: $0.501
1x H100 80GB: google: $11.06; runpod.io: $2.79, spot: $1.65; vast.ai: $1.535, spot: $0.473
8x H200 141GB: google: $88.08; runpod.io: $31.92; vast.ai: $15.470, spot: $14.563
Google's pricing also assumes you're running it 24/7 for an entire month, whereas this is just the hourly price for runpod.io or vast.ai, which both bill per second. I wasn't able to find Google's spot pricing for GPUs.

I'm personally a huge fan of Modal and have been using their serverless scale-to-zero GPUs for a while. We've seen some nice cost reductions from using them, while also being able to scale WAY UP when needed. All with minimal development effort.
Interesting to see a big provider entering this space. I originally swapped to Modal because the big providers weren't offering this (e.g. AWS Lambda can't run on GPU instances). I assume all providers are going to start moving towards offering this?
The reason Cloud Run is so nice compared to other providers is that it has autoscaling, including scaling to 0, meaning it costs basically nothing when it's not being used. You can also set a cap on the scaling, e.g. 5 instances max, which caps the maximum cost of the service too (rough sketch below). Note: I only have experience with the CPU version of Cloud Run, which is very reliable and easy.
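For concreteness, setting that cap programmatically looks roughly like this, assuming the google-cloud-run Admin API client (I normally set it in the console, so treat the exact field names and the service path as assumptions):

    # Sketch: cap Cloud Run autoscaling via the Admin API (google-cloud-run client).
    # The service path is a placeholder; field names are my best understanding of run_v2.
    from google.cloud import run_v2

    client = run_v2.ServicesClient()
    service = client.get_service(
        name="projects/my-project/locations/us-central1/services/my-service"
    )
    service.template.scaling = run_v2.RevisionScaling(
        min_instance_count=0,  # scale to zero when idle -> roughly zero cost at rest
        max_instance_count=5,  # hard cap on instances -> hard cap on spend
    )
    client.update_service(service=service).result()  # wait for the rollout to finish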
A small and independent EU GPU cloud provider, DataCrunch (I am not affiliated), offers VMs with Nvidia GPUs even cheaper than RunPod, etc.:
1x A100 80GB: €1.37/hour
1x H100 80GB: €2.19/hour
I'm the VP/GM responsible for Cloud Run and GKE. Great to see the interest in this! Happy to answer questions on this thread.
Oh, this is great news. After a $1000 bill from running a model on vertex.ai continuously for a little test I forgot to shut down, this will be my go-to now. I've been using Cloud Run for years, running production microservices and little hobby projects, and I've found it simple and cost-effective.
If I understand this correctly, I should be able to stand up an API running arbitrary models (e.g. from Hugging Face), and while it's not quite charged by the token, it should be very cheap if my usage is sporadic. Is that correct? Seems pretty huge if so; most of the providers I looked at required a monthly fee to run a custom model.
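For concreteness, the kind of service I have in mind is a small container along these lines, assuming FastAPI + transformers (the model id and endpoint are just placeholders), deployed with a GPU attached:

    # Minimal sketch of an "arbitrary model behind an API" container.
    # Any Hugging Face model id could be swapped in for the example below.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

    class Prompt(BaseModel):
        text: str
        max_new_tokens: int = 128

    @app.post("/generate")
    def generate(req: Prompt):
        out = generator(req.text, max_new_tokens=req.max_new_tokens)
        return {"text": out[0]["generated_text"]}

My understanding is you'd pay for instance time while it's scaled up handling requests, not per token, and nothing while it's scaled to zero.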
Love Cloud Run, and this looks like a great addition. The only things I wish for: being able to run self-hosted GitHub runners on it (last time I checked this wasn't possible, as it requires root), and the new worker pool feature seems great, but it looks like you have to write the scaler yourself rather than it being built in.
I'm the developer of kdeps.com, and I really like Google Cloud Run; I've been using it since the beta. Kdeps outputs Dockerized full-stack AI agent apps that run open-source LLMs locally, and my project works very well with GCR.
That's $0.67/hour for a GPU-enabled instance. That's pretty good, but I have no idea how T4 GPUs compare against others.
The real value in this is running small custom models or the absolute latest open-weight models.
Why bother when you can get pay-as-you-go API access to popular open-weight models like Llama on Vertex AI Model Garden, or at the edge on Cloudflare?
The Nvidia L4 has 24GB of VRAM and consumes 72 watts, which is relatively low compared to other datacenter cards. It's not a monster GPU, but it should work OK for inference.
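Back-of-the-envelope for what fits in that 24GB (approximations, not measured numbers):

    # Rough VRAM sizing for a 24 GB card; all numbers are approximations.
    vram_gb = 24
    params_b = 7           # e.g. a 7B-parameter model, picked as an example
    bytes_per_param = 2    # fp16/bf16 weights
    weights_gb = params_b * bytes_per_param   # ~14 GB just for the weights
    headroom_gb = vram_gb - weights_gb        # ~10 GB left for KV cache, activations, runtime overhead
    print(f"~{weights_gb} GB weights, ~{headroom_gb} GB headroom")

So a 7-8B model in fp16 is comfortable; bigger models would need quantization.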
How does this compare to Fly GPUs in terms of pricing?
I wonder what all this hype-driven overcapacity will be used for by future generations.
Once this bubble pops, we are going to have some serious, albeit high-latency, hardware.
Everything good except the price.
If only they had some decent GPUs. L4s are pretty limited these days.
I'm tired of using AI in cloud services. I want user-friendly, locally owned AI hardware.
Right now nothing is consumer friendly. I can't get a packaged deal: a locally running, ChatGPT-quality UI or voice-command system in an all-in-one package. Like what the Mac did for PCs, I want the same for AI.
Does anyone actually run a modest-sized app and can share numbers on what one GPU gets you? Assuming something like vLLM for concurrent requests, what kind of throughput are you seeing? Serving an LLM just feels like a nightmare.
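If anyone does have numbers, even a crude offline measurement like this would be useful as a baseline (model id and batch size are arbitrary examples):

    # Crude vLLM throughput check: batch a pile of prompts, count generated tokens.
    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # example model, not a recommendation
    params = SamplingParams(max_tokens=256)
    prompts = ["Summarize the plot of Hamlet in three sentences."] * 64

    start = time.time()
    outputs = llm.generate(prompts, params)
    elapsed = time.time() - start

    tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{tokens / elapsed:.0f} generated tokens/s across {len(prompts)} requests")

Real serving throughput (continuous batching, mixed prompt lengths) will differ, but it gives a ballpark.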
Why is commercial advertising published as a content article here?
I've been using this for daily/weekly ETL tasks, which saves quite a lot of money vs having an instance on all the time, but it's been clunky.
The main issue: despite there being a 60-minute timeout available, in most cases the API will just straight up not return a response code if your request takes more than ~5 minutes, so you have to make sure you can poll wherever the data is being stored and let the client time out.
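The workaround ends up looking something like this (the URL, bucket, and object names are placeholders for my setup):

    # Kick off the long-running job, let the client side time out, then poll the
    # output location until the data lands. All names/URLs below are placeholders.
    import time
    import requests
    from google.cloud import storage

    try:
        requests.post("https://my-etl-job-xxxxx.a.run.app/start", timeout=10)
    except requests.exceptions.RequestException:
        pass  # expected: we never rely on the HTTP response coming back

    blob = storage.Client().bucket("my-etl-bucket").blob("output/latest.parquet")
    while not blob.exists():
        time.sleep(30)  # poll until the job has written its output
    print("ETL output landed")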
> Time-to-First-Token of approximately 19 seconds for a gemma3:4b model (this includes startup time, model loading time, and running the inference)
This is my biggest pet peeve with serverless GPU. 19 seconds is horrible latency from the user's perspective, and that's a best-case scenario.
If this is the best one of the most experienced teams in the world can do with a small 4B model, then it feels like serverless GPU is really restricted to non-interactive use cases.
I love Google Cloud Run and highly recommend it as the best option[1]. The Cloud Run GPU, however, is not something I can recommend. It is not cost-effective (instance-based billing is expensive compared to request-based billing), GPU choices are limited, and loading/unloading the model (gigabytes) into and out of GPU memory makes it slow to use as serverless.
Once you compare the numbers, it is better to use a VM + GPU if your service is utilized for even just 30% of the day (rough break-even sketch below).
1 - https://ashishb.net/programming/free-deployment-of-side-proj...
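The break-even math is roughly this; the Cloud Run rate is the 1x L4 figure quoted earlier in the thread, and the VM rate is an assumed placeholder (plug in whatever reserved/spot price you can actually get):

    # Back-of-the-envelope break-even: always-on VM + GPU vs per-second Cloud Run GPU.
    cloud_run_hourly = 0.71   # $/hr, 1x L4 figure from the comparison upthread
    vm_hourly = 0.25          # $/hr, ASSUMED always-on VM + GPU rate, not a real quote

    break_even = vm_hourly / cloud_run_hourly
    print(f"The always-on VM wins once utilization exceeds ~{break_even:.0%} of the day")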