The Nvidia L4 has 24GB of VRAM and consumes 72 watts, which is relatively low compared to other datacenter cards. It's not a monster GPU, but it should work OK for inference.
Performance is okay. Ada Lovelace has CUDA compute capability 8.9, which brings native FP8 support. IMO the best aspect is the speed of spinning up new containers and the overall ease of use of the service. The live demo at Google Cloud Next '25 was quite something: https://www.youtube.com/watch?v=PWPvX25R6dM&t=2140s
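If you want to confirm the FP8-capable compute capability from inside the container, a quick check via PyTorch works. This is just a minimal sketch, assuming the image has PyTorch with CUDA installed:

    import torch

    # The L4 is an Ada Lovelace part and reports compute capability (8, 9).
    # Native FP8 (E4M3/E5M2) tensor core support starts at 8.9.
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print("Device:", torch.cuda.get_device_name(0))
        print(f"Compute capability: {major}.{minor}")
        print("Native FP8 support:", (major, minor) >= (8, 9))
    else:
        print("No CUDA device visible")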
I wrote about inference with Ollama on Cloud Run with GPU -> https://medium.com/google-cloud/ollama-on-cloud-run-with-gpu...
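For reference, once the Ollama container is deployed you just talk to it over Ollama's normal HTTP API. A minimal sketch, assuming a hypothetical Cloud Run service URL and a model that's already pulled into the image:

    import requests

    # Hypothetical Cloud Run service URL; replace with your own deployment.
    SERVICE_URL = "https://ollama-gpu-example-uc.a.run.app"

    # Ollama's /api/generate endpoint; stream=False returns one JSON response.
    resp = requests.post(
        f"{SERVICE_URL}/api/generate",
        json={
            "model": "gemma2:9b",  # assumes this model is baked into the image
            "prompt": "Why is the sky blue?",
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])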