Hacker News

gardnr · last Wednesday at 9:59 AM · 1 reply

The Nvidia L4 has 24GB of VRAM and consumes 72 watts, which is relatively low compared to other datacenter cards. It's not a monster GPU, but it should work OK for inference.
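For a rough sense of what fits, here's a back-of-the-envelope sketch in Python; the ~20% overhead factor for KV cache, activations, and CUDA context is an assumption, not a measured number:

    BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}

    def fits_in_vram(params_billions: float, dtype: str, vram_gb: float = 24.0) -> bool:
        # Weights plus an assumed ~20% overhead (KV cache, activations, CUDA context).
        weights_gb = params_billions * BYTES_PER_PARAM[dtype]
        return weights_gb * 1.2 <= vram_gb

    for size_b in (7, 13, 34, 70):
        print(size_b, {d: fits_in_vram(size_b, d) for d in BYTES_PER_PARAM})

By this estimate a 7B model fits comfortably at fp16, and 4-bit quants stretch to roughly the 30B class.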


Replies

m4r1k · last Wednesday at 10:11 AM

I wrote about inference with Ollama on GPU-backed Cloud Run -> https://medium.com/google-cloud/ollama-on-cloud-run-with-gpu...
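If anyone wants to poke at a deployment like that, here's a minimal Python sketch against Ollama's REST API; the service URL and model name are placeholders, and a non-public Cloud Run service would also need an identity token in the Authorization header:

    import requests

    # Placeholder Cloud Run URL; substitute your own service's.
    SERVICE_URL = "https://ollama-example-abc123-uc.a.run.app"

    # Ollama's /api/generate endpoint; stream=False returns one JSON body.
    resp = requests.post(
        f"{SERVICE_URL}/api/generate",
        json={"model": "gemma2:9b", "prompt": "Why is the sky blue?", "stream": False},
        timeout=300,  # the first call may sit through a cold start + model load
    )
    resp.raise_for_status()
    print(resp.json()["response"])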

Performance is okay; Ada Lovelace has CUDA compute capability 8.9, which brings native FP8 support. IMO the best aspect is the speed of spinning up new containers and the overall ease of the service. The live demo at Google Next '25 was quite something: https://www.youtube.com/watch?v=PWPvX25R6dM&t=2140s
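For anyone gating FP8 paths on that, a small sketch (assuming PyTorch >= 2.1 with CUDA):

    import torch

    # Ada Lovelace cards like the L4 report compute capability (8, 9),
    # which is what native FP8 (E4M3/E5M2) support keys off.
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"compute capability: {major}.{minor}")
        if (major, minor) >= (8, 9):
            x = torch.randn(4, 4, device="cuda").to(torch.float8_e4m3fn)
            print("fp8 tensor dtype:", x.dtype)
    else:
        print("no CUDA device found")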