Hacker News

m4r1k · last Wednesday at 10:11 AM · 0 replies

I wrote about running inference with Ollama on Cloud Run with GPU -> https://medium.com/google-cloud/ollama-on-cloud-run-with-gpu...

Performance is okay; Ada Lovelace GPUs have CUDA compute capability 8.9, which brings native FP8 support. IMO the best aspects are how fast new containers spin up and the overall ease of the service. The live demo at Google Cloud Next '25 was quite something: https://www.youtube.com/watch?v=PWPvX25R6dM&t=2140s
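For context, a deployment like the one in the post can be sketched with a single gcloud command. This is a minimal sketch, not the article's exact setup: the service name, region, and CPU/memory sizes are assumptions, and the GPU flags should be checked against the current Cloud Run docs.

```shell
# Sketch: deploy the public ollama/ollama image to Cloud Run with an NVIDIA L4.
# Cloud Run GPU services require CPU to stay allocated (no throttling) and
# come with CPU/memory minimums; 11434 is Ollama's default listening port.
gcloud run deploy ollama-gpu \
  --image=ollama/ollama \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi \
  --no-cpu-throttling \
  --port=11434 \
  --allow-unauthenticated
```

Once deployed, the service URL exposes the usual Ollama HTTP API (e.g. POST to /api/generate), and scale-to-zero means you only pay while a container is up, which is where the fast cold starts really matter.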