Hacker News

m4r1k · last Wednesday at 10:11 AM · 0 replies

I wrote about running inference with Ollama on Cloud Run with GPU -> https://medium.com/google-cloud/ollama-on-cloud-run-with-gpu...

Performance is okay; Ada Lovelace GPUs have CUDA compute capability 8.9, which brings native FP8 support. IMO the best aspects are how fast new containers spin up and the overall ease of the service. The live demo at Google Cloud Next '25 was quite something: https://www.youtube.com/watch?v=PWPvX25R6dM&t=2140s
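For context, a deployment like the one in the post can be sketched with a single gcloud command. This is a minimal sketch, not the article's exact setup: the service name, region, and CPU/memory sizes are assumptions, and the GPU flags should be checked against the current Cloud Run docs.

```shell
# Sketch: deploy the public ollama/ollama image to Cloud Run with an NVIDIA L4.
# Cloud Run GPU services require CPU to stay allocated (no throttling) and
# come with CPU/memory minimums; 11434 is Ollama's default listening port.
gcloud run deploy ollama-gpu \
  --image=ollama/ollama \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi \
  --no-cpu-throttling \
  --port=11434 \
  --allow-unauthenticated
```

Once deployed, the service URL exposes the usual Ollama HTTP API (e.g. POST to /api/generate), and scale-to-zero means you only pay while a container is up, which is where the fast cold starts really matter.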