Hacker News

7777777phil, today at 8:00 AM

Fitting a 32B model into 19.3GB is really cool imo, and it matters: memory and cold start are what gate production deployments.
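For a sense of scale, here's the back-of-envelope math. It assumes "32B" means 32 billion parameters and "19.3GB" means decimal gigabytes, and it ignores any runtime overhead. The comment doesn't say how the model was compressed. Under those assumptions it works out to under 5 bits per weight on average, versus 64 GB for the same weights stored at 16 bits.

```python
# Back-of-envelope size math. Assumptions: "32B" = 32e9 parameters,
# "19.3GB" = 19.3e9 bytes (decimal GB), no runtime overhead counted.
PARAMS = 32e9
COMPRESSED_BYTES = 19.3e9


def bits_per_param(size_bytes: float, params: float) -> float:
    """Average number of bits stored per parameter for a given total size."""
    return size_bytes * 8 / params


def size_gb(params: float, bits: float) -> float:
    """Size in decimal GB of `params` weights stored at `bits` bits each."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    print(f"{bits_per_param(COMPRESSED_BYTES, PARAMS):.1f} bits/param")  # → 4.8 bits/param
    print(f"{size_gb(PARAMS, 16):.1f} GB at 16-bit")  # → 64.0 GB at 16-bit
```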

I wrote a piece (1) a while ago on how Netflix and Spotify worked this out: cheap classical methods handle 90%+ of their recommendation requests, and LLMs only get called when the payoff justifies it.
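Here's a minimal sketch of that kind of cost-gated routing. Everything in it is hypothetical: `Request`, `route`, the `confidence` and `value` fields, and the cost and uplift numbers are illustrative assumptions, not how Netflix or Spotify actually do it. The cheap model answers by default, and a request is escalated only when the expected gain from the LLM exceeds the cost of calling it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    confidence: float  # hypothetical: cheap model's confidence in its top pick, 0..1
    value: float       # hypothetical: estimated payoff of a good recommendation


def route(req: Request, llm_cost: float, llm_uplift: float) -> str:
    """Send a request to the LLM only when its expected payoff beats its cost."""
    # Assumed gain model: value at stake, scaled by how unsure the cheap
    # model is and by the fraction of that gap the LLM is expected to close.
    expected_gain = req.value * (1.0 - req.confidence) * llm_uplift
    return "llm" if expected_gain > llm_cost else "classical"


if __name__ == "__main__":
    requests = [
        Request("a", confidence=0.95, value=1.0),
        Request("b", confidence=0.90, value=2.0),
        Request("c", confidence=0.40, value=5.0),
    ]
    for r in requests:
        print(r.user_id, route(r, llm_cost=0.5, llm_uplift=0.5))
```

With a confident cheap model, most traffic never reaches the LLM. Raising `llm_cost` or lowering `llm_uplift` pushes even more requests onto the classical path.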

(1) https://philippdubach.com/posts/bandits-and-agents-netflix-a...