logoalt Hacker News

amorroxictoday at 9:04 AM1 replyview on HN

Curious on what backs this assertion. As a counterpoint we’ve been running 200+ models in production for more than 5 years - language models, embedding, classifiers, low tens to hundred M params. Traffic in the order of 1-2M requests/day and everything is enabled by onnx with some cgo (or Rust) plumbing on top. What’s your SLA?


Replies

nnevatietoday at 2:59 PM

Ahh, I should have probably added some context around my hyperbole. I was referring to real-time computer vision - think of e.g. segmenting FHD/UHD video.