amorroxic • today at 9:04 AM • 1 reply

Curious what backs this assertion. As a counterpoint, we’ve been running 200+ models in production for more than 5 years: language models, embedding models, and classifiers, ranging from the low tens of millions to a hundred million parameters. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What’s your SLA?
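
For scale, the 1-2M requests/day figure can be turned into an average per-second rate. This is a rough sketch that assumes traffic is spread evenly across the day, which the comment does not state; real peaks are likely higher. The helper name `average_rps` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, ASSUMING evenly spread traffic."""
    return requests_per_day / SECONDS_PER_DAY


low = average_rps(1_000_000)   # lower bound quoted in the comment
high = average_rps(2_000_000)  # upper bound quoted in the comment
print(f"{low:.1f} to {high:.1f} requests/second on average")
# prints "11.6 to 23.1 requests/second on average"
```

That is an average only and says nothing about burst load or per-request latency, which is what the SLA question is about.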

Replies

nnevatie • today at 2:59 PM

Ahh, I probably should have added some context around my hyperbole. I was referring to real-time computer vision, e.g. segmenting FHD/UHD video.
