Curious on what backs this assertion. As a counterpoint we’ve been running 200+ models in production for more than 5 years - language models, embedding, classifiers, low tens to hundred M params. Traffic in the order of 1-2M requests/day and everything is enabled by onnx with some cgo (or Rust) plumbing on top. What’s your SLA?
Ahh, I should have probably added some context around my hyperbole. I was referring to real-time computer vision - think of e.g. segmenting FHD/UHD video.