No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.
Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.
How are we supposed to use TensorRT on iOS, iPadOS, Android, or even the Web? Production is not only cloud.
You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
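A minimal sketch of what that looks like from Python, assuming an onnxruntime build that ships the TensorRT execution provider (typically the GPU package plus the TensorRT libraries). The helper names `pick_providers` and `make_session` and the preference order are my own illustration, not part of the library; the provider strings, `get_available_providers()` and `InferenceSession(..., providers=...)` are the real API.

```python
from typing import List, Sequence

# Preference order is an assumption for illustration: try TensorRT first,
# then plain CUDA, then CPU.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(
    available: Sequence[str], preferred: Sequence[str] = PREFERRED
) -> List[str]:
    """Return the preferred providers that this build actually offers, in order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to CPU so the caller always gets a non-empty provider list.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path: str):
    """Create an InferenceSession using the best providers available (hypothetical helper)."""
    import onnxruntime as ort  # imported lazily so pick_providers works without it

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Usage would be something like `session = make_session("model.onnx")` (the path is a placeholder). On a machine without TensorRT or CUDA the same code simply ends up on the CPU provider, which is the point: the runtime and the backend are separate choices.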
Production doesn't have to be performance-sensitive, so developer experience may still outweigh the performance differences in some scenarios.
I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.
We use this in production:
https://docs.rs/onnxruntime/latest/onnxruntime/
It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
Ummm embedded robotics is all about this. For years.
Curious what backs this assertion. As a counterpoint, we've been running 200+ models in production for more than 5 years: language models, embedding models, classifiers, from the low tens to hundreds of millions of params. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What's your SLA?