
nnevatie today at 8:14 AM

No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.


Replies

amorroxic today at 9:04 AM

Curious what backs this assertion. As a counterpoint, we've been running 200+ models in production for more than 5 years: language models, embedding models, classifiers, from the low tens to a hundred M params. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What's your SLA?

snovv_crash today at 8:48 AM

Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.

pzo today at 11:00 AM

How are we supposed to use TensorRT on iOS, iPadOS, Android, or even the Web? Production is not only cloud.

OvervCW today at 9:49 AM

You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
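As a rough illustration of that point, here is a minimal sketch using the ONNX Runtime Python package, assuming a build that ships the TensorRT execution provider. The helper names `pick_providers` and `make_session` and the preference order are my own assumptions for illustration, not something from the comment above.

```python
# Sketch: ask ONNX Runtime to prefer TensorRT, then fall back to CUDA, then CPU.
# PREFERRED, pick_providers, and make_session are hypothetical names.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    return [p for p in preferred if p in available]


def make_session(model_path):
    """Create an inference session using the best available providers."""
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without TensorRT or CUDA, the same code would simply end up on the CPU provider, which is part of why the two are not mutually exclusive.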

gunalx today at 8:36 AM

Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.

monster_truck today at 12:28 PM

I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.

antonvs today at 12:02 PM

We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.

cik today at 1:16 PM

Ummm, embedded robotics is all about this. Has been for years.