antonvs, today at 12:02 PM

We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve more than 5 million inference requests per day for a highly performance-sensitive application, on behalf of a long list of major enterprise clients. We don’t use GPUs for inference because that would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
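
For a sense of scale, here is a rough back-of-the-envelope sketch of the average load that a daily request count implies. It assumes traffic is spread evenly over the day, which real traffic rarely is, so peak load is likely higher. The function name `average_rps` is made up for illustration and is not part of the onnxruntime crate.

```rust
// Back-of-the-envelope: average load implied by a daily request count.
// Assumption: requests are spread uniformly across the day.
const SECONDS_PER_DAY: f64 = 86_400.0;

/// Average requests per second for a given number of requests per day.
/// (Hypothetical helper, for illustration only.)
fn average_rps(requests_per_day: f64) -> f64 {
    requests_per_day / SECONDS_PER_DAY
}

fn main() {
    // 5 million is the lower bound taken from the "5 million per day" figure.
    let rps = average_rps(5_000_000.0);
    println!("average load: at least {:.1} requests/second", rps);
}
```

By that estimate, 5 million requests per day works out to a little under 58 requests per second on average, before accounting for peaks.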