We use this in production:
https://docs.rs/onnxruntime/latest/onnxruntime/
It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.