ducviet00 • today at 6:46 AM • 2 replies

Unlike GPUs, CPUs aren't designed for massive parallelism, so batching inference won't necessarily give you a speed boost here. It can even slow things down.
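Whether batching helps on a particular CPU is easy to check empirically. The following is a minimal sketch that assumes your model is wrapped in a callable taking a list of inputs (the `infer` parameter and `fake_infer` are hypothetical stand-ins). It times the same workload at several batch sizes and reports the best per-item latency for each.

```python
import time
from typing import Callable, Dict, List, Sequence


def per_item_latency(
    infer: Callable[[List[object]], object],
    inputs: Sequence[object],
    batch_sizes: Sequence[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best observed seconds-per-item for each batch size.

    `infer` is a hypothetical stand-in for your model call: it receives one
    batch (a list of inputs). Only wall-clock time is measured.
    """
    if not inputs:
        raise ValueError("need at least one input")
    results: Dict[int, float] = {}
    for bs in batch_sizes:
        if bs < 1:
            raise ValueError("batch size must be >= 1")
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # Run the whole input set in chunks of `bs` items.
            for i in range(0, len(inputs), bs):
                infer(list(inputs[i:i + bs]))
            best = min(best, time.perf_counter() - start)
        # Divide by item count so different batch sizes are comparable.
        results[bs] = best / len(inputs)
    return results


if __name__ == "__main__":
    def fake_infer(batch):
        # Placeholder workload; replace with a real ONNX Runtime/OpenVINO call.
        return [sum(range(1000)) for _ in batch]

    for bs, t in per_item_latency(fake_infer, list(range(256)), [1, 8, 32]).items():
        print(f"batch={bs:3d}  {t * 1e6:.1f} us/item")
```

If per-item latency stops improving (or gets worse) as the batch grows, batching isn't buying anything on that machine.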

Instead, I'd recommend exploring CPU-specific AI optimizations. For instance, leveraging AVX512_BF16 instructions could cut inference time by a factor of 2 to 3 compared to the results in the article. OpenVINO supports this well on Intel CPUs, and converting an ONNX model to OpenVINO is straightforward.
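That speedup is only available if the CPU actually exposes the instructions. Here is a minimal sketch for Linux. It assumes the kernel reports the feature as the `avx512_bf16` flag on the `flags` line of `/proc/cpuinfo`, which is how recent Linux kernels name it. Other operating systems need a different check, and `cpu_flags` and `has_avx512_bf16` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512_bf16(cpuinfo_text: Optional[str] = None) -> bool:
    """True if the CPU advertises the avx512_bf16 flag (Linux x86 only)."""
    if cpuinfo_text is None:
        path = Path("/proc/cpuinfo")
        if not path.exists():
            # Not Linux, or /proc unavailable: treat "unknown" as unsupported.
            return False
        cpuinfo_text = path.read_text()
    return "avx512_bf16" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    print("avx512_bf16 supported:", has_avx512_bf16())
```

If this returns `False`, don't expect the bf16 speedup mentioned above. If it returns `True`, the OpenVINO CPU device documentation describes bf16 being used by default on such hardware, and the `INFERENCE_PRECISION_HINT` property can force `f32` for an A/B comparison.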

Replies

properbrew • today at 7:36 AM

+1 for OpenVINO; we utilise it for our model. It's quite amazing what inference speeds you can get from CPUs, speeds most people would assume require a GPU.

electroglyph • today at 7:55 AM

ONNX Runtime has AVX512 CPU kernels too, OpenVINO can read ONNX models directly, and ONNX Runtime supports OpenVINO as a backend (the OpenVINO Execution Provider).
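On the ONNX Runtime side, the OpenVINO backend appears as an execution provider. This small sketch prefers that provider when it is present and falls back to the default CPU provider otherwise. The provider name strings are ONNX Runtime's own. `choose_providers` and `PREFERRED_PROVIDERS` are hypothetical helpers.

```python
from typing import List, Sequence

# Provider names as used by ONNX Runtime. Which ones are available depends on
# the installed build (e.g. the onnxruntime-openvino package adds OpenVINO).
PREFERRED_PROVIDERS = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available: Sequence[str]) -> List[str]:
    """Order providers by preference, keeping only those actually available.

    CPUExecutionProvider is always kept as the final fallback; it ships with
    standard ONNX Runtime builds.
    """
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

With ONNX Runtime installed, the result would be passed as `onnxruntime.InferenceSession("model.onnx", providers=choose_providers(onnxruntime.get_available_providers()))`, where `model.onnx` is a placeholder path.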
