Hacker News

mysteria · 11/20/2024 · 0 replies · view on HN

Even with a PCIe FPGA card you're still going to be memory bound during inference. When running llama.cpp straight on the CPU, memory bandwidth, not CPU power, is always the bottleneck.

Now if the FPGA card had a large amount of GPU-tier memory, then that would help.
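A back-of-envelope sketch of why bandwidth dominates: decoding one token reads roughly every weight once, so tokens/sec is capped at bandwidth divided by model size. The figures below (7B model at ~4-bit quantization, dual-channel DDR5 vs. GPU-class memory) are illustrative assumptions, not measurements.

```python
def max_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Rough upper bound on decode speed when inference is memory-bandwidth bound:
    each generated token streams the full set of weights from memory once."""
    return bandwidth_bytes_per_sec / model_bytes

# Assumed figures: ~3.5 GB of weights (7B params at ~4 bits/param),
# ~80 GB/s for dual-channel DDR5, ~1000 GB/s for GPU-tier memory.
MODEL_BYTES = 3.5e9
cpu_ceiling = max_tokens_per_sec(MODEL_BYTES, 80e9)    # ceiling on plain CPU DRAM
gpu_ceiling = max_tokens_per_sec(MODEL_BYTES, 1000e9)  # ceiling with GPU-class memory

print(f"CPU DRAM ceiling:   ~{cpu_ceiling:.0f} tokens/s")
print(f"GPU-memory ceiling: ~{gpu_ceiling:.0f} tokens/s")
```

The compute engine (CPU, GPU, or FPGA) barely figures into this bound, which is the point: attaching fast, wide memory to the FPGA card matters more than its logic fabric.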