logoalt Hacker News

fc417fc802today at 1:20 PM1 replyview on HN

Efficient execution on the GPU appears to have been one of the specific aims of the authors. Table 2 of their paper shows real world performance that would appear at a glance to be compatible with inference.


Replies

mskkmtoday at 1:39 PM

This is not an LLM inference result. Table 2 is the part I find most questionable. Claiming orders-of-magnitude improvements in vector search over standard methods is an extraordinary claim. If it actually held up in practice, I would have expected to see independent reproductions or real-world adoption by now. It’s been about a year since the paper came out, and I haven’t seen much of either. That doesn’t prove the claim is false, but it certainly doesn’t inspire confidence.