Pied Piper vibes. As far as I can tell, this algorithm is hardly compatible with modern GPU architectures. My guess is that’s why the paper reports accuracy-vs-space, but conveniently avoids reporting inference wall-clock time. The baseline numbers also look seriously underreported. “several orders of magnitude” speedups for vector search? Really? anyone has actually reproduced these results?
Apparently MLX confirmed it - https://x.com/prince_canuma/status/2036611007523512397
Classic academic move. If the authors show accuracy-vs-space charts but hide end-to-end latency, it usually means their code is slower in practice than vanilla fp16 without any compression. Polar coordinates are absolute poison for parallel GPU compute
Efficient execution on the GPU appears to have been one of the specific aims of the authors. Table 2 of their paper shows real world performance that would appear at a glance to be compatible with inference.