On a CPU with larger K you would put the centroids in a search tree to take advantage of the sparsity, while a GPU would compute the full NxK distance matrix. So, as I understand it, the bottleneck they are fixing doesn't show up on CPU.
Search trees tend not to scale well to higher dimensions though, right?
From what I've seen, my impression was that Yinyang k-means is the best way to take advantage of the sparsity.
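
To make "taking advantage of the sparsity" concrete, here is a rough sketch of the assignment step done two ways: a full NxK scan, and a scan that skips centroids using the triangle inequality. This is not Yinyang k-means itself, only the simpler per-centroid test (in the style of Elkan's algorithm) that this family of bound-based methods builds on. The function names and the toy data are made up for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_brute_force(points, centroids):
    """Full N x K scan: every point is compared against every centroid."""
    labels, evals = [], 0
    for p in points:
        ds = [dist(p, c) for c in centroids]
        evals += len(centroids)
        labels.append(min(range(len(centroids)), key=ds.__getitem__))
    return labels, evals


def assign_pruned(points, centroids):
    """Same assignment, but skip centroids ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be strictly closer and its distance is never computed.
    """
    k = len(centroids)
    # K x K centroid-to-centroid distances, computed once per iteration.
    cc = [[dist(a, b) for b in centroids] for a in centroids]
    labels, evals = [], 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        evals += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned without touching the point
            d = dist(p, centroids[j])
            evals += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, evals


# Toy data (an assumption for the demo): 16 centroids in 8 dimensions,
# 200 points scattered tightly around randomly chosen centroids.
rng = random.Random(0)
centroids = [[rng.uniform(-10, 10) for _ in range(8)] for _ in range(16)]
points = [[v + rng.gauss(0, 0.5) for v in rng.choice(centroids)]
          for _ in range(200)]

brute_labels, brute_evals = assign_brute_force(points, centroids)
pruned_labels, pruned_evals = assign_pruned(points, centroids)
print("point-centroid distances, brute force:", brute_evals)
print("point-centroid distances, pruned:     ", pruned_evals)
```

The pruned version does data-dependent, branchy work per point, which suits a CPU; the brute-force version is a regular dense computation, which is the shape a GPU handles well. Unlike a search tree, the pruning test relies only on distances, not on partitioning the space.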