Nice one. K-Means is one of those neat little powertools that once you get the hang of it you find more and more applications for, but it can be a bit slow for larger data sets. So this is very nice to have, thank you matt_d for posting.
Do they mean deterministic k-means, k-means++ ... ? Global optimal k-means is NP-Hard, so linear speedups aren't terribly helpful. It's nice, until you add more input. Standard k-means would be nice, or the k-means++ seed algorithm.
Does this have corresponding speed ups or memory gains for normal CPUs too? Just thinking about all the cups of coffee that have been made and drunk while scikit-learn kmeans chugs through a notebook :)
looks like flash attention concepts applied to kmeans, nice speedup results
[dead]
They created this in service of their video generation model which "clusters and reorders tokens based on semantic similarity using k-means.":
http://arxiv.org/pdf/2505.18875