They created this for their video generation model, which "clusters and reorders tokens based on semantic similarity using k-means":
http://arxiv.org/pdf/2505.18875
Project website: https://svg-project.github.io/
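
To make the quoted idea concrete, here is a rough sketch of what "cluster and reorder tokens with k-means" could look like. This is not the authors' implementation: it is a plain NumPy k-means, and the function names (`kmeans_labels`, `reorder_by_cluster`), the token shape, and the cluster count are all hypothetical choices for illustration.

```python
import numpy as np


def kmeans_labels(tokens, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per token.

    Hypothetical helper for illustration, not code from the paper.
    tokens: float array of shape (n_tokens, dim).
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct tokens.
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every token to every centroid: (n, k).
        dists = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = tokens[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels


def reorder_by_cluster(tokens, k):
    """Permute tokens so that members of the same cluster are contiguous."""
    labels = kmeans_labels(tokens, k)
    # A stable sort keeps the original order within each cluster.
    perm = np.argsort(labels, kind="stable")
    # Inverse permutation, so the original order can be restored later.
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(len(perm))
    return tokens[perm], labels[perm], perm, inverse


# Example with made-up data: 32 tokens of dimension 8, grouped into 4 clusters.
tokens = np.random.default_rng(1).normal(size=(32, 8))
reordered, sorted_labels, perm, inverse = reorder_by_cluster(tokens, k=4)
restored = reordered[inverse]  # same as the original `tokens`
```

The point of the reordering step is that similar tokens end up next to each other in memory, and the inverse permutation puts results back in the original token order afterwards.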