This is awesome! Using a CLIP or DINOv2 model to produce image embeddings would probably improve the similarity search a lot, kind of like what http://same.energy/ does.
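
To make the idea concrete, here is a rough sketch of what the search side could look like once you have embeddings. The toy 3-d vectors, the file names, and the `most_similar` helper are all made up for illustration; in practice the vectors would come from a CLIP or DINOv2 model and have hundreds of dimensions, and I don't know how this project actually stores its index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, index, k=3):
    """Return the k (name, score) pairs from index that are closest to query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors standing in for real model embeddings.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = most_similar(query, index, k=2)
print(results)
```

A brute-force scan like this is fine for a few thousand images; beyond that you would probably want an approximate nearest-neighbor index instead.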