
minimaxir, today at 6:38 AM (2 replies)

We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute.

You can technically do Q4 quantization for larger embedding models but I am not sure if that plays nice with ONNX.
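
For reference, here is a rough sketch of what Q4 means at the weight level, assuming plain symmetric block-wise quantization with one float scale per block. The block size of 32, the matrix shape, and the function names are illustrative assumptions, not how any particular ONNX tool does it, and the 4-bit values are left unpacked in int8 for readability.

```python
import numpy as np

def quantize_q4(weights, block_size=32):
    """Symmetric block-wise 4-bit quantization: one float scale per block.

    Assumes weights.size is divisible by block_size.
    """
    flat = weights.astype(np.float32).reshape(-1, block_size)
    # Map the largest magnitude in each block to 7, the largest positive int4 value.
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid dividing by zero for all-zero blocks
    # Values fit in 4 bits ([-8, 7]); stored unpacked in int8 here.
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_q4(q, scales, shape):
    """Reconstruct an approximate float32 tensor from int4 values and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Arbitrary weight matrix standing in for one layer of an embedding model.
rng = np.random.default_rng(0)
w = rng.standard_normal((384, 384)).astype(np.float32)

q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s, w.shape)
max_err = float(np.abs(w - w_hat).max())
print(f"max abs reconstruction error: {max_err:.4f}")
```

The per-element error is bounded by half the block's scale, so a single outlier weight makes its whole block coarser. Smarter rounding schemes try to reduce that loss.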


Replies

electroglyph, today at 7:51 AM

it's a pain in the ass to do properly.

what we really need is something like auto-round for ONNX