
minimaxir • today at 6:38 AM • 2 replies

We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute.

You can technically do Q4 quantization for larger embedding models but I am not sure if that plays nice with ONNX.
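For readers unfamiliar with the term, here is a rough sketch of what Q4 (4-bit) weight quantization does. It assumes a simple symmetric, block-wise scheme with one float scale per block. The function names and the block size of 32 are hypothetical, chosen for illustration, and this is not the format used by any particular library or by ONNX tooling.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block size; real schemes vary

Block = Tuple[float, List[int]]  # (scale, 4-bit integer codes)


def quantize_q4(weights: List[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Map floats to signed 4-bit integers in [-8, 7], one scale per block."""
    blocks: List[Block] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # The largest magnitude in the block maps to +/-7; all-zero blocks get scale 1.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q4(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate floats from (scale, codes) blocks."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    w = [0.7, -0.31, 0.12, 0.0, 1.5, -2.0]
    restored = dequantize_q4(quantize_q4(w, block_size=4))
    for original, approx in zip(w, restored):
        print(f"{original:+.3f} -> {approx:+.3f}")
```

Each weight is stored as one of 16 levels, so the reconstruction error per weight is bounded by half of its block's scale. That rounding error is what makes naive Q4 risky for embedding quality, and it is the kind of loss that calibration-based methods try to reduce.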

electroglyph • today at 7:51 AM

it's a pain in the ass to do properly.

what we really need is something like auto-round for ONNX
