> I did not do a very long session
This is always the problem with the 2-bit and even 3-bit quants: They look promising in short sessions but then you try to do real work and realize they’re a waste of time.
In my experience, running a smaller dense model, such as a 27B, produces better results than a 2-bit quant of a larger model.
Lots of people seem to use 4-bit. Do you think that's worth it vs. a smaller model in some cases?
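For a rough sense of what is being traded off here, this is a back-of-the-envelope sketch of weight memory at different bit widths. It assumes weight storage is roughly parameters × bits / 8 and ignores KV cache, activations, and per-block quantization metadata, so real files will be somewhat larger. The 70B figure is only a hypothetical stand-in for "a larger model"; it does not come from anyone's setup in this thread.

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Assumption: size = parameters * bits / 8. This ignores KV cache,
    activations, and quantization scales/zero-points.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# 27B comes from the comment above; 70B is a hypothetical "larger model".
configs = [
    ("27B dense @ 8-bit", 27, 8),
    ("27B dense @ 4-bit", 27, 4),
    ("70B @ 4-bit", 70, 4),
    ("70B @ 2-bit", 70, 2),
]

for name, params, bits in configs:
    print(f"{name}: ~{weight_gib(params, bits):.1f} GiB of weights")
```

This only shows which options fit in the same memory budget. It says nothing about output quality, which is the actual question.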
> This is always the problem with the 2-bit and even 3-bit quants: They look promising in short sessions but then you try to do real work and realize they’re a waste of time.
It would be nice to see a scientific assessment of that statement.