My biggest pet peeve with all these articles on local AI is the only thing they talk about is tokens per second. No one mentions the quality of the answers. No one. I don't mind waiting a little longer if the quality is better. Quickly serving me slop doesn't make it more useful. Are people really only looking at tokens per second?
Local model as such will give you "autocomplete on steroids" but it is not going to run away and implement cross project feature like frontier model in let's say Cursor.
So there is no value in testing quality of answers, but there is value in testing token speed.
You just have to have correct expectations.
That's fair. There are even many dimensions to define 'quality' which include use case (coding? writing? multimedia?) and prompt. I suppose if you ask testers to provide benchmarks with their analysis, that might hamper their desire to share.
The model already has its own quality benchmarks elsewhere. The article is just about running the model on X hardware, so the remaining question is then how fast it is. Or does the output quality somehow depend on the hardware too?