Hacker News

reenorap yesterday at 9:07 PM

My biggest pet peeve with all these articles on local AI is that the only thing they talk about is tokens per second. No one mentions the quality of the answers. No one. I don't mind waiting a little longer if the quality is better. Quickly serving me slop doesn't make it more useful. Are people really only looking at tokens per second?


Replies

frollogaston yesterday at 11:16 PM

The model already has its own quality benchmarks elsewhere. The article is just about running the model on X hardware, so the remaining question is then how fast it is. Or does the output quality somehow depend on the hardware too?

ozim yesterday at 10:15 PM

A local model as such will give you "autocomplete on steroids", but it is not going to go off and implement a cross-project feature the way a frontier model in, let's say, Cursor would.

So there is no value in testing the quality of answers, but there is value in testing token speed.

You just have to have correct expectations.

akman yesterday at 9:28 PM

That's fair. There are also many dimensions along which to define 'quality', including use case (coding? writing? multimedia?) and prompt. I suppose if you ask testers to provide benchmarks with their analysis, that might hamper their desire to share.