Hacker News

zambelli today at 12:45 AM

I was surprised as well. I did go with an extreme (but true) example in the post. In this case, a native function-calling template is likely in play.

However, that doesn't explain the llamaserver prompt vs llamafile gap (~+4 pts), or the gap vs Ollama (roughly +30 pts), which sits almost perfectly between llamaserver native and llamafile.

The backend affects almost all model families, and it's something I've never really seen talked about.


Replies

eob today at 1:40 AM

Do you have any suspicion about what is different between the backends?

That's an absolutely bonkers statistic: it would mean that spurious differences in the hosting container overwhelm the performance differences between models.
