zambelli • today at 12:45 AM

I was surprised as well. I did go with an extreme (but true) example in the post. In this case, the native function-calling template is likely in play.

However, that doesn't explain llamaserver prompt vs. llamafile at roughly +4 pts, or vs. Ollama (at roughly +30 pts), which sits almost perfectly between llamaserver native and llamafile.

The backend affects almost all model families, and it's something I've never really seen talked about.

Replies

eob • today at 1:40 AM

Do you have any suspicion about what is different between the backends?

That's an absolutely bonkers statistic: it would mean that spurious differences in the hosting container overwhelm the performance differences between models.
