Hacker News

Aurornis · today at 2:44 PM

If this is your first time using open-weight models right after release, know that there are almost always bugs in the early implementations, and even in the quantizations.

Every project races to support new models on launch day so they don’t lose users, but the output you get may not be correct. Several problems have already been discovered in tokenizer implementations, and quantizations may have problems too, especially ones that use an importance matrix (imatrix).

So in the coming weeks you’re going to see a lot of “I tried it but it sucks because it can’t even do tool calls” and other reports of models not working at all, from people who don’t realize they were using broken implementations.

If you want to try cutting-edge open models, be prepared to constantly update your inference engine, and to check your quantization for updates and re-download it when it changes. The mad rush to support a model on launch day means everything gets shipped as soon as it looks like it can produce output tokens, not when it’s been tested for correctness.
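The re-download check above can be automated. Hugging Face publishes a SHA-256 for each LFS-tracked model file, so you can compare your local quant against it before every run. This is only an illustrative sketch; the function names (`file_sha256`, `needs_redownload`) and the way you obtain `published_sha256` are assumptions, not part of any particular tool:

```python
import hashlib
from pathlib import Path

def file_sha256(path: str) -> str:
    """Stream a file through SHA-256 so multi-GB GGUFs never load into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def needs_redownload(local_path: str, published_sha256: str) -> bool:
    """True if the local quant is missing or no longer matches the
    hash the model repo currently publishes (i.e. it was re-uploaded)."""
    p = Path(local_path)
    if not p.exists():
        return True
    return file_sha256(local_path) != published_sha256.lower()
```

You'd run this before starting your inference engine and re-fetch the file whenever it returns `True`; a changed hash is the usual sign that a broken quant was silently replaced.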


Replies

colechristensen · today at 2:49 PM

You seem like you know what you're talking about... what inference engine should I use? (Linux, 4090)

I keep having "I tried it but it sucks" issues, mostly around tool calling, and it's not clear if it's the model or Ollama. And not one model in particular, any of them really.
