Hacker News

FuckButtons 01/21/2025

Just played with the qwen32b:Q8 distillation, gave it a fairly simple python function to write (albeit my line of work is fairly niche) and it failed spectacularly. Not only did it give an invalid answer to the problem statement (which I tried very hard not to make ambiguous), it also totally changed what the function was supposed to do. I suspect it ran out of useful context at some point and that's when it started to derail, as it was clearly considering the problem constraints correctly at first.

It seemed like it couldn't synthesize the problem quickly enough to keep the required details in focus with enough attention on them.

My prior has been that test-time compute is a band-aid that can't really get significant gains over and above doing a really good job of writing the prompt yourself, and this (totally not rigorous, but I'm busy) doesn't persuade me to update that prior significantly.

Incidentally, does anyone know if this is a valid observation: the more context there is, the more diffuse the attention mechanism seems to become. That seems to be true for this model, for Claude, and for llama70b, so even if something fits in the nominal context window, the larger the amount of context, the less effective the model becomes.

I’m not sure if that’s how it works, but it seems like it.
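For what it's worth, a toy numpy sketch of that intuition (nothing model-specific, just scaled dot-product attention of one query over random keys) shows the mechanical side of it: with a fixed logit scale, softmax over more keys spreads its mass thinner, so the weight on any one token shrinks and the entropy climbs toward uniform.

    # Toy sketch, not a real transformer: one query attending over n random keys.
    # With a fixed logit scale, more keys -> flatter softmax -> more diffuse attention.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 64  # head dimension, arbitrary for the toy

    for n_keys in (512, 4096, 32768):
        q = rng.normal(size=d)
        keys = rng.normal(size=(n_keys, d))
        logits = keys @ q / np.sqrt(d)      # scaled dot-product scores
        w = np.exp(logits - logits.max())
        w /= w.sum()                        # softmax weights
        entropy = -(w * np.log(w)).sum()
        print(f"{n_keys:6d} keys | max weight {w.max():.4f} | "
              f"entropy {entropy:.2f} (uniform = {np.log(n_keys):.2f})")

Real models obviously do much more than this (learned sharpening, positional schemes, long-context training), so it's only the crude version of the intuition, but it lines up with the "fits in the window yet still degrades" experience.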


Replies

nowittyusername 01/21/2025

When I asked the 32b r1 distilled model its context window it said it was 4k... I don't know if that's true or not, as it might not know its own architecture, but if it is, 4k doesn't leave much, especially for its <thinking> tokens.

I've also seen some negative feedback on the model. It could be that the benchmarks are false and the model has simply been trained on them, or maybe, because the model is so new, the hyperparameters haven't been set up properly. We will see in the next few days, I guess. From my testing there are hints of something interesting in there, but I also don't like its extremely censored nature. And I don't mean the CCP stuff, I mean the sanitized corpo safety nonsense it was most likely trained on...
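One way to sanity-check that without trusting the model's self-report is to read the advertised limit straight from the Hugging Face config; the repo id below is an assumption, swap in whichever distill you actually pulled.

    # Sketch: read the positional/context limit from the model config
    # instead of asking the model. The repo id here is an assumption.
    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
    print(cfg.max_position_embeddings)  # context length the weights were configured for

Also worth noting that local runners often default to a much smaller window than the weights support (e.g. Ollama's num_ctx setting), which can produce exactly the "derails once it runs long" behaviour described above.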

Havoc 01/21/2025

Try the llama one instead. It seemed better than qwen for some reason.
