pu_pe • today at 7:22 AM • 1 reply

He tried with a tiny model (gemma3:4b), got a range from 66 to 99. Then tried again with a small model (gemini 3.1 flash lite), the range was 48 to 64. Would a frontier model be more consistent? Perhaps this tool was optimized for more capable models?

Replies

srdjanr • today at 7:45 AM

It makes sense to me intuitively (though I'm not sure if my reasoning is actually correct).

A worse model may not "know" enough to distinguish between a 70 candidate and a 100 candidate, so it's expected that its output has high variance. But a better model might "know" enough, so it can be more confident and thus more consistent.
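
A toy sketch of that intuition, not a measurement of any real model: assume each run's score is a fixed "true" score plus Gaussian noise, where a weaker model has a larger noise level. The function names, the true score of 80, and both noise values are hypothetical, chosen only for illustration.

```python
import random
import statistics

def simulate_scores(noise_sd, true_score=80.0, runs=50, seed=0):
    """Hypothetical scorer: the 'true' score plus Gaussian noise, clamped to 0..100."""
    rng = random.Random(seed)
    return [min(100.0, max(0.0, rng.gauss(true_score, noise_sd))) for _ in range(runs)]

def spread(scores):
    """Return (range, sample standard deviation) of repeated scores."""
    return max(scores) - min(scores), statistics.stdev(scores)

# Assumed noise levels: a "weaker" model is modelled as a noisier scorer.
weak = spread(simulate_scores(noise_sd=12.0))
strong = spread(simulate_scores(noise_sd=3.0))

print(f"weak model:   range={weak[0]:.1f}, sd={weak[1]:.1f}")
print(f"strong model: range={strong[0]:.1f}, sd={strong[1]:.1f}")
```

To check a real model the same way, one could replace `simulate_scores` with repeated calls to the actual scoring tool on the same input and compare the spread across models.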
