Hacker News

syntaxing yesterday at 5:11 PM

1. What do you mean by accuracy? Like the facts and information? If so, I use a Wikipedia/Kiwix MCP server. Or do you mean tool call accuracy?

2. 3.6 is noticeably better than 3.5 for agentic uses (I have yet to use the dense model). The downside is that there’s so little personality that you’ll find more entertainment talking to a wall. For anything creative, like writing or talking, I use Gemma 4. I also use Gemma 4 only as a “chat” bot, no agents. One amazing thing about the Gemma models is their vision capabilities. I was able to pipe in some handwritten notes and it converted them into Markdown flawlessly. But my handwriting is much better than the typical engineer’s chicken scratch.


Replies

physicles yesterday at 10:35 PM

I have a Supernote and was looking at different models for handwriting recognition, and I agree that gemma4-26B is the best I’ve tried so far (better than qwen3-vl-8B and GLM-OCR). Besides turning off thinking, does your setup have any special sauce?

throwaw12 yesterday at 5:15 PM

By accuracy I meant how close the output is to your expectations. For example, if you ask an 8B model to write a C compiler in C, it outputs the theory of how to write a compiler and writes pseudocode in Python. That is off in two ways: (1) I didn't ask for theory, and (2) I didn't ask for it to be written in Python.

Or, to put it differently: if your prompt is super clear about the actions you want it to take, does it follow them exactly as you said, or does it occasionally go off the rails?
