Hacker News

0xbadcafebee · today at 5:20 PM

This is a very flashy page that's glossing over some pretty boring things.

- This is a benchmark for "home security" workflows. I.e., extremely simple tasks that even open weight models from a year ago could handle.

- They're only comparing recent Qwen models to SOTA. Recent Qwen models are actually significantly slower than older Qwen models, and other open weight model families.

- Specific tasks do better with specific models. Are you doing VL? There's lots of tiny VL models now that will be faster and more accurate than small Qwen models. Are you doing multiple languages? Qwen supports many languages but none of them well. Need deep knowledge? Any really big model today will do, or you can use RAG. Need reasoning? Qwen (and some others) love to reason, often too much. They mention Qwen taking 435ms to first token, which is slow compared to some other models.
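Figures like the 435 ms time-to-first-token are easy to check yourself. A minimal sketch of the measurement, assuming `stream` is any iterable of response chunks (e.g. what an OpenAI-compatible local server returns with streaming enabled; the fake stream below is just for illustration):

```python
import time

def time_to_first_token(stream):
    """Seconds from iteration start until the first chunk arrives.

    `stream` is any iterable of response chunks -- an assumption;
    plug in whatever streaming client your local server exposes.
    """
    start = time.perf_counter()
    for _chunk in stream:
        return time.perf_counter() - start
    return None  # stream produced nothing

# Toy usage: a fake stream that delays its first token by 50 ms.
def fake_stream():
    time.sleep(0.05)
    yield "Hello"
    yield " world"

ttft = time_to_first_token(fake_stream())
```

Run it against a few local models and you can compare TTFT directly instead of trusting a benchmark page.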

Yes, Qwen 3.5 is very capable. But there will never be one model that does everything the best. You get better results by picking specific models for specific tasks, designing good prompts, and using a good harness.
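The "specific models for specific tasks" harness can be as simple as a routing table. A minimal sketch, where the model names are placeholders (not recommendations) and the task labels are assumptions:

```python
# Task -> model routing table: the core of a harness that picks a
# specific model per task instead of one model for everything.
# All model names below are placeholders, not recommendations.
ROUTES = {
    "vision":    "some-small-vl-model",
    "reasoning": "some-reasoning-model",
    "default":   "some-general-model",
}

def pick_model(task: str) -> str:
    """Route a task label to a model name, falling back to the default."""
    return ROUTES.get(task, ROUTES["default"])
```

Prompt selection and decoding parameters can hang off the same table, so each task gets a model, a prompt, and settings tuned for it.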

And you definitely do not need an M5 Mac for all of this. Even a capable PC laptop from 2 years ago can do all this. Everyone's really excited for the latest toys, and that's fine, but please don't let people trick you into thinking you need the latest toys. Even a smartphone can do a lot of these tasks with local AI.


Replies

aegis_camera · today at 5:36 PM

Thanks a lot for your feedback :) I've noticed the slowdown of Qwen3.5, so I turned off its thinking mode. The thinking mode even counts words one by one ("1 count, 2 the, 3 words"), which is very funny, lol.
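For reference, Qwen3 documents a "soft switch" for this: appending `/think` or `/no_think` to the user turn toggles the chain-of-thought block (Transformers also exposes an `enable_thinking` argument on `apply_chat_template`). Whether Qwen3.5 keeps the same switch is an assumption. A minimal sketch:

```python
def with_thinking(user_msg: str, thinking: bool) -> str:
    """Toggle Qwen3's chain-of-thought via its documented soft switch.

    Qwen3 honors `/think` and `/no_think` appended to the user turn.
    Assumption: the serving stack passes the raw turn through to the
    chat template unchanged.
    """
    return f"{user_msg} {'/think' if thinking else '/no_think'}"
```

Usage: `with_thinking("List my cameras", thinking=False)` yields a prompt that suppresses the reasoning block.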

You are very correct. I've only had the 64GB MacBook Pro on hand for 2 days, so the test just covers the LLM part -- the logic handling.

For VLM, LFM is the best; even the 450M works. I'll update soon :) Thanks again for your deep understanding of the LLM/VLM domain and your suggestions.

mamcx · today at 5:59 PM

Where to learn what is good for what? I've started experimenting with LM Studio, have a Mac mini M4/16GB and an M4 Pro/24GB, and wanna have something local that works "like" Claude just for coding (mostly Rust and SQL).

aegis_camera · today at 5:39 PM

You are right. I have a Mac mini M2 16GB and it does hold all the cameras I have. Small models like Qwen 9B + LFM 450M handle their security job nicely on a < $400 budget.

Will extend the test to more models. Thanks again for your insight.