With respect, this isn't "new data"; it is an anecdote. And it kind of represents exactly the problem I was talking about above:
- Qwen is near Sonnet 4.5!
- How do I run that?
- [Starts talking about something inferior that isn't near Sonnet 4.5].
It is this strange bait-and-switch discussion that happens over and over. Not least because Sonnet has a 200K context window, and most of these anecdotes aren't for anywhere near that context size.
You're not wrong, but... imho it's closer to Sonnet 4.0 [1] on my personal benchmark [2]. And I HAVE run it at just over 200K tokens of context; it works, it's just a bit slow at that size. It's not great, but... usable to me? I used Sonnet 4.0 over the API for half a year or so before, after all.
The only way to know whether your own criteria are met yet is to test it yourself, with your own benchmark or what have you.
And it does show a promising direction going forward: usable (to some) local models becoming efficient enough to run on consumer hardware.
[1] released mid-2025
[2] take with salt - only tests personal usability
+ Note that some benchmarks do show Qwen3.5-35B-A3B matching Sonnet 4.5 (released later last year), but I treat those with the same skepticism you do, clearly ;)