Hacker News

throwaw12 yesterday at 5:09 PM

compared to your test with GLM 5.1, this indeed looks off

https://xcancel.com/simonw/status/2041646779553476801


Replies

simonw yesterday at 5:21 PM

Yeah GLM 5.1 did an outstanding job on the possum - better than Opus 4.7 or GPT-5.4 and I think better than Gemini 3.1 Pro too.

But GLM 5.1 is a 1.51TB model, while the Qwen 3.6 I used here was 17GB - that's roughly 1/89 the size.

refulgentis yesterday at 5:18 PM

Hoping this doesn't turn into a pelican-SVG back-and-forth: yesterday's GPT Image 2 thread ended up being three screenfuls of "I tried the prompt too" replies, and nothing on the model until you scroll past it. I appreciate the testing, and I know this sounds like fun police, but there's a pattern where well-known commenter + one-off vibe test + 1:1 sub-threads eats the whole discussion. It being fun makes it hard to push back on without looking picky.
