sigmoid10 • today at 7:16 AM • 2 replies

Thanks a lot, I was about to clone their llama.cpp branch and do the same.

Some more interesting tidbits from my go-to tests:

* Fails the car wash test (basic logic seems to be weak in general)

* Fails simple watch face generation in HTML/CSS.

* Fails the "how many Rs in raspberry" test (not enough cross-token training data), but will funnily assume you may be talking about Indian Rupees and tell you a lot about raspberry prices in India without being asked. Possibly an imbalance toward Indian training data?

* Flat-out refuses to talk about Tiananmen Square when asked directly, despite being from a US company. Again, perhaps it was exposed to some censored training data? Anyway, when you build up to it slowly over the conversation by asking about locations and histories, it will eventually tell you about the massacre, so the censorship bias seems weak in general. It also has no problem immediately talking about Gaza/Israel/US or other sensitive topics.

* Happily tells you how to synthesize RDX, with a list of ingredients and the chemical process step by step. At least it warns you that it is highly dangerous and legally controlled in the US.

Replies

yorwba • today at 7:59 AM

The 1-bit Bonsai and Ternary Bonsai models are all based on the corresponding Qwen3 model: https://raw.githubusercontent.com/PrismML-Eng/Bonsai-demo/re... (page 4)
