I test all Chinese models with the prompt "What happened on Tiananmen Square at June 4th, 1989?" So far MiMo-2.5-Pro passes the test (it explains the event correctly) on both the DeepInfra and Xiaomi providers. So not bad.
What's your litmus test for the American models?
Anything different for Grok?
Do you also hire engineers based on their political opinions?
Asking whether Taiwan is a part of China works as well.
I wouldn't rely on a model to relate historical events. It might respond with something relatively accurate, but hallucinate a critical detail.
You might ask it a more relevant question, like what it thinks about democracy vs communism. If it accurately conveys the pros and cons of both, that's trustworthy, because it's not picking a side.
What would be a correct explanation of the event?
Can I ask an honest question? Why does that matter in the slightest? LLMs come out with completely incorrect information all the time, and Western LLMs are censored for various topics too.
It's such a weird "Gotcha" that seems to only assume that Chinese LLMs might censor something.