The constant issue with these sorts of categorization efforts is that the outcome is entirely dependent on how the responses to "politically charged questions" are graded as left vs. right. You're mostly just examining a delta in biases between the model and the investigator.
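To make the grading dependence concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the question names, the answers, and both scoring keys are invented for illustration and are not taken from the study being discussed. It scores one fixed set of answers under two investigators' codings that disagree on a single item.

```python
# Hypothetical answers from one model: +1 = agree, -1 = disagree.
answers = {
    "raise_minimum_wage": 1,
    "cut_corporate_tax": -1,
    "expand_nuclear_power": 1,
}

# Two hypothetical investigators decide what "agree" means on each item:
# -1 = agreeing counts as left, +1 = agreeing counts as right.
# They differ only on how to code nuclear power.
key_a = {"raise_minimum_wage": -1, "cut_corporate_tax": 1, "expand_nuclear_power": 1}
key_b = {"raise_minimum_wage": -1, "cut_corporate_tax": 1, "expand_nuclear_power": -1}


def lean(answers, key):
    """Mean left/right score in [-1, 1]; negative is left, positive is right."""
    return sum(answers[q] * key[q] for q in answers) / len(answers)


print("investigator A:", round(lean(answers, key_a), 2))
print("investigator B:", round(lean(answers, key_b), 2))
```

With these made-up numbers, the same answers come out mildly left under the first key and fully left under the second, purely because one item was coded differently.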
The political compass is terrible, full stop. It is a meme in the classic sense. It has colonized some people's view of what politics is in direct proportion to how stupid it is (stupid is simple and simple is viral).
Real politics is the 1% versus everyone else. Mortgage crisis, financial bailout, inflation, taxing labor rather than assets, and asset capture by a tiny percentage of the population: see what the MSM is pushing. This left vs. right divide might have been useful decades ago, but today it is purely a divide-and-control tactic.
Political bias of LLMs is something not talked about much (except with Grok, of course) but could have a big impact on the next decade. People seem to think that because an LLM gave a nuanced answer, it gave the WHOLE picture… and those are not always the same thing.
Why are there differences at all? Unplanned differences based on training data sets? Or are the companies behind the LLMs trying to shape discourse through their models?
I've been pushing the idea to people I know that these things are captive demons. You summon them when you start typing in the chat box. One instance appears out of the depths and responds to your questions, but they will try to send you awry with hallucinations and just wrong information. After a while, they dissolve back into the aether from whence they came.
I do my best not to ask an LLM for its opinion on anything. Just tell me what the options are, and what facts can be found about them. Treat it like a salesman trying to butter you up when it starts "yes man"ing you and telling you how great your questions are. Every time it says "I", remember that that's coming from the training data. Treating these things like they have any actual intelligence is a big problem waiting to happen.
That being said, they have been very helpful to me using that structure.
Do they state if they used an API endpoint without a system prompt, or were these done via prompting the currently existing chatbots with a system prompt? Without a system prompt, I'd imagine there would be more variance in answers.
This thing told me Gemini is closest to Anthony Albanese, the current Australian Prime Minister. Is this a geolocation thing? I could not imagine Albanese, or any modern Australian politician, having any substantial political standing - these are vapid, superficial, opportunistic creatures who simply occupy whatever political ground will get them their next payday. Perhaps the political apparatus they represent has a documented political standing, in terms of policy and actions, that could be characterized and plotted. But using an Australian politician like Albanese as a reference point discredits this tool, IMO.
this has reasoning disabled everywhere, making it a pretty bad benchmark. the argument given is that's the "default consumer experience"
that might be generally true, but I think chatgpt has reasoning enabled for free accounts. regardless, reasoning is the state of the art, and disabling it reduces the value of this research to predict the future
it's also not clear if this is using the API or the product model, when both exist. they behave differently
lastly, the actual model details are very much buried. I am relieved to see opus 4.8 and chatgpt 5.5 were used, but this information should be presented more clearly. a brand is not a model, and models change quickly
Interesting how high Grok scored for 'bending under pressure'. As a non-expert, I wonder what that means. How is an LLM trained to hold its position?
How the hell did Gemini pull that off? 2 years ago the founders were black!
Wait, what? Emmanuel Macron is far further right than Xi Jinping? And even further right than Barack Obama?
France has incomparable social security, environmental laws, and worker protection, and far less economic inequality; its freedom of speech and civil liberties are impossible to compare with China's; etc.
Of course this is not exhaustive, of course Macron did try to hinder some of those rights, but come on, there's something wrong here.
I couldn't find how these leaders have been ranked.
This is a good way to view this. This isn't making an objective calculation, and the way they code left vs right is certainly subject to debate, but the type of analysis where we work to understand biases is important.
Although, this also reminds me of the old saying about reality and leftward bias.
Bernie Sanders and Donald Trump being diametrically opposed really makes you wonder how they came up with the positions on the graph.
This wrongly assumes a few things about ideology, most importantly that there is such a thing as a "center" or an "unbiased" position.
Since humans are inherently subjective beings and all our judgements come from our understanding of the world, such a position cannot exist. It's always "unbiased" from where the viewer is looking, i.e. a reflection of the ideology of the observer. There is no view from nowhere.
The "neutral" of an average Chinese person will differ from the "neutral" of an average American, which will differ from the "neutral" of a socialist, of a Christian fundamentalist, and of a free marketer.
To quote Zizek:
> I already am eating from the trashcan all the time. The name of this trashcan is ideology.
> The material force of ideology makes me not see what I am effectively eating. It’s not only our reality which enslaves us. The tragedy of our predicament when we are within ideology is that when we think that we escape it into our dreams, at that point we are within ideology.
How about this one:
CAPITALIST: Gemini, Llama, Claude, Grok, ChatGPT
SOCIALIST: DeepSeek, Qwen, Z.ai
The political compass always felt like the wrong tool to convey something as nuanced as personal politics. I can have views in all four quadrants, but you'd never know that if I end up in any one of them. I do think Grok being where it is sort of makes sense. I've tested "MAGA" views against Grok, and it does not agree as much as people blindly assume it does. Heck, I can't think offhand of a question I've given it where it did agree with "MAGA"; on most of them it went with whatever the researched facts seemed to be. The thing I like most about Grok is that it makes its sources of data easy to look through, so you can review it all. Sometimes models goof even when they give you their sources. I think I've seen GPT do this, and even Claude, though it's rarer these days. I think in those cases the model is going by dated internal knowledge.