Good point. I was assuming the GP was running it locally for some reason. Hard to argue when it's the official providers being compared.
I ran the 1.58-bit Unsloth quant locally when it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT-4 got right. I have never actually used a hosted version of the full DS.