My pet theory is that OpenAI screwed up the image normalization calculation and was stuck with the mistake since that's something that can't be worked around.
At the least, the tint isn't present in these new images.
There's still something off in the color grading, and I suspect they worked around it
(although I get what you mean: not easily, since the model's already trained).
I'm guessing that once they get a clean slate we'll have Image 2 instead of 1.5. On LMArena it was immediately apparent from the visuals that it was an OpenAI model.
wdym it can't be worked around when there are literal yellow tint corrector models/tools haha
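fwiw the simplest version of a tint fix is plain gray-world white balance. Rough sketch below, assuming an H x W x 3 float RGB array in [0, 1]. The function name `gray_world_balance` is made up for illustration, and this is not how any particular corrector tool (or OpenAI) does it, just the textbook idea:

```python
import numpy as np

def gray_world_balance(img):
    """Gray-world white balance for an H x W x 3 float RGB image in [0, 1].

    Hypothetical helper for illustration only; it assumes the scene should
    average out to neutral gray, which is not true for every image.
    """
    img = np.asarray(img, dtype=np.float64)
    # Average value of each channel over all pixels.
    channel_means = img.reshape(-1, 3).mean(axis=0)
    # Target level: the mean of the three channel averages.
    gray = channel_means.mean()
    # Per-channel gain that pulls each channel average to the target;
    # the floor avoids dividing by zero on an all-black channel.
    gains = gray / np.maximum(channel_means, 1e-8)
    # Apply the gains and keep values in the valid range.
    return np.clip(img * gains, 0.0, 1.0)
```

A yellow cast shows up as a blue channel average that's low relative to red and green, so the blue gain comes out above 1 and the cast gets pulled back toward neutral. It'll also wreck images that are supposed to be warm, which is probably why the dedicated tools are smarter than this.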