these things can just change with infrastructure changes rather than be some mysterious A/B tes...

hhh • yesterday at 9:34 PM • 1 reply • view on HN

these things can just change with infrastructure changes rather than be some mysterious A/B testing.

I don't disagree, we've seen performance shift with capacity changes in the past.

With that said, I doubt OpenAI would choose to publish a singular coding benchmark for a new model that exactly matches their previous model (88.8%).

alt Hacker News