FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks I noticed this on.

ZeroCool2u • yesterday at 6:38 PM

Replies

Are you maybe comparing the pro model to the non-pro model with thinking? Granted, it's a bit confusing, but the pro model is 10 times more expensive and probably much larger as well.
