Hacker News

highfrequency, yesterday at 6:36 PM

Can you be more specific about which math results you are talking about? It looks like a significant improvement on FrontierMath, especially for the Pro model (the one with the most inference-time compute).


Replies

ZeroCool2u, yesterday at 6:38 PM

FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks where I noticed this.
