highfrequency • yesterday at 6:36 PM • 1 reply

Can you be more specific about which math results you are talking about? It looks like a significant improvement on FrontierMath, especially for the Pro model (the one using the most inference-time compute).

Replies

ZeroCool2u • yesterday at 6:38 PM

FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks where I noticed this.
