6thbit • today at 2:47 AM • 1 reply

> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).

Interesting. What's that “scaffold”? A sort of unit test framework for proofs?

Replies

inkysigma • today at 3:21 AM

I think in this context, a scaffold is generally the harness that surrounds the actual model: for example, the tools it can call, the way tasks are laid out for it, or auto-critiquing methods.

I think there's quite a bit of variance in model performance depending on the scaffold, so comparisons are always a bit murky.
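To make the idea concrete, here is a minimal, hypothetical Python sketch of such a harness. It is not the FrontierMath scaffold, whose details aren't described in this thread. It wraps a model function with a fixed task layout and a check-and-retry loop, a simple form of automated critique; tool use is left out. `run_scaffold`, `toy_model`, and the prompt wording are invented for illustration, and the "model" is a stub rather than a real API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


def run_scaffold(
    model: Model,
    task: str,
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Illustrative harness: lay out the task, call the model, verify the answer,
    and feed the failure back to the model before retrying."""
    # Task layout: the harness, not the model, decides how the problem is presented.
    prompt = f"Task:\n{task}\n\nGive your final answer on the last line."
    for attempt in range(1, max_attempts + 1):
        response = model(prompt).strip()
        # Take the last line of the response as the answer (empty if no output).
        answer = response.splitlines()[-1] if response else ""
        if check(answer):
            return answer, attempt
        # Critique step: report the failed check and ask the model to reconsider.
        prompt += (
            f"\n\nYour previous answer was:\n{answer}\n"
            "It failed verification. Reconsider and try again."
        )
    return None, max_attempts


def toy_model(prompt: str) -> str:
    # Stub standing in for a real model: wrong at first, right once it sees feedback.
    return "work...\n42" if "failed verification" in prompt else "work...\n41"


if __name__ == "__main__":
    print(run_scaffold(toy_model, "What is 6 * 7?", check=lambda a: a == "42"))
    print(run_scaffold(toy_model, "What is 6 * 7?", check=lambda a: a == "42", max_attempts=1))
```

With three attempts the toy model gets there on the second try, returning `('42', 2)`; with `max_attempts=1` the same model returns `(None, 1)`. That is a small illustration of the variance point: the harness, not just the model, shapes the measured result.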
