
jsnell • yesterday at 9:07 PM • 1 reply

But they don't show "strictly better" performance at cost per task!

The graphs show parts of the cost/performance Pareto frontier occupied by Opus 4.8 and other parts occupied by Sonnet 5.0. If Opus 4.8 were strictly better at cost per task like you say, by definition the entire frontier would be occupied by Opus.

So neither is Pareto-dominant over the other. In contrast, Sonnet 5.0 is Pareto-dominant over Sonnet 4.6 on those graphs.
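
To make the definition concrete, here is a minimal sketch of computing a Pareto frontier over discrete (cost, score) points. The model names `A` and `B` and all numbers are made up for illustration and are not read from the graphs; lower cost and higher score are assumed to be better.

```python
from typing import NamedTuple


class Point(NamedTuple):
    model: str
    cost: float   # dollars per task, lower is better
    score: float  # benchmark score, higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.score >= b.score
    strictly_better = a.cost < b.cost or a.score > b.score
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical measurements, NOT taken from the graphs under discussion.
points = [
    Point("A", 1.0, 50.0),
    Point("A", 3.0, 70.0),
    Point("B", 2.0, 55.0),
    Point("B", 4.0, 80.0),
]

frontier = pareto_frontier(points)
models_on_frontier = {p.model for p in frontier}
print(models_on_frontier)
```

With these numbers both models keep points on the frontier, so neither one dominates the other in the discrete-point sense.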

Replies

energy123 • yesterday at 9:24 PM

> by definition the entire frontier would be occupied by Opus.

But the entire frontier is occupied by Opus under any reasonable interpolation scheme over the overlapping x values for which both are defined. That holds for the piecewise-linear interpolation they've used, and most reasonable spline or polynomial fits would lead to the same result.

Under that interpolation scheme, for x > ($ cost of Opus low effort), Opus is Pareto-dominant over Sonnet 5.0. You can see this by picking any point on Opus's interpolated curve and noting that switching to Sonnet leaves you strictly worse off at the same x value or at the same y value: if you pay the same $x you get a worse y, and if you want the same y you pay more $x.
