Multiplayer • today at 1:52 PM • 0 replies • view on HN

Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.

The task is basically predicting pricing and costs.

Apple’s model came out on top, with the best accuracy in 6 of the 10 backtest cases. That surprised me.
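For anyone wondering how a "best accuracy in N out of 10 cases" tally can be computed, here's a minimal sketch. It is not the actual tool. The model names, the numbers, and the choice of absolute percentage error as the accuracy metric are all placeholder assumptions for illustration.

```python
from collections import Counter


def abs_pct_error(predicted, actual):
    """Absolute percentage error for one prediction (assumed metric)."""
    return abs(predicted - actual) / abs(actual)


def tally_wins(errors_by_case):
    """Count, per model, how many cases it had the lowest error on.

    errors_by_case: list of dicts mapping model name -> error for one case.
    """
    wins = Counter()
    for case in errors_by_case:
        best_model = min(case, key=case.get)  # lowest error wins the case
        wins[best_model] += 1
    return wins


# Placeholder backtest data: known historical values and what each
# (hypothetical) model predicted for them.
actuals = [100.0, 250.0, 80.0]
predictions = {
    "model_a": [98.0, 240.0, 90.0],
    "model_b": [110.0, 251.0, 79.0],
}

errors_by_case = [
    {name: abs_pct_error(preds[i], actual) for name, preds in predictions.items()}
    for i, actual in enumerate(actuals)
]

wins = tally_wins(errors_by_case)
print(wins.most_common())
```

Counting per-case wins is only one way to rank models. Averaging the error across cases can give a different ordering when one model wins narrowly but loses badly.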

It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.

So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
