capten • yesterday at 7:12 PM • 2 replies

It's so weird to me that the benchmark scores remain so low, yet the models are marketed as revolutionary. And if you argue that weak coding ability isn't a problem, tell that to the token price hike and the 'general use' model setup.

Why not sell it as a math agent? Why do I have to set up 4 agents to check each other's work?

Replies

npn • yesterday at 9:03 PM

From what I understand, it's because, unlike the other models, MAI models haven't yet been fine-tuned on the synthetic datasets specifically designed to boost benchmark scores.

redrove • yesterday at 7:30 PM

It’s about bang for buck. That high a score for 5B params is pretty good; it would have been nigh unbelievable a short while ago.

It is my belief that smaller models will get better and better, and even cloud SOTA models will shrink.

Yet another reason the current buildout will end up feeling like the railroads.
