Hacker News

capten yesterday at 7:12 PM

It's so weird to me that the benchmark scores remain so low while the models are marketed as revolutionary. And if you argue that weak coding ability isn't a problem, that's hard to square with the token price hike and the 'general use' positioning of the model.

Why not sell it as a math agent? Why do I have to set up 4 agents to check each other's work?


Replies

npn yesterday at 9:03 PM

From what I understand, it's because, unlike the other models, MAI models haven't yet been fine-tuned on the synthetic datasets specifically designed to boost benchmark scores.

redrove yesterday at 7:30 PM

It’s about bang for buck. That high a score for 5B params is pretty good; it would have been nigh unbelievable a short while ago.

It is my belief that smaller models will get better and better, and even cloud SOTA models will shrink.

Yet another reason the current buildout will end up feeling like the railroad boom.
