causal • yesterday at 6:37 PM • 2 replies

That ARC-AGI score is a little suspicious. That's a really tough benchmark for AI. Curious if there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.

Replies

woeirua • yesterday at 10:37 PM

They're clearly building better training datasets and doing extensive RL on these benchmarks over time. The out-of-distribution performance is still awful.

taurath • yesterday at 6:50 PM

I don’t think their words mean much of anything; only the behavior of the models does.

Still waiting on Full Self-Driving myself.
