Hacker News

sigmar today at 4:15 PM

blog post is up: https://blog.google/innovation-and-ai/models-and-research/ge...

edit: biggest benchmark changes from 3 pro:

arc-agi-2 score went from 31.1% -> 77.1%

apex-agents score went from 18.4% -> 33.5%


Replies

ripbozo today at 4:29 PM

Does the arc-agi-2 score more than doubling in a .1 release indicate benchmark-maxing? Though I don't know what arc-agi-2 actually tests.

sho_hn today at 4:20 PM

The touted SVG improvements make me excited for animated pelicans.
