skysniper, today at 5:13 PM

Thanks for the info. Before running the benchmark, I had only tried it on arena.ai-type tasks, and it was not impressive there. I didn't expect it to be that good at agentic tasks.