Hacker News

andy12_ yesterday at 8:38 PM

I disagree. Even frontier models still score far below the human baseline on VendingBench. As long as models can't optimally manage something as simple as a vending machine, they have no hope of managing a McDonald's.