Hacker News

behnamoh, yesterday at 5:05 PM (0 replies)

> The UI oneshot demos are a big improvement over 4.6.

This is a terrible "test" of model quality. All these models fail when your UI is out of distribution; Codex gets close but still fails.