atleastoptimal • today at 7:22 AM • 0 replies

They should do 95% and 99% versions of the graphs; otherwise it's hard to ascertain whether the failure cases will remain in the elusive category of "stuff humans can do easily but LLMs trip up on despite scaling."
