Hacker News

slewis, last Wednesday at 12:04 AM

OpenAI created a benchmark for this: https://openai.com/index/paperbench/


Replies

suddenlybananas, last Wednesday at 6:38 AM

Still has data contamination though.
