Hacker News

cbg0, yesterday at 5:04 PM

SWE-bench Verified was created in collaboration with OpenAI. It's also an open dataset, so it's prone to contamination (the test problems and their solutions can leak into model training data), meaning scores on it can be gamed.