Hacker News

nine_k, today at 12:02 AM

LLM-written code passed SWE-bench even back then. This may just show that SWE-bench is an inadequate test and should not be used for serious evaluation.