Hacker News

djfergus, today at 1:16 AM

We need a benchmark that tests a model's ability to do LLM research.