Hacker News
djfergus
•
today at 1:16 AM
We need a benchmark that tests a model's ability to do LLM research.