Hacker News

urbandw311er, today at 7:06 PM

When it comes to evals for this kind of thing, is there a standard set of test data one can benchmark against? I.e. a collection of documents plus questions, where each question should result in particular documents or chunks being returned as the most relevant match.
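
To make the question concrete, here is a rough sketch of the shape of data and scoring I have in mind. Everything in it is an assumption for illustration: the toy `CORPUS`, the labelled `EVAL_SET`, the stand-in `keyword_retriever`, and the choice of recall@k as the metric are all made up, not taken from any existing benchmark.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical toy corpus: chunk id -> chunk text.
CORPUS: Dict[str, str] = {
    "c1": "The cat sat on the mat.",
    "c2": "Paris is the capital of France.",
    "c3": "Python is a programming language.",
}

# Hypothetical labelled questions: each lists the chunk ids judged relevant.
EVAL_SET: List[dict] = [
    {"question": "What is the capital of France?", "relevant": {"c2"}},
    {"question": "Which language is Python?", "relevant": {"c3"}},
]


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retriever(question: str, k: int) -> List[str]:
    """Stand-in retriever: rank chunks by word overlap with the question."""
    q = tokens(question)
    # Sort by descending overlap, breaking ties by chunk id for determinism.
    ranked = sorted(CORPUS, key=lambda cid: (-len(q & tokens(CORPUS[cid])), cid))
    return ranked[:k]


def recall_at_k(
    retriever: Callable[[str, int], List[str]],
    eval_set: List[dict],
    k: int,
) -> float:
    """Average, over questions, of the fraction of relevant chunks in the top k."""
    scores = []
    for example in eval_set:
        retrieved = set(retriever(example["question"], k))
        relevant = example["relevant"]
        scores.append(len(retrieved & relevant) / len(relevant))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(recall_at_k(keyword_retriever, EVAL_SET, k=1))
```

Swapping `keyword_retriever` for a real embedding or hybrid search function with the same `(question, k)` signature would let the same labelled set compare different retrieval setups.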