softwaredoug • 11/20/2024

It’s the mean. At least in Lucene. Using median would be an interesting experiment.

Do you know of a search dataset with very large differences in document length? MS MARCO, for example, is pretty consistent in length.

Replies

MPSimmons • 11/22/2024

Was just thinking about some of the docs we have at work, and how most are relatively short (probably < 10 pages) and some are like... 200+ page government things.
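
To make the mean-vs-median question concrete, here is a rough sketch of what a skewed collection like that would do to length normalization. It uses the textbook BM25 term-frequency formula with the common defaults k1 = 1.2 and b = 0.75. It is not Lucene's actual code, the collection (98 five-page docs plus two 200-page docs) is made up, and lengths are in pages only for readability (a real index would count tokens). `bm25_tf` and `pivot_len` are names invented for this illustration.

```python
from statistics import mean, median

def bm25_tf(tf, doc_len, pivot_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component.

    pivot_len is the "typical" document length used for normalization:
    the mean in standard BM25, or the median in the hypothetical variant.
    """
    norm = 1 - b + b * (doc_len / pivot_len)
    return tf * (k1 + 1) / (tf + k1 * norm)

# Hypothetical collection: mostly short docs, a couple of very long ones.
doc_lengths = [5] * 98 + [200] * 2

mean_len = mean(doc_lengths)      # pulled upward by the two long docs
median_len = median(doc_lengths)  # stays at the typical short length

# Score the same term frequency in a typical short doc under each pivot.
short_with_mean = bm25_tf(tf=3, doc_len=5, pivot_len=mean_len)
short_with_median = bm25_tf(tf=3, doc_len=5, pivot_len=median_len)

print(f"mean length:   {mean_len}")
print(f"median length: {median_len}")
print(f"short doc, mean pivot:   {short_with_mean:.3f}")
print(f"short doc, median pivot: {short_with_median:.3f}")
```

With the mean as the pivot, the typical 5-page doc counts as shorter than average and gets a boost. With the median it is treated as exactly average. Whether that difference changes rankings in practice is the experiment that would need a dataset with this kind of skew.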
