Hacker News

deepsquirrelnet yesterday at 12:50 PM

This is pretty cool. I have a similar model that’s 8 days into training on MS MARCO.

So far I only have the “cold start” data posted, but I’m planning on posting a full distillation dataset.

https://huggingface.co/datasets/dleemiller/lm25


Replies

jacobgorm yesterday at 6:22 PM

What kind of hardware setup would be needed to replicate the paper’s results?