That would be an interesting experiment. It might be even more useful to train a model with a cutoff as close as possible to when copyrights expire, so the corpus stays as modern as the law allows.
We'd end up with a model that knows quite a bit, in reasonably modern English, and we'd legally hold a dataset covering everything it knows. From there, all kinds of experiments and copyright-safe training strategies become possible.
Project Gutenberg up to the 1920s seems like the safest bet for that; something like the sketch below could pull that slice out of the catalog.
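A minimal sketch of how that corpus could be assembled, assuming the public catalog feed at https://www.gutenberg.org/cache/epub/feeds/pg_catalog.csv. One caveat: the catalog lists no original publication dates, so this uses author death year (parsed from the "Authors" field, e.g. "Austen, Jane, 1775-1817") as a crude era proxy, and the cutoff year is a hypothetical placeholder you'd tune for your jurisdiction:

```python
"""Sketch: filter the Project Gutenberg catalog to a pre-1920s-era corpus.

Assumptions: the pg_catalog.csv feed with columns Text#, Type, Issued,
Title, Language, Authors, etc.; author death year as a rough stand-in
for publication era, since the catalog has no publication-date column.
"""
import csv
import io
import re
import urllib.request

CATALOG_URL = "https://www.gutenberg.org/cache/epub/feeds/pg_catalog.csv"
CUTOFF_DEATH_YEAR = 1929  # hypothetical cutoff; adjust per jurisdiction

def era_safe_ids(cutoff=CUTOFF_DEATH_YEAR):
    """Yield (id, title) for English texts whose authors all died before cutoff."""
    raw = urllib.request.urlopen(CATALOG_URL).read().decode("utf-8")
    for row in csv.DictReader(io.StringIO(raw)):
        if row["Type"] != "Text" or row["Language"] != "en":
            continue
        # Author life spans appear as "YYYY-YYYY"; the second year is the death year.
        spans = re.findall(r"\b(\d{4})-(\d{4})\b", row["Authors"])
        if spans and all(int(death) < cutoff for _, death in spans):
            yield int(row["Text#"]), row["Title"]

if __name__ == "__main__":
    for text_id, title in era_safe_ids():
        print(text_id, title)
```

Death year is only a proxy: in the US, public-domain status turns on publication date, while life-plus-70 jurisdictions key off the author's death, so a real pipeline would want per-book publication metadata before treating anything as safe.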