> LWN.net is a reader-supported news site
I mean...
Again, the site is so old that anything worth while is already in cc or any number of crawls. I am not saying they weren't scraped. I'm saying they likely weren't scraped by the bad AI people. And certainly not by AI companies trying to limit others from accessing that data (as the person who I replied to stated).
Why is it each of your comments seems to include a dig attacking LWN?
I’m going to presume good faith rather than trolling. Some questions for you:
1. Coding assistants have emerged as as one of the primary commercial opportunities for AI models. As GP pointed out, LWN is the primary discussion for kernel development. If you were gathering training data for a model, and coding assistance is one of your goals, and you know of a primary sources of open source development expertise, would you:
2. If you’d previously slurped it up, and are now collating data for a new training run, and you know it’s an active mailing list that will have new content since you last crawled it, would you: