Hacker News

NitpickLawyer yesterday at 9:23 PM

> LWN.net is a reader-supported news site

I mean...

Again, the site is so old that anything worthwhile is already in CC or any number of crawls. I am not saying they weren't scraped. I'm saying they likely weren't scraped by the bad AI people. And certainly not by AI companies trying to limit others from accessing that data (as the person I replied to stated).


Replies

spinningslate yesterday at 10:13 PM

I’m going to presume good faith rather than trolling. Some questions for you:

1. Coding assistants have emerged as one of the primary commercial opportunities for AI models. As GP pointed out, LWN is the primary discussion venue for kernel development. If you were gathering training data for a model, and coding assistance is one of your goals, and you know of a primary source of open source development expertise, would you:

  (a) ignore it because it’s in a quaint old format, or

  (b) slurp up as much as you can?
2. If you’d previously slurped it up, and are now collating data for a new training run, and you know it’s an active mailing list that will have new content since you last crawled it, would you:

  (a) carefully and respectfully leave it be, because you still get benefit from the previous content even though there’s now more and it’s up to date, or

  (b) hoover up every last drop because anything you can do to get an edge over your competitors means you get your brief moment of glory in the benchmarks when you release?

MBCook yesterday at 9:30 PM

Why is it each of your comments seems to include a dig attacking LWN?