Hacker News

threetonesun yesterday at 4:08 PM (3 replies)

If we kill all the platforms where content for training LLMs comes from, what do LLMs train on?


Replies

InsideOutSanta yesterday at 4:45 PM

This. I'm really bothered by the almost cruel glee with which a lot of people respond to SO's downfall. Yeah, the moderation was needlessly aggressive. But it was successful at creating a huge repository of text-based knowledge which benefited LLMs greatly. If SO is gone, where will this come from for future programming languages, libraries, and tools?

jrmg yesterday at 4:19 PM

This always feels to me like an elephant in the room.

I’d love to read a knowledgeable roundup of current thought on this. I have a hard time understanding how, with the web becoming a morass of SEO and AI slop - with really no effort being put into keeping it accurate - we’ll be able to train LLMs in the future to the level we do today.

rvnx yesterday at 4:14 PM

Newspapers, scientific papers, and soon, real-world interactions.

News is the main feed of new data, and it can be an infinite, incremental source of new information.
