hintymad • yesterday at 6:41 PM • 9 replies

Wouldn't this be worrisome? People used StackOverflow and generated new knowledge along the way. Without such a medium for discussion, how can we feed models with up-to-date, quality knowledge?

Replies

crazygringo • yesterday at 7:02 PM

Plenty of documentation, and plenty of code that the AI can read itself.

E.g. if a library has a bug that has a common workaround, it can learn that from open source code using the library that uses the workaround.

vanuatu • yesterday at 6:49 PM

I don't think it's much of an issue:

- RL environments + synthetic data + human-annotated data

- Usage data from Codex/Claude Code/Cursor

Most of a model's coding ability comes from post-training, not pretraining.

Jyaif • yesterday at 6:52 PM

We unironically need a StackOverflow for LLMs.

LLMs would post solutions to the issues that they've discovered after doing a lot of research.

Unfortunately, the LLMs are concentrated among a few providers (OpenAI, Anthropic, Google), so there's a chance they each end up building their own private (and closed) StackOverflows. By leveraging their private StackOverflows, their LLMs will be able to short-circuit complex reasoning, saving tokens, time, and money.

stackghost • yesterday at 10:50 PM

I'm sure the AI companies will continue to pirate textbooks and papers, like always.

jmyeet • yesterday at 9:22 PM

Yeah, this is something I've been thinking about too. LLMs have basically profited from "stealing" (arguably) user-generated content from a time when there were no LLMs. In the LLM era there won't be a new Stack Overflow to train LLMs on going forward.

We're getting closer to Dead Internet Theory too, where a lot of accounts, particularly on Twitter, are just LLMs. I imagine it's a huge problem on Reddit too: people farming karma, running influence campaigns, or simply grifting for ad revenue.

So we're going to get to a point where the corpus we train LLMs on will itself just be filled with LLM slop. Self-reinforcing slop. Is that the future?

add-sub-mul-div • yesterday at 6:45 PM

Careful, you can't point out that the AI emperor has no clothes or you'll get called a Luddite.

piker • yesterday at 6:44 PM

Yes. Very.

nsxwolf • yesterday at 6:48 PM

How do you convince people to not want an instant answer? Even if SO didn’t result in so many “What have you tried?” responses and immediate closures, most people would still prefer instant feedback.

akkad33 • yesterday at 6:44 PM

Pointing them to docs? Which is what Stack Overflow answers did anyway?
