> the overwhelming majority of input it has in fact seen somewhere in the corpus it was trained on.
But it thinks just great on stuff it wasn't trained on.
I give it code I wrote that isn't in its training data, built on new concepts I came up with for an academic paper I'm writing, and ask it to extend the code in accordance with those concepts, and it does a great job.
This isn't regurgitation. Even if a lot of LLM usage is, the whole point is that it also does fantastically with stuff that is brand new: it genuinely creates new, valuable output it has never seen before, assembling it in ways that require thinking.
I think you may be giving academic papers too much credit, or rather, even a novel paper is often 99% existing material with only about 1% that's genuinely new.
I think it would be hard to prove that it's truly so novel that nothing similar is present in the training data. In research, it's quite easy to miss related work even with extensive searching.