Hacker News

jakkos, today at 11:34 AM

If you fork an open source project and nuke the git history, that's considered to be a "dick move" because you are erasing the record of people's contributions.

LLMs are doing this on an industrial scale.


Replies

OJFord, today at 12:07 PM

I don't really understand why that isn't simply allowed or disallowed depending on whether the licence permits use without attribution.

armchairhacker, today at 4:36 PM

I’ve been thinking that information provenance would be very useful for LLMs. Not just for attribution (git authors): the LLM would also know, and be able to control, which outputs are derived from reliable sources (e.g. Wikipedia vs a Reddit post). It could also tell which outputs are derived from ideologically-aligned sources, which would make LLMs more personal and subjectively better, but also easier to bias and to use for generating deliberate misinformation.

“Information provenance” could be (and I think most likely would be, although I’m very unfamiliar with LLM internals) an estimate of which sources an output most plausibly derives from, so even output that exists today could eventually get proper attribution.

At least today if you know something’s origin, and it’s both obvious and publicly online, you have proof via the Internet Archive.