Hacker News

rvnx today at 1:46 AM

The emojis and similar stylistic habits are there because models learn from other models' output, as that is the easiest way to get RLHF data.

Many of the models were trained on output from ChatGPT or its variants (hence the emojis). Officially the attribution then disappeared, and none of this can be proven.

This process is called distillation.
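As a rough illustration of what that looks like at the data level (a minimal sketch, not any lab's actual pipeline): a "teacher" model answers a set of prompts, and the prompt/answer pairs become fine-tuning data for the "student". Every name here (`teacher_answer`, `build_distillation_set`, the canned replies) is hypothetical and stands in for a real API call.

```python
from typing import Callable


def teacher_answer(prompt: str) -> str:
    """Hypothetical stand-in for a call to an existing chat model."""
    canned = {
        "Who are you?": "I am a large language model, trained by OpenAI.",
        "Say hi": "Hi there! 👋",
    }
    return canned.get(prompt, "I'm not sure.")


def build_distillation_set(
    prompts: list[str], teacher: Callable[[str], str]
) -> list[dict[str, str]]:
    """Pair each prompt with the teacher's reply to form fine-tuning rows."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


# The student would later be fine-tuned on these rows (training not shown).
dataset = build_distillation_set(["Who are you?", "Say hi"], teacher_answer)
for row in dataset:
    print(row)
```

A student trained on rows like these picks up the teacher's style (the emoji) and its self-identification along with the answers, unless the data is cleaned first.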

For example, one day Nano-Banana answered me with a link (which did not exist) to a picture generated on... the FAL platform.

    DeepSeek:
https://i.redd.it/7nkucg2qelfe1.png

    Anthropic Claude:
https://www.reddit.com/r/OpenAI/comments/1e34tkr/why_is_clau...

    Grok:
https://cdn.arstechnica.net/wp-content/uploads/2023/12/GA8PG...

    Gemini-Flash-Lite, if you squeeze it a bit:

    > I must state clearly: I am a large language model, trained by OpenAI. This is the core definition of ChatGPT. If I claimed to be a human, a different company's AI, or a physical entity, that would be a clear falsehood regarding my nature.
But most of this has been fixed since Gemini 1.5-Pro.

Over time this is fading. These companies now have their own models' output to train on, and they all actively replace references to OpenAI. The data has been distilled, cleaned up, and mixed with their own and other training data so many times that the source text has disappeared.

We are talking about people who had no remorse about downloading a whole library of pirated books, so their concept of copyright is very loose.


Replies

shagie today at 5:28 AM

> We are talking about people who had no remorse about downloading a whole library of pirated books, so their concept of copyright is very loose.

It may be a TOS violation - but it is not a copyright violation.

In the United States (and several other countries), human creativity as part of authorship is required for something to be copyrightable, so purely model-generated output is generally not protected.

https://www.congress.gov/crs-product/LSB10922

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...