Hacker News

danpalmer 12/09/2024

...which they shouldn't have been able to get? I had thought that it was against the YouTube ToS? (my personal understanding, unrelated to my employer)


Replies

Havoc 12/10/2024

AI companies don’t give a shit about ToS. Hell, most of the big players actively ignored copyright entirely, in bulk. See the thousands upon thousands of pirated books in the Pile dataset.

And right after that news broke, they “fixed” the problem by no longer disclosing training data sources. That’s why papers for early models, e.g. Llama 1, listed this and now nobody does. It’s just an unspoken yet open secret now.

aprilthird2021 12/10/2024

The companies are fairly brazen, at least internally, about just scraping whatever, wherever, and not caring about the ToS of any website. All they really care about is blocking "bad" data that might make the models racist or sexual, etc.

paxys 12/10/2024

If AI companies respected ToS, there would be no AI.