Hacker News

A Developer Accidentally Found CSAM in AI Data. Google Banned Him for It

108 points by markatlarge | last Thursday at 4:02 PM | 82 comments

Comments

winchester6788 | last Thursday at 5:22 PM

Author of NudeNet here.

I just scraped data from Reddit and other sources so I could build an NSFW classifier, and chose to open-source the data and the model for the general good.

Note that I was an engineer with one year of experience, working on this project alone in my free time, so it was basically impossible for me to review for or clear out the few CSAM images among the 100,000+ images in the dataset.

Now, though, I wonder if I should never have open-sourced the data. It would have avoided a lot of these issues.
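
(For context on why review at that scale is impractical by hand: the standard automated mitigation is hash-matching against reference lists of known material, the approach behind PhotoDNA-style systems. Below is a minimal Python sketch assuming a hypothetical plain-text file of known-bad SHA-256 digests; the file name and function names are illustrative, and real reference lists from NCMEC or the IWF are only distributed to vetted organizations.)

    import hashlib
    from pathlib import Path

    # Hypothetical hash list: one lowercase SHA-256 hex digest per line.
    # Real known-CSAM hash sets are restricted to vetted organizations
    # and are not publicly downloadable.
    HASH_LIST = "known_bad_sha256.txt"

    def sha256_of(path: Path) -> str:
        # Hash in 1 MiB chunks so large files never sit fully in memory.
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def screen(image_dir: str) -> list[Path]:
        # Flag every file whose digest appears on the known-bad list.
        with open(HASH_LIST) as f:
            known_bad = {line.strip().lower() for line in f if line.strip()}
        return [p for p in Path(image_dir).rglob("*")
                if p.is_file() and sha256_of(p) in known_bad]

    if __name__ == "__main__":
        flagged = screen("scraped_images")
        print(f"{len(flagged)} file(s) matched the known-bad hash list")

Exact cryptographic hashes only catch byte-identical copies; perceptual hashing (e.g. PhotoDNA or Meta's PDQ) catches re-encoded variants, but access to those databases is gated for the same reason, which is part of why a lone hobbyist can't fully clear a scraped dataset.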

deltoidmaximus | last Thursday at 4:28 PM

Back when the first moat-creation gambit for AI failed (the claim that they were building SkyNet, so the government needed to block anyone else from working on SkyNet, since only OpenAI could be trusted to control it, not just any rando), they moved on to the safety angle with the same idea. I recall seeing an infographic showing that all the major players - Meta, OpenAI, Microsoft, etc. - had signed onto some kind of safety pledge. Basically, they didn't want anyone else training on the whole world's data, because only they could be trusted not to do nefarious things with it. The infographic had a statement about not training on CSAM, revenge porn, and the like, but the corpospeak it was worded in made it sound like they were promising not to do it anymore, not that they had never done it.

I've tried to find this graphic again several times over the years, but it's either been scrubbed from the internet or I just can't remember enough details to find it. Amusingly, it only just occurred to me that maybe I should ask ChatGPT to help me find it.

jsnell | last Thursday at 4:32 PM

As a small point of order, they did not get banned for "finding CSAM" like the outrage- and clickbait title claims. They got banned for uploading a data set containing child porn to Google Drive. They did not find it themselves, and them later reporting the data set to an appropriate organization is not why they got banned.

amarcheschi | last Thursday at 4:44 PM

Just a few days ago I was doing some low-paid (well, not so low) AI classification tasks - akin to Mechanical Turk ones - for a very big company, and the platform involuntarily showed me an AI image depicting a naked man and a naked kid - I guess they don't review the images before showing them. It was more Barbie-like than anything else, though. I didn't really enjoy the view, tbh. I contacted them but got no answer back.

giantg2 | last Thursday at 4:40 PM

This raises an interesting point. Do you need to train models using CSAM so that the model can self-enforce restrictions on CSAM? If so, I wonder what moral/ethical questions this brings up.

mflkgknr | last Thursday at 6:07 PM

Being banned for pointing out the emperor's new clothes is what autocrats typically do, because the worst thing they know is when anyone embarrasses them.

codedokode | last Thursday at 6:23 PM

Slightly unrelated, but I wonder: if a 17-year-old sends a dirty photo of herself to an 18-year-old guy she likes, who goes to jail? Just curious how the law works when there is no "abuse" element.

burnt-resistor | last Thursday at 6:50 PM

Technofeudalism strikes again. MAANG can ban people at any time for anything without appeal, and sometimes at the whim of any nation state. Reversal is the rare exception, not the rule, and only happens occasionally due to public pressure.

josefritzishere | last Thursday at 8:40 PM

Oh gross

bsowl | last Thursday at 4:24 PM

More like "A developer accidentally uploaded child porn to his Google Drive account and Google banned him for it".

UberFly | last Thursday at 5:37 PM

Posting articles that are paywalled is worthless.
