
1vuio0pswjnm7 • yesterday at 9:50 PM

s/exacty/exactly

"I minted a new TLS cert and it seems that OpenAI is scraping CT logs for what I assume are things to scrape from, based on the near instant response from this:"

The reason presented by the blog post is "for what I assume are things to scrape from"

Putting aside the "assume" part (see below^1), is this also the reason that the other "systems" are "scraping" CT logs?

After OpenAI "scrapes", what does OpenAI then do with the data? (Readers can guess)

But what about all the other "systems", i.e., parties that may use CT logs? If the logs are public then that's potentially a lot of different parties

Imagine in an age before the internet, telephone subscriber X sets up a new telephone line, the number is listed in a local telephone directory ("the phone book") and X immediately receives a phone call from telephone subscriber Z^2

X then writes an op-ed that suggests Z is using the phone book "for who to call"

This is only interesting if X explains why Z was calling or if the reader can guess why Z was calling

Anyone can use the phone book, anyone can use ICANN DNS, anyone can use CT logs, etc.

Why does someone use these public resources? Online commenter: "To look up names and numbers"

Correct. But that alone is not very interesting. Why are they looking up the names and numbers?

We can make assumptions about why someone is using a public resource, i.e., what they will use the data for. But that's all they are: assumptions

With the telephone, X could ask "Why are you calling?"

With the internet, that's not possible.^3 This leads to speculation and assumptions. Online commenters love to speculate, and often draw conclusions without evidence

No one knows _everything_ that OpenAI does with the data it collects except OpenAI employees. The public only knows what OpenAI chooses to share

Similarly no one knows what OpenAI will do with the data in the future

One could speculate that it's naive to think that, in the long term, data collected by "AI" companies will only be used for "AI"

2. The telephone service also had the notion of "unlisted numbers", but that's another tangent for discussion

3. Hence, for example, people who do port scans of the IPv4 address space will try to prevent the public from accessing the scan data by restricting access to "researchers", etc. Getting access always involves contacting the people with the scans and explaining what the requester will do with the data. In other words, removing speculation
