So, since most of the web is HTTPS now, all they have is DNS requests (unless users have switched to a third-party resolver like 1.1.1.1, and even then plain port-53 queries are still visible in transit unless DoH or DoT is used) and IP addresses. Maybe the SNI hostname too, if they are doing packet inspection.
Not really sure how useful this would be for model training.
Maybe ranking which sites it should give as answers based on popularity?
I wonder how much one can analyze from timings and sizes alone. If a news article on a known hostname has 5 Xitter embeds and 4 Instagram ones, the Creep in the Middle can count how many bytes are transferred over the HTTPS connections to Xitter/Instagram (there won't be 9 separate TCP connections, due to connection reuse) and compare that to its own scrape of news articles on that host...
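To make the size-matching idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `PageFingerprint` type, the host names, the byte counts, and the 10% tolerance are made up for illustration, and real traffic would be much noisier (caching, compression, ads, padding).

```python
from dataclasses import dataclass


@dataclass
class PageFingerprint:
    """Bytes a scraper saw per third-party host when loading one article (hypothetical)."""
    url: str
    bytes_by_host: dict


def distance(observed: dict, expected: dict) -> float:
    """Mean relative difference in bytes per host; 0.0 means identical."""
    hosts = set(observed) | set(expected)
    if not hosts:
        return 0.0
    total = 0.0
    for host in hosts:
        seen = observed.get(host, 0)
        want = expected.get(host, 0)
        # Normalise by the larger value so each host contributes between 0 and 1.
        total += abs(seen - want) / max(seen, want, 1)
    return total / len(hosts)


def best_match(observed: dict, candidates: list, max_distance: float = 0.1):
    """Return the closest scraped page, or None if nothing is within tolerance."""
    if not candidates:
        return None
    score, page = min(
        ((distance(observed, c.bytes_by_host), c) for c in candidates),
        key=lambda pair: pair[0],
    )
    return page if score <= max_distance else None


# Made-up fingerprints from a scrape of one news site.
scraped = [
    PageFingerprint("https://news.example/a", {"x.example": 500_000, "insta.example": 400_000}),
    PageFingerprint("https://news.example/b", {"x.example": 100_000}),
]

# Made-up byte counts seen on the wire shortly after a connection to news.example.
observed = {"x.example": 510_000, "insta.example": 395_000}
guess = best_match(observed, scraped)
print(guess.url if guess else "no confident match")
```

The observer never sees a URL here, only per-host byte totals, so the guess is only as good as the scrape and the tolerance.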
I was thinking the same. It's not like there are vast amounts of unencrypted HTTP just running around; nearly everything uses TLS nowadays.
They could find URLs to scrape, maybe, but what's the point of that when Certificate Transparency logs exist?
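For what it's worth, certificates only list hostnames, so public certificate data gives you sites to crawl rather than full URLs. A minimal sketch of pulling hostnames out of such records, assuming a crt.sh-style JSON export where each record has a `name_value` field with newline-separated names (that record shape is my assumption, and the sample data is made up; no network access involved):

```python
def hostnames_from_ct_records(records: list) -> list:
    """Collect unique, lowercased hostnames from certificate records.

    Assumes each record is a dict with a 'name_value' string holding one
    name per line (hypothetical shape, modelled on crt.sh JSON output).
    Wildcard entries are skipped because they are not crawlable hosts.
    """
    names = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name and not name.startswith("*."):
                names.add(name)
    return sorted(names)


# Made-up records for illustration.
sample_records = [
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "*.example.com\nWWW.example.com"},
    {"issuer_name": "record without any names"},
]

print(hostnames_from_ct_records(sample_records))
```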
This X thread may not be the best source of clarity on what users are actually being opted into by default. Sorry. I looked into it, and it seems that Starlink denies that browsing history would be shared [0]. Seems I can't edit the title any more.
> Do you share my personal information for AI training? We are committed to protecting your privacy. In some instances, we may share personal information with trusted third-party partners who, among other activities, help us develop AI-enabled tools that improve your customer experience, although you can always opt out. Rest assured that we take reasonable safeguards to protect and secure your information whenever it is used or shared.
> Will these AI models see my Internet history? No, your internet history will never be shared with AI models, including individual browsing habits or geolocation tracking, and we comply with laws prohibiting unauthorized surveillance.
> What personal information does Starlink collect from me? We only collect what’s needed to provide you great service—like your name, address, email, and payment details when you sign up or order. We also gather some technical information (like IP address or service performance data) to keep your connection fast and reliable.
[0] https://starlink.com/support/article/b82cf54a-8e57-917a-bd06...