So if you send a picture to a Signal user, it's retrieved via Cloudflare and cached in a data center near that user; now you can look up the cache status and find the data center used. I'd say "deanonymization" is stretching it, unless the user is in the middle of nowhere (no other users near the data center). But interesting writeup anyway.
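A minimal sketch of the lookup step, assuming the attachment is served by Cloudflare, which sets a cf-cache-status header (HIT, MISS, EXPIRED and so on) and a cf-ray header whose suffix after the last "-" is the airport-style code of the data center that answered. The function name cache_location is hypothetical, and it only interprets a single response: actually asking each data center in turn requires routing the request through that specific data center, which is not shown here.

```python
def cache_location(headers):
    """Return the data-center code if the object was already cached there, else None.

    Assumes Cloudflare response headers: cf-cache-status (HIT, MISS, ...) and
    cf-ray, whose suffix after the last "-" names the data center that answered.
    """
    # Header names are case-insensitive, so normalise them before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("cf-cache-status", "").upper()
    ray = lowered.get("cf-ray", "")
    # Only a HIT means someone (presumably the recipient) fetched it through this data center.
    if status != "HIT" or "-" not in ray:
        return None
    return ray.rsplit("-", 1)[1].upper()
```

With headers {"CF-Cache-Status": "HIT", "CF-Ray": "8f1c2a3b4d5e6f70-SEA"} it returns "SEA"; any other status returns None, because a miss says nothing about where the recipient is.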
It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from Cloudflare's side, and Cloudflare additionally sees a giant chunk of unencrypted traffic from the same client addresses going to other web sites. Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
Cloudflare gets to see a fuckton of metadata from private and group chats: enough to trace who originally sends a piece of media (identifiable from its file size), who reads it, when it is read, who forwards it and to whom. It really doesn't matter that they can't see an image or video; knowing its size, upfront or later (for example, in response to a law enforcement request), is enough.
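To make the file-size point concrete, here is a minimal sketch over a hypothetical CDN log of (timestamp, client_ip, object_size_bytes) records. It assumes the stored size of an attachment is distinctive enough to act as a key and that the earliest access to an object is the upload; the log format and the trace_by_size name are made up for illustration.

```python
def trace_by_size(records, size):
    """Split the log records for one object size into the earliest access and the later ones.

    `records` is a hypothetical list of (timestamp, client_ip, object_size_bytes) tuples.
    """
    # Keep only accesses to objects of the target size, ordered by time.
    matching = sorted((ts, ip) for ts, ip, sz in records if sz == size)
    if not matching:
        return None, []
    # Assumption: the earliest access is the upload, i.e. the original sender;
    # everything after it is a reader (or a forwarder re-fetching it).
    return matching[0], matching[1:]
```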
It could be useful for correlation.
Say for example that you're an investigating agent in regular contact with someone.
A single data-point wouldn't mean anything. However, a sequence of daily image retrievals might tell you that they spend 90% of their time in WA and 10% of their time elsewhere.
That information alone still might not mean anything, but if you also have a specific suspect in mind, it may help confirm it. Or if you have access to the suspected person directly, if you're able to also befriend their "clean" profile, you might be able to pull the same trick and correlate the two location profiles.
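The location-profile idea above can be sketched as counting which region each daily retrieval resolved to, then comparing two such profiles. This is a minimal sketch under stated assumptions: the region labels are whatever the data-center lookup maps to, the function names are hypothetical, and total variation distance is just one reasonable way to compare profiles, not anything the original writeup prescribes.

```python
from collections import Counter


def location_profile(observations):
    """Turn a list of observed regions (one per retrieval) into the fraction of time per region."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def profile_distance(p, q):
    """Total variation distance: 0.0 for identical profiles, 1.0 for profiles with no region in common."""
    regions = set(p) | set(q)
    # Regions missing from one profile count as a fraction of 0 there.
    return 0.5 * sum(abs(p.get(r, 0.0) - q.get(r, 0.0)) for r in regions)
```

Nine "WA" retrievals and one "OR" give {"WA": 0.9, "OR": 0.1}; a "clean" profile with the same split has distance 0.0, while a profile seen only in other regions has distance 1.0.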
De-anonymisation isn't about single pieces of information, but all information helps feed into a profile to narrow suspects or confirm suspicions.
(By "agent" I just mean a person, not an AI agent or law enforcement, who could presumably just get the information more directly from Cloudflare.)
It's not stretching it. The expectation is that Signal does not reveal any observable aspect of your IP address or location when receiving messages on it.
Whether this specific level/type of deanonymization is a problem for your particular use case is an entirely different question. Personally, I wouldn't even care if mutual contacts were to see my IP address outright (and they do for calls), but I'm not every user.
"Deanonymization" doesn't have to refer to a full exact address. There are people who wish to conceal which country or region they live in, which this cripples.
There was a real example of that amount of information being relevant in the Silk Road investigation. Ulbricht accidentally revealed his timezone early on, which was useful to US authorities since it narrowed him down to being in the US, whereas without that information he could have been from anywhere in the world.
When I was about 15, around 2004, some friends and I ran a forum with a lot of users and did some bad things: we would track down repeatedly banned users and screw with them. (In our defense, they were screwing with us.)
We used everything: browser fingerprinting (the EFF only made the world aware of it 6 years later), looking them up in databases, tracing every bit of digital evidence they left, etc.
Every little thing counted. What I learned is that people leave a lot of traces and you can collect these traces to dox them. The way you write is even sometimes fairly identifiable.
If I know someone on Signal I can now check if they’ve left the country.
Or send this to a bunch of Signal users, one of whom you suspect is a particular person. If you know that the person you are looking for is going to travel, you can send it once before the trip and once after, then see which of these users were in the home city and subsequently in the destination city.
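The before-and-after trick is a set intersection. A minimal sketch, where before and after are hypothetical maps from each candidate account to the data center observed for it on the first and second probe, and the function name is made up for illustration.

```python
def travelled_as_expected(before, after, home, destination):
    """Return candidates seen at `home` on the first probe and at `destination` on the second.

    `before` and `after` are hypothetical dicts mapping account -> observed data-center code.
    """
    return {
        account
        for account, dc in before.items()
        # Must match both observations; accounts missing from `after` are dropped.
        if dc == home and after.get(account) == destination
    }
```

With before = {"alice": "ORD", "bob": "ORD", "carol": "DFW"} and after = {"alice": "NRT", "bob": "ORD", "carol": "NRT"}, only "alice" matches a home of "ORD" and a destination of "NRT".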
The real attack is that a law enforcement agency can trivially subpoena Cloudflare with the attachment URL, and Cloudflare will hand over the IP address of the image's recipient along with whatever other requests that address made through the CDN, which can pretty precisely and rapidly de-anonymize you.
Indeed, "incredibly precise estimate of the user's location" feels like an exaggeration. But still, very interesting!
I'd say it'd be useful for very specific use cases. Such as finding out what country Jia Tan, the XZ Utils backdoor attacker, is in.
I wonder if it'd be a good idea for Signal to implement a "simple" mode that would deactivate most features in order to reduce the attack surface for people who really think they are being targeted. Would that be a good idea?
Caching attachments at a single nice, big, juicy honeypot like CloudFlare is one of the reasons Signal's privacy guarantees don't feel totally solid to me. I get that it's pragmatic, but feel there must be a better way.
Does the caching occur even if both users are online when the attachment is sent?
Even time zone leaks are privacy issues, and the leak we're discussing is more fine-grained than time zone.
It only takes about 33 bits to identify someone (2^33 is roughly 8.6 billion, more than the world's population). This reveals a couple of bits.
Combined with other information, it may identify someone reliably, just like you can with zip code, age and gender. For example, if you know this person is part of a group with members in several locations, or if you can corroborate someone's movements, etc.
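The bit arithmetic behind "33 bits" can be sketched in a few lines: singling out one person among N takes log2(N) bits, and an observation shared by a fraction p of people reveals -log2(p) bits. Assuming the observations are independent, their bits add up, which is how several weak clues combine. The function names are hypothetical.

```python
import math


def bits_to_single_out(population):
    """Bits of information needed to pick one person out of `population` people."""
    return math.log2(population)


def bits_revealed(fraction_sharing):
    """Bits learned from an observation that only `fraction_sharing` of people have in common."""
    return -math.log2(fraction_sharing)


def combined_bits(fractions):
    """Total bits from several observations, assuming they are statistically independent."""
    return sum(bits_revealed(f) for f in fractions)
```

For example, learning a data center that serves a quarter of the relevant users reveals 2 bits, and adding an independent clue shared by half of them brings the total to 3 bits, still far short of the roughly 33 needed on its own.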
For example, imagine someone suspected of sharing sensitive information with a journalist. Investigators might have a short list of suspects and use this technique to confirm which one it is. They might also identify which journalist it is, since maybe only a limited number cover this beat.
It's leaking so many bits, I don't know what else you would call it. Deanonymization isn't a one-shot thing; it's a spectrum, not a binary outcome.
Cloudflare has the actual IP address that viewed the image, which means some powerful (or rich enough) actors can get it.
This is very very bad.
> cached in a data center near that user
Not necessarily. Cloudflare is very upfront that they do not cache everything, and the time things are cached can vary greatly.
The kid keeps talking about "deanonymization" and he has no idea what the term actually means.
> attacker can use the cache geolocation method to pinpoint the recipient’s location
Agree, good writeup, but also a stretch to say they are "pinpointing" anyone's location.
Send a picture to multiple accounts, perhaps on different services; the links that are cached at the same data center can more confidently be believed to be related.
This is not unique to Signal. URL strings can contain identifying information regardless of where they are shared or posted. For example, if you send a link that ends with a string of characters, those characters may correspond to a geographic location or browser settings. Blogger URLs used to be geolocated, such as .ca for Canadian viewers. It is always safe to strip out unnecessary characters if you're paranoid.
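The cross-service correlation can be sketched as grouping accounts by the data center where their copy of the link turned up cached. The account labels and function name below are hypothetical, and a shared data center is only weak evidence on its own, since many people share each one.

```python
from collections import defaultdict


def group_by_data_center(observations):
    """Group accounts by the data center where their copy of the link was found cached.

    `observations` maps a hypothetical account label to a data-center code,
    or to None when no cached copy was found.
    """
    groups = defaultdict(set)
    for account, dc in observations.items():
        if dc is not None:
            groups[dc].add(account)
    # Only data centers shared by two or more accounts suggest a link between them.
    return {dc: accounts for dc, accounts in groups.items() if len(accounts) > 1}
```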
WhatsApp has an option to disable link previews.
Surprised Signal doesn't have this option.
I only message people I know on Signal anyway.
Edit: it seems Signal does have the option.
Why would cloudflare ever operate a data center that only one user at a time is ever near?
Looks like it's possible to hit 2 datacenters due to load-balancing, which would narrow it down a bit more. Suppose you do this repeatedly as the target is moving around, hitting even more datacenters.
You underestimate the value of this piece of information taken at different times. It can be enough to know in which country a person was yesterday or is today.
Why does it need to be cached though?
The only case where it might be downloaded more than once is if the user has multiple clients. Not that common and still very little traffic.
That's why federated setups such as Matrix are better: it is much harder to deanonymize a set of users on different servers in a group chat.
Did you see the GIF? It's able to triangulate.
Mmmm "qualified deanonymization" perhaps?
Imagine sending a friend request to bin Laden's videographer and getting a reply from Pakistan while your entire military is looking for him in Afghanistan?
There are definitely cases where this is going to be immediately used. Shit, just using it to scrape Cloudflare for additional metadata on everyone from other user table leaks would probably yield valuable data. Even triangulation over time as they move around is going to get a more precise result. Maybe you find a vulnerability that takes that Cloudflare node offline and run it again, repeating until you've got a fairly small radius they could be in.
Headline feels like clickbait :)
Timing and location can usually narrow things down to enough data about a person.
> (no other users near the data center).
Yeah and in that case there won't be a data center because who puts one in places without clients nearby? :)
"Near a user" is also a big assumption. I'm ~200 miles from ORD and ~500 from IAD, but my ISP's peering and upstream arrangements mean Cloudflare serves my traffic from DFW, 700 miles away.
But, at the same time: Cloudflare isn't going to serve me a cache from Seattle, Manchester, or Tokyo. Pinning down an unknown Signal user to even a rough geographic location is an important bit of metadata that could combine to unmask an individual. Neat attack!