There are plenty of VPN and proxy detection services, either as a service (API) or downloadable database, which are surprisingly comprehensive. Disclaimer: I’ve run one since 2017. Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.
There are also other methods, like using zmap/zgrab to probe for servers that respond to VPN software handshakes, which can in theory be run against the entire IP space. (this also highlights non-commercial VPNs which are not generally the target of our detection, so we use this sparingly)
It will never cover every VPN or proxy in existence, but it gets pretty close.
Tangent: if you hold access to all VPN providers, have you thought about also releasing benchmarks for them? I would be interested in knowing which ones offer the best bandwidth / peering (ping).
Interesting. I assumed all VPNs switched to IPv6 by now, making detection much harder.
> which are surprisingly comprehensive
How does the buyer even know what the precision and recall rates might be?
just out of curiosity: if i'm located in spain and i setup an ec2 or digital ocean instance in germany and use it as a socks proxy over ssh, do you will detect me?
This will also cause problems with anyone that happens to (even accidentally/unknowingly) use apps that integrate services from companies such as BrightData/Luminati/HolaVPN/etc. where they sell idle time on your device/connection to their VPN/proxy customers.
The legitimate end-user will then no longer be able to use e.g. SoundCloud.
> Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.
Assuming your VPN identification service operates commercially, I trust that you are in full compliance with all contractual agreements and Terms of Service for the services you utilize. Many of these agreements specifically prohibit commercial use, which could encompass the harvesting of exit node IP addresses and the subsequent sale of such information.