ISPs propagate flow-based snapshots of attacks to populate filters and redirect traffic to scrubbing centers, but they do so discreetly in part because of concerns about how well their data --- which is used exclusively to generate filters --- has been anonymized.
ISPs are also in the business of analytics [1, 2], and a significant percentage of customers hiding their traffic reduces the value of their analytic products.
So long as you have deep control of inbound and outbound peers you can deanonymize traffic by throttling one end and finding streams that are affected. The service providers selling faux security don't have to be involved.
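A rough sketch of that throttle-and-watch idea, for illustration only. It assumes you can impose an on/off throttle pattern at one end and measure per-flow throughput at the other; the flow names, the numbers, and the `most_affected_flow` helper are all made up for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def most_affected_flow(throttle_schedule, egress_rates):
    """Score each egress flow against the throttle pattern.

    A strongly negative score means the flow slows down whenever the
    throttle is on, which is what a confirmation attack looks for.
    """
    scores = {flow: pearson(throttle_schedule, rates)
              for flow, rates in egress_rates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical data: 1 = throttle active during that interval.
throttle = [0, 1, 0, 1, 0, 1, 0, 1]
egress = {
    "flow_a": [90, 88, 91, 92, 89, 90, 91, 88],
    "flow_b": [100, 20, 98, 22, 101, 19, 99, 21],
    "flow_c": [50, 55, 45, 52, 48, 51, 49, 53],
}
suspect, scores = most_affected_flow(throttle, egress)
print(suspect, scores)
```

In this toy data only `flow_b` tracks the throttle pattern, so it is the one singled out. Real traffic is far noisier and would need many more intervals.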
If you route traffic and DNS to a DO droplet outside the UK, all the ISP sees is a connection to that droplet and how much data was sent, not anything about what it was that was sent.
It neatly circumvents this bullshit. Some suspect doing so will put you on a list for a closer look, but if your traffic is innocuous, who cares? I'm more worried about my useless ISP leaking or losing such data than I am about state intelligence.
Just the existence of these databases, held by ISPs and built under lowest-cost bids, will make them a massive target.
And yet, confirmation attacks and traffic shaping by global adversaries are where the actual design flaws lie, inherent to any 'low-latency' (read: performance trumps privacy) anonymizing network, and you know that. Even efforts such as the Invisible Internet Project (I2P), which pile on tactic after tactic to improve upon this but are too scared to delve into outright traffic mixing, are vulnerable to traffic confirmation attacks.
If you're not actively targeted, it's much less likely they're logging all of your traffic (or even new TCP connections or UDP 'connections') as it's expensive to do that for every customer in a non-sampled manner (like with Netflow).
It's easy for anyone who can do traffic analysis on your traffic, e.g. your ISP and mass surveillance perpetrators.
And whoever your ISP decides to sell or give this data to.
Even if we accept that scrubbing of personal data is possible, which is far from certain, that theoretically non-malicious traffic still provides camouflage for malicious traffic. If we insist on opt-in, then we can apply a very simple and fail-safe heuristic: any traffic the user didn't explicitly request is malicious. There's no need for slow and error-prone analysis.
I'm well aware of these things, and none of them has anything to do with ISPs proactively snitching on individual users for the purpose of reporting suspicious activity to law enforcement, which is what my parent's comment and mine were all about.
(Though I agree that Deep Packet Inspection and Ad Injection may technically be considered a form of "looking at users traffic", albeit an automated one.)
Whether netflow is heavily sampled or not depends on where and how it is generated and for what purposes. It's just a format.
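To make the sampling point concrete, here is a toy sketch of deterministic 1-in-N packet sampling. It is not how any particular exporter works (many sample randomly, and the rate is configurable); the `count_packets` helper and the packet list are invented for illustration.

```python
from collections import Counter

def count_packets(packets, sample_every=1):
    """Count packets per flow, looking only at every `sample_every`-th packet."""
    seen = Counter()
    for index, flow_id in enumerate(packets):
        if index % sample_every == 0:
            seen[flow_id] += 1
    return seen

# Hypothetical packet stream: one bulk transfer with a 3-packet flow in the middle.
packets = ["bulk"] * 1001 + ["short"] * 3 + ["bulk"] * 996

full = count_packets(packets)                       # unsampled view
sampled = count_packets(packets, sample_every=100)  # 1-in-100 view
print(full)
print(sampled)
```

In this run the three-packet flow is absent from the sampled view entirely, while the bulk flow is still clearly visible. That is roughly why heavily sampled flow data is good for spotting volumetric attacks and poor for reconstructing everything one customer did.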
> You started out saying that running your own local recurser is privacy-enhancing (it's the opposite) and that ISPs can see your DNS queries when you run DoH (they can't) and retreated to a position of "other things leak".
Never said any of that, but your remarks are still wrong. What I said is basically that at the ISP level there is much more information about what you are doing than DNS queries, including information that makes it possible to identify the websites you visit. Again, IP addresses alone identify more than 90% of websites (I can't remember the exact number from the research, it's something like 95% of the Alexa top 1 million).
Now, with regard to privacy. First, it's important to understand what privacy is actually about. Your traffic contains a lot of metadata and side-channel-like information. Some of the sources of that information overlap heavily or are even identical, so a lot of it is redundant. For example, we can look at a packet carrying a DNS query and see a domain name that identifies a website. Or we can look at the IP address of a packet initiating a new connection. That won't identify every website uniquely, but it will identify most of them. To get to 95% we would also need to look at the IP addresses of the new connections that follow it, i.e. sub-resources, building a sort of IP-set fingerprint. Either way, a single IP address or a fingerprint can be mapped to a unique domain or to a set of potential domains. To complete the example, there is also the SNI in plain text in packets and the domain names in Host: headers, which also uniquely identify websites. So there are at least three sources of the same, or almost the same, information.

What kind of privacy level is that? Well, we have a single party with access to the same relatively easily extractable information from multiple redundant sources. That's a lot to trust one party with. Would it change anything if we removed any one of the sources? Not really. It would have a very insignificant effect on privacy, since the other sources are still available. And if we have to send that information to another party, that makes things much worse, because now we have to trust two parties with the same information rather than one.

OK, so how can we actually improve it? We can try to remove all three sources of information and send all of them through another party. This changes the party we have to trust, but we still have to trust it with just as much. What if we can't trust any party that much? Then the next choice is to shard (partition) that information across as many parties as we can.
Trusting one party with just half of that information and another party with the other half is already much better, and is where we finally start scratching the surface of privacy. I hope it's clear that DoH is not helping with privacy; it just takes one of the sources of this information away from the ISP and hands it to another party.
As for running a local recursive DNS server instead of the one provided by an ISP, it doesn't leak significantly more information to anyone. Packets likely go through the same route, and there are slightly more of them for queries that miss the cache for a few popular domains. Your IP address is also exposed in such DNS packets, but queries from the ISP's DNS server would come from an IP address that belongs to the same AS, which leaks almost the same amount of identifying information.
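The IP-set fingerprint mentioned above can be sketched roughly like this. It is a toy, not the method from the research: the domains and addresses are placeholders from the documentation ranges, and Jaccard similarity with a fixed threshold is just one assumed way to compare sets.

```python
def jaccard(a, b):
    """Set overlap: 1.0 means identical sets, 0.0 means nothing shared."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_sites(observed_ips, fingerprints, threshold=0.5):
    """Return sites whose known IP set resembles the observed one, best match first."""
    scored = [(jaccard(observed_ips, ips), site)
              for site, ips in fingerprints.items()]
    return [site for score, site in sorted(scored, reverse=True)
            if score >= threshold]

# Hypothetical fingerprints: main host plus sub-resource hosts per site.
fingerprints = {
    "news.example": {"192.0.2.10", "198.51.100.7", "203.0.113.5"},
    "shop.example": {"192.0.2.10", "198.51.100.99"},
    "blog.example": {"192.0.2.10"},
}

# One shared-hosting IP alone is ambiguous...
single = candidate_sites({"192.0.2.10"}, fingerprints)
# ...but the set of follow-up connections narrows it down.
full_set = candidate_sites(
    {"192.0.2.10", "198.51.100.7", "203.0.113.5"}, fingerprints)
print(single, full_set)
```

With the single address there are two candidates left. With the follow-up connections included, only `news.example` clears the threshold, and none of this needs DNS, SNI, or Host: headers.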
Honestly, if you're browsing through your ISP or carrier's data service, you're not anonymous and people are watching. That point of view is the only way to operate.
1. You're a web app company that routinely comes under DDOS attacks. A private security company would like you to subscribe and push Netflow traces to them during attacks, so they can derive minimal ACLs and relay them to ISPs so the attack could be filtered upstream. That Netflow data could in theory contain some of your traffic, and further could be statistically de-anonymized to reveal your personal usage of that service. Currently, you need authorization from your counsel every time you share that Netflow data; post-CISPA, you could get a one-time authorization and automatically push Netflow to the anti-DDOS provider.
2. You're working with several social networking companies to track down a browser malware attack that transmits itself through online messages. You'd like to collect a large volume of samples programmatically to share with other providers and LEOs, but you're extremely concerned that in doing so you might reveal details about private messages, perhaps even by sharing which users have messaging relationships with other users. Pre-CISPA, your counsel might refuse that sharing outright; post-CISPA, you'd have statutory protections for sharing that operational data in the course of responding to malware.
American ISPs injected tracking codes into their users’ HTTP traffic so they could get paid by advertisers. I would not speak in absolutes about that, especially because anonymizing data is a hard problem which even well-intentioned people have made mistakes with.
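For scenario 1, here is a rough sketch of what "derive minimal ACLs" from flow records might look like. This is an assumption about the general shape, not any vendor's pipeline: the `(src_ip, dst_port, bytes)` record layout, the /24 aggregation, the byte threshold, and the `derive_acl` helper are all invented for illustration.

```python
import ipaddress
from collections import defaultdict

def derive_acl(flows, min_bytes, prefix_len=24):
    """Aggregate flows by source prefix and destination port.

    Returns (network, port) pairs whose total volume reaches min_bytes.
    Only these aggregates would need to leave the victim's network,
    not the raw per-flow records.
    """
    totals = defaultdict(int)
    for src_ip, dst_port, nbytes in flows:
        net = ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False)
        totals[(net, dst_port)] += nbytes
    rules = [key for key, total in totals.items() if total >= min_bytes]
    return sorted(rules, key=lambda r: (int(r[0].network_address), r[1]))

# Hypothetical flow records captured during an attack.
flows = [
    ("203.0.113.5", 80, 4_000_000),
    ("203.0.113.77", 80, 5_000_000),
    ("198.51.100.9", 443, 12_000),   # ordinary user traffic
]
rules = derive_acl(flows, min_bytes=1_000_000)
for net, port in rules:
    print(f"deny {net} -> port {port}")  # illustrative rule text only
```

Here the two heavy sources collapse into one /24 rule and the small flow never appears in the output. The privacy worry in the scenario is about the raw records that feed a step like this, since those still contain ordinary users' traffic.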