If an ISP is willing to sell that data, are IP addresses now PII for everyone?
If one part of a company has such a DB, does it apply to every part of the company? What if it's multiple companies owned by a conglomerate?
If you include an image (or a font!) from somewhere else in a web page, you are causing the user's IP address to be sent to the hosting party, are you liable for sending PII if the target can link IPs to names, because they (e.g. Google) have a DB?
> If you include an image (or a font!) from somewhere else in a web page, you are causing the user's IP address to be sent to the hosting party, are you liable for sending PII if the target can link IPs to names, because they (e.g. Google) have a DB?
As an end user, I want this—if you wouldn't send my IP address to these people otherwise, wanting to show me an image or a font is not a good reason to send it.
As a web developer, I am happy to have excuses to tell my teammates that we need to rehost every asset we depend upon. It's the right thing to do for so many other reasons.
I know this makes things hard for people who have webfonts that don't allow rehosting them etc. Being able to say "We can't use this font because of GDPR unless you change your policies" sounds pretty great honestly.
Hosting every dependency should be the norm. That some sites pull in multiple megs of dependencies from random third parties is just asking for trouble long term.
On the topic of proprietary fonts, why do some websites seem to think using a questionably legible, licensed font is a good idea? It isn't adding value for the end user.
What's the intended harm that restricting people from linking to uncontrolled 3rd party assets would prevent/mitigate?
Consider: You're CompanyX, and I'm DodgyFontHost.tld
My business model is exploiting and selling as much data as I can gather/mine from my traffic.
You embed (that is, reference/hotlink) some of my fonts on your pages.
If a user visits your page, and as a result makes a request to me for a font, I can log everything about that request, but I don't (afaik) have much/any additional knowledge that makes it particularly useful.
Assume there's no ?UTM=... tracking content in the url itself, you're just referencing a static font file.
I'm not sure offhand if browsers would be passing a referer header by default, or if that could somehow reliably identify the site I'm actually visiting. If so, that'd be one valuable fact.
I might be able to fingerprint the user's browser from other headers or their OS from network-level quirks.
Anything else I'm missing?
I feel like 'IP $x made a request for $file' isn't the important thing to be looking at here, it's what I can learn from other things associated with the request that I can exploit.
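To make the list above concrete, here's a rough sketch of what a host like DodgyFontHost could pull out of a single font request. Everything in it is made up for illustration: the `LoggedRequest` type, the `log_font_request` helper, the header values and the addresses. It only assumes the browser sends `Referer` and `User-Agent` headers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlsplit


@dataclass
class LoggedRequest:
    client_ip: str
    path: str
    referring_site: Optional[str]  # host taken from the Referer header, if any
    user_agent: Optional[str]


def log_font_request(client_ip: str, path: str,
                     headers: Mapping[str, str]) -> LoggedRequest:
    """Collect what a third-party asset host can see on one request."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    referer = lowered.get("referer")
    referring_site = urlsplit(referer).hostname if referer else None
    return LoggedRequest(client_ip, path, referring_site,
                         lowered.get("user-agent"))


# Hypothetical request caused by a page on CompanyX embedding a font.
entry = log_font_request(
    "203.0.113.7",
    "/fonts/example.woff2",
    {"Referer": "https://companyx.example/pricing",
     "User-Agent": "ExampleBrowser/1.0"},
)
print(entry)
```

So even with a plain static URL, the log line is closer to "IP $x was on site $y with browser $z" than to "IP $x requested $file".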
But yes, if you had a reliable lookup from (ip,timestamp) to legal person, then it's absolutely Personally Identifiable.
Imagine if every browser set a valid, correct 'X-Requestors-Legal-Name: Bob Smith, Sometown, USA' header on every request. That's obviously identifiable. Adding a layer of indirection doesn't make it less so, although it does maybe place it on a continuum of 'cost/effort to identify based on this info'.
It ranges from 'trivial, because it's right there in the content you're sending', through 'not directly, but easily enough via subscriptions to one or more commercial data providers' to 'if someone steals our data and combines it with stolen data from several other sources, they have a non-zero chance of guessing your identity correctly'.
> I'm not sure offhand if browsers would be passing a referer header by default, or if that could somehow reliably identify the site I'm actually visiting. If so, that'd be one valuable fact.
They are passing referer unless the context is an encrypted connection and the resource is on plain HTTP.
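As a minimal sketch, that rule boils down to a scheme comparison. The function name `sends_referer` and the URLs are made up, and this models only the behaviour described here; sites can change it with a `Referrer-Policy` header, and browser defaults may differ.

```python
from urllib.parse import urlsplit


def sends_referer(page_url: str, resource_url: str) -> bool:
    """Rule as described above: omit Referer only on an HTTPS -> HTTP downgrade."""
    page_scheme = urlsplit(page_url).scheme.lower()
    resource_scheme = urlsplit(resource_url).scheme.lower()
    return not (page_scheme == "https" and resource_scheme == "http")


# Hypothetical page on CompanyX embedding a font from another host.
print(sends_referer("https://companyx.example/pricing",
                    "https://dodgyfonthost.example/f.woff2"))
print(sends_referer("https://companyx.example/pricing",
                    "http://dodgyfonthost.example/f.woff2"))
```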
Thanks. I did a bit of digging around after posting that and found roughly what you describe: the Referer: header is a valuable datapoint, and browsers should probably be a bit more selective about when they send it.
I suspect it's sufficiently ingrained in existing apps to make it hard to deprecate completely, but something like the path stripping might be a decent compromise.
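By path stripping I mean something like the following. This is just a sketch; `strip_referer_path` is a made-up helper, not anything a browser exposes. It keeps the scheme and host (and port, if any) and throws away the path, query and fragment.

```python
from urllib.parse import urlsplit


def strip_referer_path(referer: str) -> str:
    """Reduce a full Referer URL to scheme://host[:port]/ only."""
    parts = urlsplit(referer)
    # Drop any user:password@ prefix, keep the host and optional port.
    host = parts.netloc.rpartition("@")[2]
    return f"{parts.scheme}://{host}/"


# Hypothetical URL; the order id no longer leaks to the third party.
print(strip_referer_path("https://companyx.example/account/orders?id=42"))
```

The third party would still learn which site you were on, just not which page.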
For cross-origin requests I think there's also a mandatory 'Origin:' header that would identify at least the domain (but not path) a user request was referenced from.
I used to use a Firefox addon called RefControl but IIRC it was a casualty of the Quantum/WebExtensions transition. uMatrix has a basic referer spoofing capability, but it's all or nothing for a particular site/scope.
If you use a third-party, they would be acting as your data processor. You'd need to make sure you have a contract with them that ensures they're respecting GDPR as well.
I agree with others that you should self-host whenever possible. It will simplify these questions and you'll be able to fully protect your users' data yourself.
I have submitted this because it is frequent to see on HN claims that IP addresses are personal data under GDPR. I’m yet to see a good source for this blanket statement, and this link contains a more nuanced analysis, essentially saying that IP addresses are only personal data in some cases, where they can be used to identify a person (without involvement of the ISP).
I'm fairly certain the US legal system, when interpreting domestic cases (the purview of HIPAA) doesn't care about the GDPR. If a case crossed international boundaries, sure, but to say GDPR supersedes HIPAA is false. They apply to different jurisdictions, which are mostly, if not entirely, separate.
Agreed, the US is more lax. I was responding to the parent comment "I have submitted this because it is frequent to see on HN claims that IP addresses are personal data under GDPR"
IP addresses are not themselves PHI, but the presence of IP addresses is considered to make PHI “individually identifiable”. IP addresses must be removed when you are de-identifying PHI.
You also, by the way, must remove any geographic information more specific than a state, such as ZIP codes. So it doesn’t say much to include IP addresses in the deidentification list.
It says IP address and time are necessary for it to be personal data, but not sufficient. To be personal data you also have to have access to some mapping to map those back to actual people. For example if you have an agreement with ISPs that allows you to map IP+time to person, then the IP+time is personal data. In the absence of such agreement, it isn't necessarily personal data.
It says access to the mapping is required. Point 49.
>a dynamic IP address registered by an online media services provider [...] constitutes personal data within the meaning of that provision, in relation to that provider, where the latter has the legal means which enable it to identify the data subject with additional data which the internet service provider has about that person.
In the US, we have a common law system, so a court ruling about the meaning of something instructs regulators on the interpretation of what the law means going forward. In a civil law system, judicial interpretation doesn't control how the law is interpreted by regulators.
We are in the datacenter business and we rent out hardware/VPSes. In our case, the lawyer at our company said that an IP is personal data only when it is written into a contract, i.e. when you lease a server and we assign you a static IP. In other cases you cannot create a 1:1 mapping between an IP address and a physical person. Even when that IP is assigned to a household, it still cannot be PII because multiple people may use that IP.
> ... cannot be PII because multiple people may use that IP.
I think such reasoning is a little unfortunate.
Multiple people share my name. Multiple people could live in or visit my household.
So is my name and address not PII?
I don't think that highlighting all the cases where some given bit of information may fail to identify a person is the right approach when it comes to making a blanket statement about whether said info is PII. I don't think courts would follow that reasoning either, especially when there will be lots and lots of counterexamples where following that trail of information leads to facts that most people would regard as identifying a person exactly (or at least with a very high degree of certainty).
In making the submission, the submitter faked the title. The title on HN is at the moment: "Court confirms that IP addresses are personal data only in some cases" whereas the word "only" does not appear anywhere on the linked page.
So the title on HN is misleading. Especially given the most important part of the article:
"The CJEU decided that a dynamic IP address will be personal data in the hands of a website operator if:
- there is another party (such as an ISP) that can link the dynamic IP address to the identity of an individual; and
- the website operator has a "legal means" of obtaining access to the information held by the ISP in order to identify the individual."
And it's known that the "legal means of obtaining access to that information" is very often present.
The main takeaway (IMO) from this article is right here:
> However, businesses should note that if they have sufficient information to link an IP address to a particular individual (e.g., through login details, cookies, or any other information or technology) then that IP address is personal data, and is subject to the full protections of EU data protection law.
I would. Better to be overly cautious if you're serious about protecting user data and privacy.
IPs specifically are quite likely to reveal some identifying info, and it's obvious how trivial it is to find that info. Even if the company itself isn't looking that info up, losing it could expose their users.
It's certainly possible. There are plenty of people who have been assigned (usually small) subnets (e.g., a /28 or /29) and have their name, address, phone number, etc., publicly available via WHOIS. (For "residential customers", it was acceptable to not publish their personal details, however.)
I'm not sure about the other RIRs but ARIN, at least, has (had?) a requirement that any assignment of a /29 or larger must be reported (see "SWIP" [0]).
In other cases, a PTR RR for a single IP address could be enough to personally identify an individual.
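For a sense of how small those assignments are, here's a quick check with Python's standard `ipaddress` module. The `block_size` helper and the 192.0.2.0 documentation prefix are just for illustration.

```python
import ipaddress


def block_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in a block with the given prefix length."""
    return ipaddress.ip_network(f"192.0.2.0/{prefix_len}").num_addresses


for prefix_len in (28, 29):
    print(f"/{prefix_len}: {block_size(prefix_len)} addresses")

# Any address inside a registered block maps back to that block's record.
print(ipaddress.ip_address("192.0.2.5") in ipaddress.ip_network("192.0.2.0/29"))
```

With only 8 or 16 addresses behind a public record, any one of those IPs points at a very small set of people.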
The coverage of GDPR I've seen (and in my view, the regulation itself) has been pretty clear that data becomes covered "personal data" only to the extent that the data, in aggregate, can be used to identify a real person.
So an IP address on its own is almost never personal data, because of wifi, NAT, dynamic IPs, shared devices, etc. Then again, a name is almost never personal data on its own either: "John Smith" could refer to any one of hundreds of thousands of people, or it could be a pseudonym and refer to literally billions of people.
But if someone registers on your site, and you log the IP address and their name, you're a lot closer to personal data. Add a timestamp, and you probably can identify a real person.
So if you're trying to be careful about GDPR, you should probably be careful about storing IP addresses (or IP addresses that can be linked to other bits of potentially personal data). The focus of GDPR compliance can't be on "oh this field is fine, but this field is personal data", it should be on what you're collecting in aggregate. That makes IP addresses dangerous, because they provide a lot of information that could be used to identify someone.
But as the article points out, adding a timestamp will only matter if you have access to other data to map it to a real person.
So based on my reading, IPs and time stamps are not PII unless you are an ISP or you link them to other PII (so still the IP and time stamps are really irrelevant because they depend on that other PII).
You're unlikely to be storing only (IP, timestamp) data though. Presumably there's some additional info attached to those records that makes it useful for something.
A web access-log records (ts, ip, request, ...), or maybe your application log stores (ts, ip, action, params, ...)
So the information from that single source is "at time T, IP accessed RESOURCE".
It's possible that's personally identifiable in context (if you have additional controls that RESOURCE can only be accessed by exactly 1 real person, etc)
But say it's not. All you know is: Opaque PERSON accessed RESOURCE.
If you can obtain the identifying information from elsewhere (buy, steal, etc.), from an ISP or whatever, you now know that (T, IP) = NAMEDPERSON.
A simple lookup/matching means you know that NAMEDPERSON accessed RESOURCE. That's the new personal data.
The IP isn't irrelevant, because without it, you'd have no lookup key to determine the mapping from PERSON? to NAMEDPERSON.
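Here's roughly what that lookup looks like as code. It's only a sketch: the `Lease` table standing in for the ISP's records, the `identify` helper and all the values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Lease:
    """Hypothetical ISP record: who held an IP during a time window."""
    ip: str
    start: datetime
    end: datetime
    subscriber: str


@dataclass
class AccessLogEntry:
    """One line of the site's own log: (ts, ip, resource)."""
    ts: datetime
    ip: str
    resource: str


def identify(entry: AccessLogEntry, leases: Iterable[Lease]) -> Optional[str]:
    """Use (T, IP) as the lookup key into the lease table."""
    for lease in leases:
        if lease.ip == entry.ip and lease.start <= entry.ts < lease.end:
            return lease.subscriber
    return None


leases = [Lease("203.0.113.7", datetime(2018, 5, 1), datetime(2018, 5, 2),
                "NAMEDPERSON")]
hit = AccessLogEntry(datetime(2018, 5, 1, 12, 30), "203.0.113.7", "RESOURCE")
miss = AccessLogEntry(datetime(2018, 5, 3, 9, 0), "203.0.113.7", "RESOURCE")

print(identify(hit, leases), "accessed", hit.resource)
print(identify(miss, leases))  # no lease covers this timestamp
```

Drop the `ip` field from the log and `identify` has nothing to match on, which is the whole point.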
Right, there may be more information, but none of that is personally identifiable without additional information: information that cannot be obtained legally or easily. So the IP is irrelevant.