
This is a straw man.

Firstly, we have no idea how many of the 50 individuals' data were public. All public data can be discounted, since the state actor can just copy it.

Secondly, for the private data, the definition of ‘private’ is unspecified. It really just means not part of a published record. If ProPublica has access to it, then why couldn’t someone else?

I agree that if there were 150 separate sources with data not disclosed anywhere else, it would be impossible to guess.

But that’s just a made-up scenario.

There could be many correct records that are public, and one or two that are private but available to (or even provided through another channel of) the state actor.

As long as the fake records are not part of the public or private data ProPublica already has, there would be no way to verify them.

This of course assumes that ProPublica’s list of records itself is kept securely.




I’m assuming good faith on ProPublica’s part, that a reasonable amount of the data was private and that it was truly private. If I didn’t trust them I wouldn’t read their reporting.

I agree--it's public data that has been removed. It's disingenuous (at best) to call it private data. I clarified my position in my original comment.

Agreed - I was trying to get a sense of the problem to-date, mostly. It's extremely important to ensure proper controls and I agree the burden of proof falls on the government to ensure they have controls in place to prevent abuses.

Perhaps I am merely surprised that this creates so much outrage in this community. The folks on this website (we) have created so many platforms for sharing personal information publicly on the internet. Of course people have been scraping it for years; it is likely stored somewhere and will be used for non-original purposes at some point. As you say, how many data sets have not been abused?

Users have published this data themselves, on purpose, on web platforms meant for sharing.

I don't think this can be put back in the box. I'm not really sure how you regulate it in any effective way. There are some laws on the books and some corporate penalties, but if the model works using public data, I don't see how you stop this short of making the data so noisy and useless that it stops being effective (which may have to be the end result).


If it's publicly available data, then it's most likely legal, right?

Yes, but no single company could release that data.

Here a company is singled out with a very specific and easy to prove or falsify claim given the data.


Not if the suspicion is that interested parties have falsified the data.

If the data was sourced from public information that the target willingly posted online, does law enforcement even need to use parallel construction? Couldn’t they use the Palantir data to obtain a warrant for more thorough searches?

I too have a problem with business models that invade our privacy. But I think it’s a bit disingenuous to conflate “private information” with information that you voluntarily posted on the internet.

Whatever happened to educating people that what they post on the internet is permanent? Dragnets over public data are a symptom of the real problem, which is lack of user education and understanding of what data they generate and share.


The vast majority of this data was posted publicly. You have no reasonable expectation of privacy.

You nailed it: we should not use public data for important things like this that could result in fraud.

You can't argue that such data is individually owned and also that it must be released publicly, because that would require consent from everyone whose data was used.

Actually, I think you missed the point of the article.

The overall point Jeremy is making is that ProPublica claims to be independent and non-partisan, doing investigative journalism to the highest standards.

However, here they are really publishing a thinly veiled policy op-ed, and a pretty badly informed one at that. I.e., not investigative journalism.

They are further morally justifying the privacy invasion on some really shaky grounds as well.

Like, for example, even if one assumes that their story is somehow "investigative journalism" rather than "policy op-ed", there is no obvious reason not to anonymize the data except to make it more salacious and get more clicks. They basically write that as their justification, along with some other "ends justify the means" crap, but try to make it sound better.

That is not an ethical or moral reason to do something.

You may agree or disagree, but this feels like a very valid set of points to raise, and doesn't "miss the point" at all.

It is ProPublica that is claiming to hold themselves to a high standard and failing in a lot of ways. So this is overall a critique/commentary on the standards folks are holding themselves to, much more than on "are they right".

The "are they right about tax" is almost secondary here, and that is what you (and lots of others) are focusing on.

Which is ironic, because it basically proves


You’re misunderstanding that page. You 100% cannot buy raw data and get individual records that contain personally identifiable information.

I would agree in theory, but you forget: private companies exploiting this data don't use it to engage in parallel construction and violate human rights to privacy. Law enforcement organizations can and will do this.

The same data, in different hands, causes much different scales of damage to people.


That’s a very poor excuse for mass collection of personal data. They can easily figure that out from commercial datasets, or even a small-scale research effort.

How do you find out it was private data without viewing it first? How is that misuse of data if you just view it?

This guy is asking the right thing.

It's not about you trusting somebody to handle the data. It's just that the data should not exist at all outside of you and the gov.

As soon as the data exists somewhere else, there is a way to misuse it and a chance it will happen. In today's world, those don't even require a lot of imagination to find, because it already happened and is currently happening.


Digital data is easy to fake, though. Wouldn't it be more prudent to have the analysis done by a third party?

Regardless, there is legitimate value in the collection, cleaning, interlinking, and presentation of existing data. How that is interpreted by the law is one thing but merely because the data came from a variety of other public/private sources doesn't mean it derived all of its value externally.

Besides all that, he is not dealing with the collected data itself, but with analysis done on that data by non-government employees.

If it is the data, or those people had access to things that are not public yet, then all this discussion is moot and he is right. I'd even sue them for my 2.5 for each article.
