I know it is not good HN policy to doubt your intentions, but by emphasizing the NOTs, DEIDENTIFIED, and YOU, you make me highly suspicious. Anonymization of data is difficult at best and sometimes nearly impossible, so I would advise publishing the entire protocol if you want to give people assurances. The encryption key, as already stated, is pointless without an explanation of how you use it and why it is employed; otherwise it is just smoke and mirrors and no real security.
Using crypto hashes to anonymize data is one of those mistakes I've seen several times, and I wanted to draw some attention to the issue so that hopefully we can all learn from it.
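To make the failure mode concrete: when the identifier comes from a small, enumerable space, hashing it is just a pseudonym that anyone can reverse by brute force. A minimal sketch, using phone numbers in a hypothetical area code as the identifier:

    import hashlib

    def pseudonymize(phone: str) -> str:
        # A common (flawed) approach: replace the identifier with its hash.
        return hashlib.sha256(phone.encode()).hexdigest()

    def recover(leaked_hash: str, area_code: str = "415"):
        # An attacker who knows the input space simply enumerates it.
        # One area code is only ~10 million candidates; all US numbers are
        # ~10^10, still trivial for modern hardware.
        for n in range(10_000_000):
            candidate = f"{area_code}{n:07d}"
            if pseudonymize(candidate) == leaked_hash:
                return candidate
        return None

Keying the hash (HMAC with a secret salt) only moves the problem: whoever holds the key can still re-identify every record.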
I basically don't believe non-degenerate (pseudo)anonymization is possible, although that complicated af homomorphic encryption stuff makes me a little uncertain.
I was trying to avoid the general ideas on what is a good way to anonymize data, because I don't think there are general rules that apply, and I'm not in a position to give authoritative advice on this. The more I dug in, the more I realized this is probably one of the hardest technical problems that exists right now, and there isn't yet a right answer that just works (the way "use scrypt" is for passwords).
As for GDPR, I think digging into this in more detail would be a great follow up.
People are so naive about how hard it is to really anonymize data and how surprisingly easy it can be to de-anonymize it that I'd never trust it without:
1) Some serious explanation and scrutiny of exactly how their anonymization process works
2) How data leaks are removed from logs and elsewhere
3) How data is aggregated and presented, and what measures are taken to prevent de-anonymization (see the sketch after this list)
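On point 3, one concrete (and by itself still insufficient) measure is a k-anonymity-style suppression rule on the aggregation step. A rough sketch, with hypothetical field names:

    from collections import Counter

    K = 10  # only release a group if at least K individuals share it

    def aggregate(records, quasi_identifiers=("zip", "age_bracket", "gender")):
        # Count how many individuals share each combination of quasi-identifiers.
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        released = {g: n for g, n in groups.items() if n >= K}
        suppressed = {g: n for g, n in groups.items() if n < K}
        return released, suppressed

Even this leaks information across repeated releases, which is why the discussion has to cover the whole pipeline, not just one filter.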
Edit: their FAQ is nowhere near a "serious" discussion. And looking at their code is not an efficient way of learning whether it's a trustworthy approach.
Forgive my language, but I'd expect people here to understand that's horseshit; they absolutely have enough data and patterning to de-anonymize the data. They spent time making it look anonymous.
"Anonymization" in the sense of transforming a dataset so that it's still useful but doesn't significantly reduce the privacy of the people it describes, is usually impossible, or at least beyond the state of the art. People start out with just a few tens of bits of anonymity and bits are everywhere.
You probably have a better chance of creating your own secure block cipher than of achieving this goal. In a similar way, your inability to see what's wrong with your scheme is not evidence that it works.
I don't like to be negative, and I'm all for continued research, but at this point the conservative thing to do with data that you need to "anonymize" is delete it.
Tell me how someone anonymous travels from day to day and I will tell you who he or she is.
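That isn't an idle boast; a handful of spatio-temporal points is usually enough. A hypothetical sketch of the standard attack on a pseudonymous location trace, assuming pings of the form (hour_of_day, cell_id) and any auxiliary source of home/work locations:

    from collections import Counter

    def home_work_pair(trace):
        # Home: most common night-time cell; work: most common daytime cell.
        # Assumes the trace covers nights and workdays.
        night = Counter(cell for hour, cell in trace if hour < 6 or hour >= 22)
        day = Counter(cell for hour, cell in trace if 9 <= hour < 17)
        return night.most_common(1)[0][0], day.most_common(1)[0][0]

    def reidentify(anonymous_trace, known_people):
        # known_people: {name: (home_cell, work_cell)} from billing addresses,
        # employer records, or a second "anonymized" dataset.
        target = home_work_pair(anonymous_trace)
        return [name for name, pair in known_people.items() if pair == target]

The (home, work) pair alone is close to unique for most commuters, and every extra point only narrows it further.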
Decoupling identity data from other data does not even work in theory. Moreover, it also conflicts with the ability to retract personal data: if data is anonymized, it is no longer possible to get rid of your personal records.
Homomorphic encryption is the only method that might make a dent here. Laws will be broken.
If their description of the software is correct then that part is irrelevant (privacy-wise). I guess with this source-code you could at least figure out if their implementation is as anonymous as they claim it is.
I sent you (Dylan) an email, but openly for discussion:
I'm building a product around an interactive learning system. Users train it themselves, with our assistance as needed. But I don't want to ship a puppy that will piss on your rug... and it would be great if it already had some useful skills and intuitions out of the box...
So I want to carefully anonymize that data by hand, and retain it for training models that all customers can benefit from. (I also want to train models to recognize PII, but only to assist humans in doing the task; no amount of error would be considered acceptable for this. A rough sketch of that assist-only pass is at the end of this comment.)
I honestly don't see any problem, but I had a security consultant nearly spew beer on me when I got to that part, and insist that I drop that line of thinking.
I need more advice on this.
(The only other path I can think of is homomorphic encryption... but I would not want to retain it in the case where the original is deleted...)
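For the assist-only PII pass mentioned above, I'm picturing something minimal like this (hypothetical patterns; it only catches the obvious cases, which is exactly why the human stays in the loop):

    import re

    # Flag likely PII for a human reviewer; nothing is auto-redacted.
    PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    }

    def flag_pii(text: str):
        hits = []
        for label, pattern in PATTERNS.items():
            for m in pattern.finditer(text):
                hits.append((label, m.start(), m.end(), m.group()))
        return hits  # presented to the reviewer alongside the original text

It says nothing about names, addresses, or quasi-identifiers, so I'm under no illusion it solves the anonymization problem; it's purely a reviewer aid.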
The intent behind this tool seems good, but I don't think it's a good idea. To actually anonymize data requires semantic understanding of that data and an understanding of what sort of data, harmless by itself, is transmuted into identifying data when provided in the context of other otherwise harmless data.
This tool doesn't help you with any of that. It seems to be a glorified awk script. My concern is that helping the user with the easiest part of anonymizing data stands to encourage the user to go full steam ahead without stopping to think very carefully about what they're doing.
That "anonymized technical data" link points to a github repo with no documentation. It's not clear what data you're actually sending. What do you consider personally identifying?
Anonymization of data is not a simple task. Even with the noblest of intentions things can go wrong. See the problems with the sharing of medical data in the UK and examples from the US [1].
De-anonymization is something that we already have a lot of experience with, specifically tying a device to an individual. There’s nothing special about a public key that makes this harder.
Instead, the Cryptosphere favors system robustness over guarantees on anonymity.
You trace back the "provenance of that file" through cryptographic signatures. You could make your own throwaway identity, use it to publish something, and, through the continued propagation of that data through the network, its publication would no longer require your activity.
It should be considered pseudonymous publishing.
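A generic sketch of the throwaway-identity idea (not the Cryptosphere's actual protocol; just the shape of it, using Ed25519 from the Python cryptography package):

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives import serialization

    # Publish a blob under a throwaway identity: the signature proves the same
    # key published the content (provenance), and the key is then discarded.
    throwaway = Ed25519PrivateKey.generate()
    content = b"the data being published"
    signature = throwaway.sign(content)
    public_key = throwaway.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw
    )
    # Peers re-host (content, signature, public_key); once it has propagated,
    # the publisher no longer needs to stay online or keep the private key.

The linkability remains, though: anything else ever signed with that key is tied to the same pseudonym, which is why it is pseudonymous rather than anonymous.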