
The addresses from the user's address book should be hashed before sending to the server and compared to hashed addresses on the server. Then only positive matches are registered, and the server doesn't see more private information than it needs.
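A minimal sketch of the scheme described above. It assumes SHA-256 as the hash and uses an in-memory set to stand in for the server's user table; `hash_contact`, `match_contacts` and the sample addresses are hypothetical. Other comments in this thread point out that plain hashing of a small input space can be reversed, so this only shows the mechanics, not a privacy guarantee.

```python
import hashlib


def hash_contact(contact: str) -> str:
    # Hex-encoded SHA-256 of the contact string, as the client would send it.
    return hashlib.sha256(contact.encode("utf-8")).hexdigest()


# Server side (simulated): hashes of registered users, hypothetical sample data.
REGISTERED = {hash_contact(e) for e in ["alice@example.com", "bob@example.com"]}


def match_contacts(address_book: list[str]) -> list[str]:
    # Client hashes its address book and keeps a hash -> contact map locally.
    sent = {hash_contact(c): c for c in address_book}
    # Server intersects the received hashes with its own registered hashes.
    matched_hashes = set(sent) & REGISTERED
    # Client maps the positive matches back to its own contacts.
    return sorted(sent[h] for h in matched_hashes)


print(match_contacts(["bob@example.com", "carol@example.com"]))
```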



That's the point. You hash the number, and keep the hash. Send the first few digits of the hash to the server, it sends you a bunch of full hashes back. If your hash is in that list, then you know the server knows that person.
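A toy version of the prefix exchange described above, assuming SHA-256 and a 5-hex-character prefix (both arbitrary choices). `Server`, `full_hash`, `server_knows` and the phone numbers are hypothetical names and data. The server only ever sees the prefix; the final comparison against the full hash happens on the client.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (assumed value)


def full_hash(number: str) -> str:
    return hashlib.sha256(number.encode("utf-8")).hexdigest()


class Server:
    """Toy server holding full hashes of registered numbers, bucketed by prefix."""

    def __init__(self, registered_numbers: list[str]) -> None:
        self.buckets: dict[str, list[str]] = {}
        for n in registered_numbers:
            h = full_hash(n)
            self.buckets.setdefault(h[:PREFIX_LEN], []).append(h)

    def lookup(self, prefix: str) -> list[str]:
        # Returns every full hash in the bucket; the server learns only the prefix.
        return list(self.buckets.get(prefix, []))


def server_knows(server: Server, number: str) -> bool:
    h = full_hash(number)
    candidates = server.lookup(h[:PREFIX_LEN])
    # The decisive full-hash comparison is done locally on the client.
    return h in candidates


server = Server(["+15551234567", "+15559876543"])
print(server_knows(server, "+15551234567"), server_knows(server, "+15550000000"))
```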

> URLs for example should be hashed before going out.

That doesn't help at all. If the server has a database it's going to match this hash too, then it knows what URL corresponds to the hash.


What would the alternative be - sending the stored hash from the server so the comparison can be done on the client side? That doesn't sound like a great idea...

I want to mention that such a feature CAN be implemented in a privacy-preserving way using the k-anonymity model (haveibeenpwned.com currently uses it). In short, you send only the first few characters of the hash to the server (haveibeenpwned's Pwned Passwords API uses the first 5 hex characters of a SHA-1 hash), the server replies with every hash that shares that prefix, and you then look up your full hash in the response.
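A sketch of the client side of haveibeenpwned's Pwned Passwords range query. The API takes the first five characters of the uppercase hex SHA-1 digest at `https://api.pwnedpasswords.com/range/{prefix}` and returns lines of the form `SUFFIX:COUNT`. The HTTP request itself is left out so the example runs offline; `split_sha1`, `range_url` and `count_in_response` are hypothetical helper names.

```python
import hashlib

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(secret: str) -> tuple[str, str]:
    # Uppercase hex SHA-1, split into the 5-char prefix (sent) and the suffix (kept).
    digest = hashlib.sha1(secret.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def range_url(secret: str) -> str:
    prefix, _ = split_sha1(secret)
    return RANGE_URL + prefix


def count_in_response(secret: str, body: str) -> int:
    # Scan the SUFFIX:COUNT lines locally; the suffix never leaves the client.
    _, suffix = split_sha1(secret)
    for line in body.splitlines():
        line_suffix, _, count = line.strip().partition(":")
        if line_suffix.upper() == suffix:
            return int(count or 0)
    return 0


print(range_url("password"))
```

Only the prefix is revealed, and each prefix is shared by many unrelated hashes, which is where the k-anonymity comes from.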

I think the concern may be that the total space of hashes for valid email addresses, especially ranked by domain usage, is small enough that SHA hashing doesn't meaningfully obscure content.

From the article:

Hash It!

The first instinct is often just to hash the contact information before sending it to the server. If the server has the SHA256 hash of every registered user, it can just check to see if those match any of the SHA256 hashes of contacts transmitted by a client.

Unfortunately, this doesn’t work because the “preimage space” (the set of all possible hash inputs) is small enough to easily calculate a map of all possible hash inputs to hash outputs. There are only roughly 10^10 phone numbers, and while the set of all possible email addresses is less finite, it’s still not terribly great. Inverting these hashes is basically a straightforward dictionary attack. It’s not possible to “salt” the hashes, either (they always have to match), which makes building rainbow tables possible.
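The article's point can be made concrete with a toy dictionary attack on unsalted hashes. This sketch assumes numbers made of a fixed prefix plus five free digits, so it only enumerates 10^5 candidates and runs quickly. The real space of roughly 10^10 phone numbers is the same loop run longer. `build_reverse_table` and the number format are hypothetical.

```python
import hashlib


def h(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def build_reverse_table(prefix: str, digits: int) -> dict[str, str]:
    # Enumerate every input with the given prefix and `digits` free digits,
    # mapping each hash output back to the input that produced it.
    table = {}
    for i in range(10 ** digits):
        number = f"{prefix}{i:0{digits}d}"
        table[h(number)] = number
    return table


table = build_reverse_table("+1555", 5)  # 100,000 candidates
leaked = h("+155500042")  # a hash the server received from a client
print(table[leaked])  # the hash is inverted by a simple lookup
```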


Hashing is done on the server. Hashing on the client would defeat the whole purpose.

That's why the client should perform the hash and only submit the result.

Interesting. How do you compare salted hashed email addresses?

What's to prevent the server from sending you a unique hash?

Oh you mean the client sends 3 hashes and backend validates if just one matches?

You’re ignoring that it’s not the full hash, just the prefix. A prefix could match millions of URLs, many of them unrelated pages. You would need a very, very extensive model of every single IP that was mapped to each person, and that would not be trivial. That’s exactly why the actual matching is done on the client and not somewhere else.

Such hashing does not seem straightforward to me. If you make it very exact, it's easy to get around: use another email address, use a prepaid phone card, insert a middle initial, or spell the street address slightly differently, and the hash will not match.

The fuzzier you make it ("normalizing" the data before calculating the hash), the more likely it is to cause false positives for legitimate users.
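A sketch of what "normalizing before hashing" might look like for email addresses. The rules here (trim, lowercase, drop a `+tag` from the local part) are hypothetical examples, not any particular service's policy; `normalize_email` and `email_hash` are illustrative names.

```python
import hashlib


def normalize_email(email: str) -> str:
    # Hypothetical rules: trim whitespace, lowercase, strip a "+tag" suffix.
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def email_hash(email: str) -> str:
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


print(email_hash(" Alice+news@Example.com") == email_hash("alice@example.com"))
```

Each rule widens what counts as "the same" address. Dropping `+tag` is only safe for providers that treat it as subaddressing; elsewhere two distinct addresses could collapse to one hash, which is the false-positive risk described above. A completely different address still produces a different hash.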

It's probably only a matter of time until we read articles on HN where such an algorithm got it wrong.


False positives would be sent to the device for local comparison against full hashes; it wouldn't "annoy people" because it would not trigger a match unless the full hash matched on device.

There is indeed a way to choose the truncation length so that it balances the amount of data sent to the device against privacy.
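The trade-off can be estimated with simple arithmetic. If hash outputs are uniformly distributed and the prefix reveals k bits, each prefix bucket holds about N / 2^k registered hashes. That is both the size of the anonymity set and the number of hashes the device must download. The user count of 10^9 and the 32-byte hash size below are assumptions for illustration.

```python
def expected_bucket_size(n_registered: int, prefix_bits: int) -> float:
    """Average number of registered hashes sharing one k-bit prefix."""
    # Assumes hash outputs are uniformly distributed over the 2**k prefixes.
    return n_registered / 2 ** prefix_bits


def expected_payload_bytes(n_registered: int, prefix_bits: int, hash_bytes: int = 32) -> float:
    """Approximate response size if the server returns full hashes for a bucket."""
    return expected_bucket_size(n_registered, prefix_bits) * hash_bytes


N = 10**9  # hypothetical number of registered users
for k in (16, 20, 24):
    size = expected_bucket_size(N, k)
    print(f"{k}-bit prefix: ~{size:,.0f} hashes per bucket, "
          f"~{expected_payload_bytes(N, k) / 1024:,.0f} KiB per response")
```

A shorter prefix hides each query among more candidates but costs more bandwidth; each extra bit halves both.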


They do hash IPs before sending them.

Would this not have been solved if the email addresses were stored as hashes? It would also add an extra layer of security in the event of a breach. Not storing addresses as hashes seems silly, especially when they are kept in proper databases that can be queried quickly.

No, the client doesn’t have access to the CSAM hashes. And matches are verified on the server, not on the client.

I forgot to mention, verifiable identifiers can optionally be salted before hashing but this introduces a dependency on the creator of the domain verification record to share the salt.
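A minimal sketch of a salted identifier hash, using HMAC-SHA256 with the salt as the key (a swapped-in construction, not necessarily what the commenter had in mind). `salted_identifier_hash`, `verify` and the sample salts are hypothetical. It shows the dependency mentioned above: anyone who wants to check the published value needs the salt from whoever created the record.

```python
import hashlib
import hmac


def salted_identifier_hash(identifier: str, salt: bytes) -> str:
    # Keyed hash: without `salt`, the value can't be recomputed or dictionary-attacked.
    return hmac.new(salt, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(identifier: str, salt: bytes, published: str) -> bool:
    # Constant-time comparison of the recomputed value with the published one.
    return hmac.compare_digest(salted_identifier_hash(identifier, salt), published)


record = salted_identifier_hash("example.com", b"shared-salt")  # published value
print(verify("example.com", b"shared-salt", record), verify("example.com", b"other", record))
```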

Of course, if those hashes are also served over plaintext, comparing them doesn't matter either, and relying on them for verification is akin to praying not to be compromised.

Yes, though if you do both client-side and server-side hashing (which might be a good idea [0]), then the server only receives the fixed-size client hash.

[0] https://security.stackexchange.com/a/100517
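A sketch of combined client-side and server-side hashing. It assumes SHA-256 for the client pre-hash and PBKDF2-HMAC-SHA256 with a random per-user salt on the server; these are illustrative choices, and the linked answer may recommend different primitives or parameters. `client_prehash`, `server_store`, `server_verify` and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the server-side KDF


def client_prehash(password: str) -> bytes:
    # Fixed-size 32-byte digest sent instead of the raw password.
    return hashlib.sha256(password.encode("utf-8")).digest()


def server_store(client_hash: bytes) -> tuple[bytes, bytes]:
    # Server salts and stretches whatever the client sent before storing it.
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac("sha256", client_hash, salt, ITERATIONS)
    return salt, stored


def server_verify(client_hash: bytes, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", client_hash, salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = server_store(client_prehash("hunter2"))
print(server_verify(client_prehash("hunter2"), salt, stored))
```

From the server's point of view the client hash is now the password, so it still needs the salted, slow server-side hash; the client step only ensures the server never handles the raw input.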

