I think the state of the art has progressed past that? A non-trivial reversal of a perceptual hash would mean that every cloud provider maintaining a CSAM scan list violates CSAM laws - if they could reverse the hash to get too close to the original image, the hash is just a lossy storage format.
They are perceptual hashes of CSAM images, no? Granted, just because they are perceptual hashes doesn't mean they can be reversed to an image, but it seems to be true that PhotoDNA, a similar perceptual hashing algorithm, can be reversed to an image. For that reason I believe it's possible Apple's perceptual hashes can be reversed as well, though maybe they did it a different way and it's not possible.
No, these hashes can’t be reversed to an image. They’re not CSAM and therefore not illegal. That blog is not very good, either from a tech standpoint or a legal one.
It's entirely possible to alter an image such that its raw form looks different from its scaled form [0]. A government or just a well-resourced group can take a legitimate CSAM image and modify it such that, when scaled for use in the perceptual algorithm(s), it changes into some politically sensitive image. Upon review it'll look like CSAM, so off it goes to the reporting agencies.
Because the perceptual hash algorithms are presented as black boxes, the image they actually perceive is never audited or reviewed. There's zero recognition of this weakness by Apple or NCMEC (and their equivalents). For the system to even begin to be trustworthy, all content would need to be reviewed both raw and as-scaled-and-fed-into-the-algorithm.
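The scaling trick in [0] can be sketched with a toy nearest-neighbor example (purely illustrative code, not any deployed pipeline): because downscaling samples only a fraction of the pixels, an attacker can overwrite exactly those pixels with a different image.

```python
# Toy nearest-neighbor scaling attack (illustrative only; real attacks
# target bilinear/bicubic scalers using the same principle).
import numpy as np

def nearest_downscale(img, factor):
    # Nearest-neighbor downscaling keeps only every `factor`-th pixel.
    return img[::factor, ::factor]

def embed_scaling_attack(decoy, target, factor):
    # Overwrite exactly the pixels the downscaler will sample.
    out = decoy.copy()
    out[::factor, ::factor] = target
    return out

decoy = np.full((64, 64), 200, dtype=np.uint8)  # stands in for the visible image
target = np.zeros((8, 8), dtype=np.uint8)       # stands in for the hidden image
attacked = embed_scaling_attack(decoy, target, 8)

# Only 1 in 64 pixels differs from the decoy, so the full-size image
# still looks like the decoy...
print((attacked != decoy).mean())  # 0.015625
# ...but the scaled version the hash algorithm sees is exactly the target.
print(np.array_equal(nearest_downscale(attacked, 8), target))  # True
```

The same idea carries over to bilinear/bicubic scaling, just with an optimization step instead of a direct pixel overwrite.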
The visual derivative of the image only gets sent if the image matches both hashes (including a multi-stage blind hash). It's almost certain that an image that makes it to that point would be CSAM.
I agree with your point as a whole, but Apple's CSAM system wouldn't really have this issue, since it compares image hashes against hashes of a specific list of known CSAM.
The cryptography is most likely done at a higher level than the perceptual comparison, and is quite likely there more to protect the CSAM hashes than your privacy.
My interpretation of this is that they still use some sort of perceptual matching algorithm; they just encrypt the hashes and then use some "zero knowledge proof"-style protocol when comparing the locally generated hashes against the list, the result of which would be just that X hashes matched, but not which X.
This way there would be no way to reverse engineer the CSAM hash list, or to bypass the process by altering key regions of the image.
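The blinded-comparison idea can be sketched with a bare-bones DH-style private set intersection (a toy, not Apple's actual PSI protocol; a real deployment would also hide *which* items matched and use a proper elliptic-curve group):

```python
# Toy commutative-blinding PSI sketch: each side exponentiates hashes
# with a secret key; because exponentiation commutes, doubly-blinded
# values can be compared without revealing either raw hash list.
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; fine for a toy demo, not for production

def h(x: bytes) -> int:
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % P

a = secrets.randbelow(P - 2) + 1  # client's secret exponent
b = secrets.randbelow(P - 2) + 1  # server's secret exponent

server_set = [b"csam-hash-1", b"csam-hash-2"]  # hypothetical hash list
client_item = b"csam-hash-2"                   # locally generated hash

# Each side blinds its own values with its own key...
client_blinded = pow(h(client_item), a, P)
server_blinded = [pow(h(s), b, P) for s in server_set]

# ...then applies its key to the other side's blinded values.
# (h^a)^b == (h^b)^a mod P, so matches line up.
double_client = pow(client_blinded, b, P)
double_server = [pow(s, a, P) for s in server_blinded]

print(double_client in double_server)  # True: one hash matched
```

Neither side ever sees the other's raw hashes, which is consistent with the guess above that the crypto mainly protects the hash list.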
The perceptual hashes are specifically designed to correlate images that are visually similar while not being exactly alike, even when the colors are altered. See:
> Given that changing one pixel of an image changes its hash
They don't use SHA/MD5 style hashes for CSAM image matching; they use "perceptual hashes"[1] which are resilient to changes (specifically, I believe, PhotoDNA[2] from Microsoft.)
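For intuition, here's a minimal average-hash ("aHash") sketch, one of the simplest perceptual hashes. It's far cruder than PhotoDNA or NeuralHash, but it shows why a one-pixel change barely moves the hash:

```python
# Minimal average-hash: hash the image's coarse structure so small
# edits barely change the bits (unlike SHA/MD5).
import numpy as np

def ahash(img, size=8):
    # Downscale to size x size by block-averaging
    # (assumes the dimensions divide evenly).
    h, w = img.shape
    small = img.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    # One bit per cell: is the cell brighter than the overall mean?
    return (small > small.mean()).flatten()

def hamming(a, b):
    return int((a != b).sum())

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
tweaked = img.copy()
tweaked[0, 0] += 10  # change "one pixel" slightly

# A tiny edit leaves the perceptual hash essentially unchanged.
print(hamming(ahash(img), ahash(tweaked)))  # typically 0
```

PhotoDNA and NeuralHash are much more sophisticated (gradients, neural features), but the design goal is the same: nearby images, nearby hashes.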
No, I'm sure people would still make a fuss. Perceptual hashes are required to prevent criminals from slightly changing pixels within CSAM photos to avoid detection.
You’re correct that it’s not a pixel-by-pixel hash, but it’s still a hash of that specific image. It’s not analyzing the image subject and trying to identify it as CSAM.
Does this algorithm work for the reverse goal (i.e. can content that would trip the CSAM hash be perturbed enough to avoid it without compromising quality of the underlying image)?
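Against a trivially weak perceptual hash the answer is yes by construction; here's a toy illustration using average-hash as a stand-in (whether this works against NeuralHash without visible quality loss is exactly the open question):

```python
# Toy evasion demo: add noise until a weak perceptual hash (average-hash,
# used here as a stand-in for a real CSAM hash) drifts past a
# hypothetical match threshold.
import numpy as np

def ahash(img, size=8):
    # Average-hash: block-average downscale, threshold against the mean.
    h, w = img.shape
    small = img.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64)).astype(float)
base = ahash(img)

# Increase the noise level until the hash differs by >= 10 of 64 bits
# (an assumed match threshold, for illustration).
sigma = 2.0
while True:
    noisy = np.clip(img + rng.normal(0, sigma, img.shape), 0, 255)
    dist = int((ahash(noisy) != base).sum())
    if dist >= 10:
        break
    sigma *= 2
print(sigma, dist)  # fairly modest noise evades this weak hash
```

Robust hashes like NeuralHash need targeted (adversarial-example style) perturbations rather than plain noise, but published collision/evasion work suggests the reverse goal is feasible there too.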
To my mind, that's far more disquieting than the risk of someone staging an elaborate attack on an enemy's device.
I worked with perceptual hashes. The false positive rate makes them unusable on their own. Even when combined with AI it doesn't work well: it can get you a list of possible matches, but only a human can really tell whether the image is the image. Then you have the problem of images not making it onto the list. That was in 2016; maybe things have changed.
The abandoned plan was perceptual hashing, which should return the same hash for very similar photos, while the new one is a checksum, which should return the same hash only for identical photos. I don’t think that invalidates the point, but it does seem relevant. It certainly makes it much less useful for CSAM scanning or enforcing local dictator whims, since it’s now trivial to defeat if you actually try to.
The context there was that a typical CSAM hash like PhotoDNA is NOT a cryptographic hash, but a perceptual one, which DOES have notable false positive issues.
You can’t do a byte-for-byte hash on images because a slight resize or minor edit will dramatically change the hash, without really modifying the image in a meaningful way.
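Concretely, the avalanche behavior of a cryptographic hash is what rules out byte-exact matching (SHA-256 shown here; nothing Apple-specific):

```python
# Flipping a single input bit completely changes a cryptographic hash
# (the avalanche effect), so any re-encode or resize breaks matching.
import hashlib

data = bytearray(b"some image bytes...")
h1 = hashlib.sha256(bytes(data)).hexdigest()
data[0] ^= 1  # flip one bit
h2 = hashlib.sha256(bytes(data)).hexdigest()

print(h1[:16])
print(h2[:16])
# Hex digits that happen to agree: roughly 1 in 16 by pure chance --
# nothing like a match.
print(sum(a == b for a, b in zip(h1, h2)))
```

That's the whole motivation for perceptual hashing: a slight resize should move the hash slightly, not scramble it.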
But image hashes are “perceptual” in the sense that the hash changes gradually with the image: visually similar images map to nearby hashes. This is how reverse image searching works, and why it works so well.
It didn’t seem to take long for the weights for Apple’s network to be discovered. And I suppose they must send the banned hashes to the client for checking too. So I expect that list will be discovered and published soon too (unless they have some way to keep them secret?) I think one important question is: how reversible is Apple’s perceptual hash?
For example, my understanding of Microsoft’s PhotoDNA is that their perceptual hash has been reverse-engineered and that one could go backwards from a hash to a blurry image. But also it is very hard to get the list of PhotoDNA hashes for the NCMEC database. In other words, are Apple unintentionally releasing enough information to reconstruct a bunch of blurry CSAM?