
I wonder how resistant those are to distortion. The examples on the Insight page are pretty impressive, but they cover fairly naive approaches. I feel like, if you knew the algorithm, you could probably manipulate your image data so it appears almost the same but comes up with a very different hash. Similar to adversarial images in computer vision.

https://spectrum.ieee.org/cars-that-think/transportation/sen...




Perceptual hashing is certainly resistant to some distortions, primarily affine transforms.

Unfortunately, there's no easy or sufficiently strong defense against someone gradient-descending your algorithm.
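
As a rough illustration, here's a minimal PyTorch sketch of that attack against a toy differentiable stand-in for a difference hash (a made-up surrogate, not any deployed algorithm): optimize a small perturbation so the hash bits flip while the image barely changes.

    import torch
    import torch.nn.functional as F

    def soft_dhash(img, hash_size=8):
        # Toy differentiable stand-in for a difference hash (dHash).
        # img: (1, 1, H, W) grayscale tensor in [0, 1].
        small = F.adaptive_avg_pool2d(img, (hash_size, hash_size + 1))
        diffs = small[..., 1:] - small[..., :-1]     # horizontal gradients
        return torch.tanh(50.0 * diffs).flatten()    # smooth sign(), in (-1, 1)

    img = torch.rand(1, 1, 64, 64)                   # stand-in input image
    target = -torch.sign(soft_dhash(img)).detach()   # try to flip every bit
    delta = torch.zeros_like(img, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)

    for _ in range(300):
        opt.zero_grad()
        h = soft_dhash((img + delta).clamp(0, 1))
        # push hash bits toward the flipped target while keeping delta small
        loss = -(h * target).mean() + 10.0 * delta.pow(2).mean()
        loss.backward()
        opt.step()

    h = torch.sign(soft_dhash((img + delta).clamp(0, 1)))
    print(f"bits flipped: {(h == target).float().mean().item():.0%}")
    print(f"max pixel change: {delta.abs().max().item():.3f}")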


Furthermore, they are perceptual hashes. It's not like you can defeat them just by changing a pixel in all your images.

It's entirely possible to alter an image such that its raw form looks different from its scaled form [0]. A government or just a well-resourced group can take a legitimate CSAM image and modify it such that, when scaled for use in the perceptual algorithm(s), it changes into some politically sensitive image. Upon review it'll look like CSAM, so off it goes to the reporting agencies.

Because the perceptual hash algorithms are presented as black boxes, the image they perceive isn't audited or reviewed. There's zero recognition of this weakness by Apple or NCMEC (and their equivalents). For the system to even begin to be trustworthy, all content would need to be reviewed both raw and as-scaled-and-fed-into-the-algorithm.

[0] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...
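
As a toy sketch of the scaling trick described in [0] (assumed setup with Pillow and NumPy; cover.png and payload.png are hypothetical files, the cover's dimensions are assumed to be multiples of SCALE, and the sample positions assume Pillow's NEAREST filter picks the pixel nearest each block center):

    import numpy as np
    from PIL import Image

    SCALE = 8  # full-size image is SCALE x the thumbnail in each dimension
    cover = np.asarray(Image.open("cover.png").convert("L"), dtype=np.uint8)
    h, w = cover.shape[0] // SCALE, cover.shape[1] // SCALE
    payload = np.asarray(Image.open("payload.png").convert("L").resize((w, h)))

    # Overwrite only the pixels the downscaler will sample (assumed to be the
    # ones nearest each block center), leaving the rest of the cover intact.
    attack = cover.copy()
    ys = np.arange(h) * SCALE + SCALE // 2
    xs = np.arange(w) * SCALE + SCALE // 2
    attack[np.ix_(ys, xs)] = payload

    full = Image.fromarray(attack)                       # looks like the cover
    thumb = full.resize((w, h), resample=Image.NEAREST)  # looks like the payload
    full.save("attack_full.png")
    thumb.save("attack_thumb.png")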


Correlations in the math domain get weird. It wouldn't surprise me in the slightest if those slight distortions you see in the "memorized" image versus the original, while small to humans, turned out to be staggeringly large to your perceptual hash. See things like the little stickers you can put on stop signs to make some neural nets decide they're people.

And it could go the other way too; it could be that the perceptual hashes are even "better" than humans at seeing past those distortions.

My point is that this all gets more complicated than you may think when you're trying to apply a hash designed for real images against the output of an algorithm.

And even if the hash worked perfectly, the false positive and false negative landscape would still be very likely to contain very surprising things.


It's very likely that the image can be reconstructed from perceptual hashes. Perceptual hashes make two promises:

* that the original image can't be inferred from the hash, and
* that similar images should get similar (if not the same) hashes

and these two promises are in serious conflict, given what's happened with gradient-based methods over the last 10 years.
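
For intuition, here's a tiny PyTorch sketch (a toy hash, not any production algorithm) of why smoothness invites inversion: optimize a blank canvas until its hash matches a target, and coarse structure of the original comes back.

    import torch
    import torch.nn.functional as F

    def toy_hash(img):
        # Encodes whether each 8x8 cell is brighter or darker than the
        # image mean; img: (1, 1, 64, 64) in [0, 1] -> 64 soft bits.
        small = F.adaptive_avg_pool2d(img, (8, 8))
        return torch.tanh(20.0 * (small - small.mean())).flatten()

    secret = torch.rand(1, 1, 64, 64)          # stand-in "original" image
    target = toy_hash(secret).detach()

    guess = torch.full((1, 1, 64, 64), 0.5, requires_grad=True)
    opt = torch.optim.Adam([guess], lr=5e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = (toy_hash(guess.clamp(0, 1)) - target).pow(2).mean()
        loss.backward()
        opt.step()
    # `guess` now shares the secret's coarse 8x8 brightness layout.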


How easy is it to generate an image that has the same "perceptual hash", or whatever they're calling it? My guess is it has to be easier than cracking a non-fuzzy hash. Do we know the algorithm they're using?

Why do you think that? There are plenty of whitepapers on fooling NNs by changing random pixels slightly, so that the picture is not meaningfully changed for a person but the computer labels it very differently. Note that these are not cryptographic hashes: they have to recognize the picture even when it's compressed differently, cropped a bit, etc.

I think you're overestimating the robustness of neural networks here, especially ones whose exact weights we know. It is fairly trivial to generate invisible noise that makes an input image get an entirely different hash.

A perceptual hash is immune to the sort of manipulation you mention. One method I've implemented myself is recording changes in brightness along a path through the image. Inverting colors would defeat it, but would also make the image somewhat worthless. Flipping would also defeat it, but is usually protected against by adding the hash of the flipped image as well.
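
Roughly, the idea looks like this sketch (Pillow/NumPy; a reconstruction of the concept using a row-major scan path, so the real implementation's path and encoding may differ):

    import numpy as np
    from PIL import Image

    def path_hash(path, size=9):
        # Downscale, grayscale, then record whether brightness rises (1) or
        # falls (0) at each step along a row-major scan path.
        img = Image.open(path).convert("L").resize((size, size))
        px = np.asarray(img, dtype=np.int16).flatten()
        bits = px[1:] > px[:-1]
        return int("".join("1" if b else "0" for b in bits), 2)

    def hamming(a, b):
        # Similar images should differ in only a few bits.
        return bin(a ^ b).count("1")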

This looks so cool. I didn't know anything about perceptual hashing, but the idea makes a lot of sense. I'm curious whether the system loses its effectiveness if a user shifts an image a few pixels, reflects it, or applies a filter that gives every pixel a slightly different tonality. I'm also thinking of some system, maybe a small ML program, that applies a minimal amount of noise to the image such that it looks the same to the human eye but is totally different pixel-wise.

Yes, that's what I was alluding to. It's a perceptual hash, so you could probably take a flagged image and superficially alter it (contrast/color/crop/whatever) until it wasn't anything visually objectionable on its own, but it would still match the perceptual hash.

It's not always true (hash functions are, in some sense, designed by brilliant people to be as hard to learn as possible), but it's true often enough to be surprising.

If there were some score that looked at lighting etc., you could probably just tack it onto one of the loss functions somewhere and expect to see improvement.

Maybe not enough to beat the system; or perhaps it would increase the lighting score but make the image obviously unrealistic in other ways we haven't thought of.


Perceptual hashes are reasonably robust to small perturbations. Scaling and converting shouldn’t be a problem, though applying filters that change the colors or blur the lines might.

If it’s been changed enough that it hashes to a different value, then it might be reasonable to treat it as a different image. At some point a human is also going to say “that’s not the same.” You can always change your hashing algorithm if you find it’s missing too many dupes.

Regardless, for the domain we’re talking about (deduping training data), a few false negatives should be acceptable.
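
For deduping, something like this sketch with the imagehash library would do (the 6-bit threshold is an arbitrary assumption to tune against your own data):

    from PIL import Image
    import imagehash  # pip install imagehash

    def dedupe(paths, max_distance=6):
        kept, seen = [], []
        for p in paths:
            h = imagehash.phash(Image.open(p))
            # `-` on ImageHash objects is the Hamming distance in bits
            if all(h - s > max_distance for s in seen):
                seen.append(h)
                kept.append(p)
        return kept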


It would be interesting to see a couple of those hashes and try to generate perfectly legitimate images with the same hashes.

Or even without calling it 'AI', the perceptual hashing you typically see in applications like CSAM detection is pretty damn close to ML techniques. The normal thought process is: "We don't want child predators to sneak through just by cropping, sticking a watermark on the image, or otherwise slightly modifying it, like adjusting the color balance. Can we come up with something that'll hash the same even with minor modifications to the image?" And you basically end up with intentionally overfit AI.

I expect that the engineers developing this would go for a hash based on an abstraction of the image represented, not the actual bits of the image file.

Although I'm sure it's possible to fool their image hashing algorithm, I doubt this will. Image hashing algorithms are designed to be resistant to small changes in the image, and more advanced ones can generate hashes that measure how similar one image is to another. I haven't tried this, but you can probably see a demonstration using Google image search: add some text to an image and see if Google image search can still find the original.

It's supposed to use a hash of the image, so they technically need the exact same image. Doesn't seem like they're using machine learning.

Perceptual hashes are not exact hashes; otherwise they would be useless for this task, since you could just mirror the image or change one pixel.

They are instead fuzzy classifiers, and thus have non-zero error rates.
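
A quick toy comparison (Pillow + imagehash; photo.jpg is a placeholder): a one-pixel edit changes a cryptographic hash completely but typically leaves a perceptual hash untouched.

    import hashlib
    from PIL import Image
    import imagehash  # pip install imagehash

    img = Image.open("photo.jpg").convert("RGB")
    tweaked = img.copy()
    tweaked.putpixel((0, 0), (0, 0, 0))  # change a single pixel

    for name, im in [("original", img), ("tweaked", tweaked)]:
        sha = hashlib.sha256(im.tobytes()).hexdigest()[:16]
        print(name, "sha256:", sha, "phash:", imagehash.phash(im))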

