
Ok yeah, I do agree this scaling attack potentially makes this feasible, if it essentially allows you to present a completely different image to the reviewer than to the user. Has anyone done this yet? I.e., an image that NeuralHashes to a target hash, also scale-attacks to a target image, and yet looks completely different.

(Perhaps I misunderstood your original post, but this seems to be a completely different scenario to the one you originally described with reference to the three thumbnails)




My aim was to point out that the above-referenced "image scaling attack" is easily protected against, because it is fragile to alternate scaling methods -- it breaks if you don't use the exact scaling algorithm the attacker planned for. Since defeating the image scaling attack is trivial, once it is addressed the thumbnail will always resemble the full image.

With that out of the way, that obviously just forecloses this one particular attack: the one where you want the thumbnail to appear dramatically different from the full image, in order to convince the user it's an innocent image and the reviewer that it's an illegal one. It is nevertheless still possible to have a confusing thumbnail -- perhaps an adult porn image engineered to have a CSAM hash collision would be enough to convince a beleaguered or overeager reviewer to pull the trigger. The image scaling attack is neither sufficient nor necessary.


The human reviewer would be able to check against the exact image that generated the hash in the first place. Taking another completely unrelated image and perturbing it would be immediately obvious.

> The only downside is that trivial image modification will make the hash not match, but that's also true for the perceptual hashes... just with a slightly more complicated definition of trivial

Well, the difference between the two versions of "trivial" is the difference between "incidental" and "purposeful".

Re-compressing, resizing or cropping the image changes the SHA256, but (with high probability) not the NeuralHash. Those operations happen all the time in the normal course of image distribution. Unlike "running through a tool designed to alter the NeuralHash value".
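
To make that concrete, here is a tiny sketch using the open-source `imagehash` library's pHash as a stand-in for NeuralHash (it is a different perceptual hash, so this is purely illustrative, and `photo.jpg` is a placeholder):

  # pip install pillow imagehash
  import hashlib, io
  from PIL import Image
  import imagehash

  original = Image.open("photo.jpg").convert("RGB")

  # Re-compress the photo at a lower JPEG quality
  buf = io.BytesIO()
  original.save(buf, format="JPEG", quality=60)
  recompressed = Image.open(io.BytesIO(buf.getvalue()))

  # Cryptographic hash: changes completely after re-compression
  sha_a = hashlib.sha256(open("photo.jpg", "rb").read()).hexdigest()
  sha_b = hashlib.sha256(buf.getvalue()).hexdigest()
  print("SHA-256 equal:", sha_a == sha_b)   # almost certainly False

  # Perceptual hash: typically unchanged (Hamming distance 0 or close to it)
  print("pHash distance:", imagehash.phash(original) - imagehash.phash(recompressed))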

In other words, your suggestion trades false positive rate for false negative rate, under the argument that any nonzero false negative rate is effectively the same. This argument is not convincing, particularly when your suggestion would reduce the true positive rate from (a hypothetical) 99% to a much lower value.


It's entirely possible to alter an image such that its raw form looks different from its scaled form [0]. A government, or just a well-resourced group, can take a genuine CSAM image and modify it such that, when scaled for use in the perceptual algorithm(s), it changes into some politically sensitive image. Upon review it'll look like CSAM, so off it goes to the reporting agencies.
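
For intuition, here is a minimal sketch of the trick under the assumption that the victim pipeline downscales with plain nearest-neighbour sampling (real attacks like [0] also target bilinear and bicubic kernels); the filenames are placeholders:

  # pip install pillow numpy
  import numpy as np
  from PIL import Image

  cover  = np.array(Image.open("innocuous_large.png").convert("RGB"))  # what humans see at full size
  target = np.array(Image.open("payload_small.png").convert("RGB"))    # what the downscaler should "see"

  out_h, out_w = target.shape[:2]
  h, w = cover.shape[:2]

  # Nearest-neighbour downscaling only ever samples these source positions
  rows = (np.arange(out_h) * h) // out_h
  cols = (np.arange(out_w) * w) // out_w

  # Overwrite just the sampled pixels; the vast majority of the cover image
  # is untouched, so at full size it still looks like the cover.
  attacked = cover.copy()
  attacked[np.ix_(rows, cols)] = target

  # The same nearest-neighbour downscale now reproduces the payload exactly
  downscaled = attacked[rows][:, cols]
  assert np.array_equal(downscaled, target)
  Image.fromarray(attacked).save("attacked.png")

This is also why the defence mentioned upthread works: a different resampler (or even a slightly different output size) samples different positions and the embedded payload falls apart.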

Because the perceptual hash algorithms are presented as black boxes, the image they perceive isn't audited or reviewed. There is zero recognition of this weakness by Apple or NCMEC (and their equivalents). For the system to even begin to be trustworthy, all content would need to be reviewed both raw and as-scaled-when-fed-into-the-algorithm.

[0] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...


Yes, I can. This is just one possible strategy; there are many others, where different things are done or things are done in a different order.

You use the collider [1] and one of the many scaling attacks ([2] [3] [4], just the ones linked in this thread) to create an image that matches the hash of a reasonably fresh CSAM image currently circulating on the Internet, and resizes to some legal sexual or violent image. Note that knowing such a hash and having such an image are both perfectly legal. Moreover, since the resizing (the creation of the visual derivative) is done on the client, you can tailor your scaling attack to the specific resampling algorithm.

Eventually, someone will make a CyberTipline report about the actual CSAM image whose hash you used, and the image (being a genuine CSAM image) will make its way into the NCMEC hash database. You will even be able to tell precisely when this happens, since you have the client-side half of the PSI database, and you can execute the NeuralHash algorithm.

You can start circulating the meme before or after this step. Repeat until you have circulated enough photos to make sure that many people in the targeted group have exceeded the threshold.

Note that the memes will trigger automated CSAM matches, and pass the Apple employee's visual inspection: due to the safety voucher system, Apple will not inspect the full-size images at all, and they will have no way of telling that the NeuralHash is a false positive.
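
For a rough feel of the optimisation behind the first step, here is a toy sketch in PyTorch. A fixed random projection stands in for the real (ONNX-exported) NeuralHash network from [1], and every size, target and weight below is made up, so this shows the shape of the attack rather than a working exploit:

  import torch
  import torch.nn.functional as F

  torch.manual_seed(0)
  HASH_BITS = 96

  # Toy differentiable "perceptual hash": pooling plus a fixed random projection.
  proj = torch.randn(HASH_BITS, 3 * 16 * 16)

  def toy_hash_logits(img):                  # img: (1, 3, 256, 256) in [0, 1]
      pooled = F.adaptive_avg_pool2d(img, 16).flatten(1)
      return pooled @ proj.T                 # sign() of this is the 96-bit hash

  target_bits = torch.randint(0, 2, (1, HASH_BITS)).float() * 2 - 1  # hash to collide with
  decoy = torch.rand(1, 3, 64, 64)           # what the visual derivative should resemble

  img = torch.rand(1, 3, 256, 256, requires_grad=True)
  opt = torch.optim.Adam([img], lr=1e-2)

  for step in range(2000):
      opt.zero_grad()
      # (a) push every hash bit toward the target sign (hinge-style loss)
      hash_loss = F.relu(0.5 - toy_hash_logits(img) * target_bits).mean()
      # (b) make the downscaled derivative resemble the decoy image
      deriv = F.interpolate(img, size=(64, 64), mode="bilinear", align_corners=False)
      scale_loss = F.mse_loss(deriv, decoy)
      # A real attack adds a third term keeping the full-size image close to
      # the meme you actually want people to see.
      (hash_loss + 10 * scale_loss).backward()
      opt.step()
      img.data.clamp_(0, 1)

  match = (toy_hash_logits(img).sign() == target_bits).float().mean()
  print(f"hash bits matching target: {match.item():.2%}")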

[1] https://github.com/anishathalye/neural-hash-collider

[2] https://embracethered.com/blog/posts/2020/husky-ai-image-res...

[3] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...

[4] https://graphicdesign.stackexchange.com/questions/106260/ima...


> I don't think you could just have a completely different picture create a collision though.

Allow me to introduce you to my posts on github: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...

Where I post good-looking examples of standard test images altered fairly subtly to produce specific chosen hashes.

Apple's NeuralHash is broken as a 'hash function'.

It's much, much easier to modify images to merely have a different hash. A simple blemish on the image suffices -- or, with whitebox access to the hash function, no visually noticeable change is required at all.


Not resembles. The adversarial image has to match a private perceptual hash function of the same CSAM image that the NeuralHash function matched before a human reviewer ever looks at it.

This attack doesn’t work. If the resized image doesn’t match the CSAM image your NeuralHash mimicked, then when Apple runs its private perceptual hash, the hash value won’t match the expected value and it will be ignored without any human ever looking at it.

This is pretty much the opposite of what the article is talking about -- the article is trying to get a hash from an image in order to compare that image to another, while you're talking about synthesizing an image from an arbitrary hash...

The article and kristjansson's post are about intentional false negatives.

My comment was pointing out that the well-known fact that NeuralHash is highly vulnerable to preimage attacks (intentional false positives) also means that it is highly vulnerable to intentional false negatives. Changing the 96-bit hash to a specific value while minimizing the visible change to the image is a LOT harder than just changing any single bit, but the harder task has been amply demonstrated.

NeuralHash is one component, yes, but it is a limiting component. If a slightly tweaked image has a different NeuralHash, it will never be detected by their scheme.


> Yes, although I'm sure a sufficiently motivated attacker can obtain some CSAM that they are reasonably sure is present in the database, and generate the NeuralHash themselves

Remind us what the attack is here? The neural hash and the visual derivative both have to match for an image to trigger detection.


Ah, OK, I think I misunderstood the article. If you are supplying both images to me, you could do that with the MD5 hashes. Although, I think if you could get them to generate the same bitmap, then the attack has been at least partially mitigated, by definition. Not completely, I admit, but I think it wouldn't qualify as the same attack shown in the article.

It's a cool idea, but isn't this pretty easily defeated by an adversarial attack, or just altering the image in any simple way?

E.g., as the perpetrator: open the image of your ex in MS Paint, add a stripe across the bottom in a way that doesn't obscure the primary content at all, ta-dah, upload -- the image will not have a matching hash.


You are glossing over how an adversary can generate an image that meets the following requirements:

  a) hashes to the same value as known csam image A with the public NeuralHash algorithm 

  b) has a derivative (e.g. lower res thumbnail) that when processed with a _private_ perceptual hash algorithm also matches known csam image A.
What is your proposal for solving b? For a, it’s possible to iteratively generate NeuralHashes that get closer and closer to the value you are attempting to match, while that isn’t possible for step b.
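
To see why (a) is attackable while (b) is not: you can only iterate toward a hash you can actually compute. Here is a toy black-box hill-climb using `imagehash`'s pHash as a stand-in for the public NeuralHash (real attacks use gradients against the exported network and are far more effective; the filenames are placeholders). Against the private hash there is no distance to measure, so there is nothing to iterate on.

  # pip install pillow numpy imagehash
  import numpy as np
  from PIL import Image
  import imagehash

  target_hash = imagehash.phash(Image.open("known_image.png"))    # hash value you want to hit
  img = np.array(Image.open("meme.png").convert("L"), dtype=np.uint8)

  def distance(arr):
      return target_hash - imagehash.phash(Image.fromarray(arr))  # Hamming distance

  rng = np.random.default_rng(0)
  best = distance(img)

  for _ in range(20000):
      y = int(rng.integers(0, img.shape[0]))
      x = int(rng.integers(0, img.shape[1]))
      old = img[y, x]
      img[y, x] = np.clip(int(old) + int(rng.integers(-16, 17)), 0, 255)
      d = distance(img)
      if d <= best:
          best = d           # keep changes that move the hash toward the target
      else:
          img[y, x] = old    # revert

  print("remaining Hamming distance to target:", best)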

> And now they will evolve by developing a simple system to modify pixels in images when they copy and transmit that will easily defeat this hashing system

They don't use simple file hashes to match images, but perceptual hashes. That way they can find modified derivatives of a source image. The problem with this approach, though, is that this is ripe for false positives. Two completely unrelated images can have similar hashes.


Yes, I was hoping they would not simply `sha1()` the image data. I wouldn't worry much about trying to fake hashes, since we are already allowing them to provide whatever image they want as input (hence the review).

Furthermore, they are perceptual hashes. It's not like you can just defeat it by changing a pixel in all your images.

> I would add that people have generated legal images that match the hashes.

That seems like a realistic attack. Since the hash list is public (it has to be for client-side scanning), you could likely set your computer to grind out a meme image whose hash matches one on the list, and then distribute it.


It can’t be. There’s a different private hash function that also has to match that particular CSAM image’s hash value before a human sees it. An adversarial attack can’t produce that, since the expected value isn’t known.