
It's entirely possible to alter an image such that its raw form looks different from its scaled form [0]. A government or other well-resourced group could take a genuine CSAM image and modify it so that, when scaled for use in the perceptual algorithm(s), it turns into some politically sensitive image. Upon review it'll still look like CSAM, so off it goes to the reporting agencies.

Because the perceptual hash algorithms are presented as black boxes, the image they actually perceive isn't audited or reviewed. There's zero recognition of this weakness by Apple or NCMEC (and their equivalents). For the system to even begin to be trustworthy, content would need to be reviewed both raw and scaled exactly as it is fed into the algorithm.

[0] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...
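As a rough illustration of the scaling attack described in [0], here is a minimal sketch, assuming numpy is available and using random arrays as stand-ins for the real images. Nearest-neighbour downscaling only reads a sparse grid of source pixels, so overwriting just those pixels with a second image changes what the downscaled copy shows while leaving the full-size image almost untouched. A real attack has to target the exact resampler the victim pipeline uses; this toy defines its own so the claim can be checked end to end.

  import numpy as np

  def sample_grid(src_len, dst_len):
      # source indices a simple nearest-neighbour scaler would read
      return ((np.arange(dst_len) + 0.5) * src_len / dst_len).astype(int)

  def downscale_nearest(img, dst_h, dst_w):
      ys, xs = sample_grid(img.shape[0], dst_h), sample_grid(img.shape[1], dst_w)
      return img[np.ix_(ys, xs)]

  rng = np.random.default_rng(0)
  cover = rng.integers(0, 256, (1024, 1024, 3), dtype=np.uint8)   # stands in for the innocuous full-size photo
  payload = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)     # stands in for what the scaler should "see"

  attack = cover.copy()
  ys, xs = sample_grid(1024, 64), sample_grid(1024, 64)
  attack[np.ix_(ys, xs)] = payload   # touch only 64*64 of the 1024*1024 pixels (~0.4%)

  # the downscaled attack image is exactly the payload, yet >99% of the
  # full-resolution pixels are still the cover image
  assert np.array_equal(downscale_nearest(attack, 64, 64), payload)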




Please quote what you think supports your claim.

"Indeed, Neural-Hash knows nothing at all about CSAM images. It is an algorithm designed to answer whether one image is really the same image as another, even if some image-altering transformations have been applied (like transcoding, resizing, and cropping)."[1]

[1] https://www.apple.com/child-safety/pdf/Security_Threat_Model...


Or even without calling it 'AI', the perceptual hashing you typically see in applications like CSAM detection is pretty damn close to ML techniques. The normal thought process is "we don't want child predators to sneak through just by cropping, sticking a watermark on the image, or otherwise slightly modifying it, like tweaking the color balance. Can we come up with something that hashes the same even with minor modifications to the image?". And you basically end up with intentionally overfit AI.
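To make that concrete, here's a minimal hand-rolled perceptual hash (a difference hash, or "dHash"). It's a toy heuristic, not NeuralHash, and the filenames are hypothetical; it just shows why small edits tend to leave most hash bits unchanged, since the hash only depends on coarse brightness gradients.

  from PIL import Image

  def dhash(path, size=8):
      # shrink to (size+1) x size greyscale, then compare adjacent pixels
      img = Image.open(path).convert("L").resize((size + 1, size), Image.LANCZOS)
      px = list(img.getdata())
      bits = 0
      for row in range(size):
          for col in range(size):
              left = px[row * (size + 1) + col]
              right = px[row * (size + 1) + col + 1]
              bits = (bits << 1) | (left > right)
      return bits   # 64-bit integer

  def hamming(a, b):
      return bin(a ^ b).count("1")

  # hamming(dhash("original.jpg"), dhash("watermarked.jpg")) is typically a
  # handful of bits out of 64, while unrelated images differ by roughly 32.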

This attack doesn’t work. If the resized image doesn’t match the CSAM image your NeuralHash mimicked, then when Apple runs its private perceptual hash, the hash value won’t match the expected value and the image will be ignored without any human looking at it.

The adversarial images have to match both the NeuralHash output of CSAM, plus another private perceptual hash that points to the same image that only Apple has access to, plus a human reviewer needs to agree it is CSAM, and this has to happen for 30 images.

Cross-posting from another thread [1]:

1. Obtain known CSAM that is likely in the database and generate its NeuralHash.

2. Use an image-scaling attack [2] together with adversarial collisions to generate a perturbed image such that its NeuralHash is in the database and its image derivative looks like CSAM.

A difference compared to server-side CSAM detection could be that they verify the entire image, and not just the image derivative, before notifying the authorities.

[1] https://news.ycombinator.com/item?id=28218922

[2] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...


It can’t be. There’s a different private hash function that also has to match that particular CSAM image’s hash value before a human sees anything. An adversarial attack can’t produce that one, since the expected value isn’t known.

Ok yeah, I do agree this scaling attack potentially makes this feasible, if it essentially allows you to present a completely different image to the reviewer than to the user. Has anyone actually done this yet, i.e. produced an image that NeuralHashes to a target hash and also scale-attacks to a target image, while looking completely different from both?

(Perhaps I misunderstood your original post, but this seems to be a completely different scenario to the one you originally described with reference to the three thumbnails)


The cryptography is most likely done at a higher level than the perceptual comparison, and is quite likely there to protect the CSAM hashes rather than your privacy.

My interpretation is that they still use some sort of perception-based matching algorithm; they just encrypt the hashes and then use some “zero-knowledge proof” when comparing the locally generated hashes against the list, the result of which would be just that X hashes matched, but not which X.

This way there would be no way to reverse-engineer the CSAM hash list or to bypass the process by altering key regions of the image.
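For intuition only, here is a toy Diffie-Hellman-style blinded membership test. It is not Apple's protocol (their private set intersection construction is considerably more involved, hides which hashes matched, and doesn't publish the blinded list), and the parameters and placeholder hash values below are illustrative and far too small to be secure, but it shows the basic blinding trick that lets two parties compare hashes without revealing them.

  import hashlib, secrets
  from math import gcd

  P = (1 << 127) - 1            # small Mersenne prime, demo only; not a secure size

  def h(x: bytes) -> int:
      return int.from_bytes(hashlib.sha256(x).digest(), "big") % P

  def rand_exponent() -> int:
      while True:
          e = secrets.randbelow(P - 1)
          if gcd(e, P - 1) == 1:
              return e

  # Server blinds every hash in its (secret) list with exponent s.
  # (A real protocol would not hand the whole blinded list to the client.)
  s = rand_exponent()
  server_db = {b"hash-of-known-image-1", b"hash-of-known-image-2"}   # placeholders
  blinded_db = {pow(h(x), s, P) for x in server_db}

  # Client blinds its own hash with c, has the server apply s, then unblinds
  # and checks membership -- without revealing its hash in the clear.
  c = rand_exponent()
  my_hash = b"hash-of-known-image-2"
  to_server = pow(h(my_hash), c, P)
  from_server = pow(to_server, s, P)                 # server sees only a blinded value
  unblinded = pow(from_server, pow(c, -1, P - 1), P) # equals h(my_hash)**s mod P
  print(unblinded in blinded_db)                     # True -> a match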


Apple's scheme includes operators manually verifying a low-res version of each image matching CSAM databases before any intervention. Of course, grey noise will never pass for CSAM and will fail that step.

The fact that you can randomly manipulate random noise until it matches the hash of an arbitrary image is not surprising. The real challenge is generating a real image that could be mistaken for CSAM at low res + is actually benign (or else just send CSAM directly) + matches the hash of real CSAM.

This is why SHAttered [1] was such a big deal, but daily random SHA collisions aren't.

[1] https://shattered.io/
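To see why noise collisions against a perceptual hash are unremarkable, here is a toy hill-climb that perturbs random noise until its dHash (a simple hand-rolled hash, not NeuralHash) matches that of a hypothetical target.png. It assumes numpy and Pillow, and caps the search so it always terminates.

  import numpy as np
  from PIL import Image

  def dhash_bits(arr, size=8):
      img = Image.fromarray(arr).convert("L").resize((size + 1, size), Image.LANCZOS)
      px = np.asarray(img, dtype=int)
      return (px[:, :-1] > px[:, 1:]).ravel()          # 64 boolean bits

  rng = np.random.default_rng(0)
  target = dhash_bits(np.asarray(Image.open("target.png").convert("RGB")))  # hypothetical file
  noise = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
  dist = int(np.count_nonzero(dhash_bits(noise) != target))

  for _ in range(200_000):                             # cap the search; it usually converges well before this
      if dist == 0:
          break
      cand = noise.copy()
      y, x = rng.integers(0, 64, size=2)
      cand[y, x] = rng.integers(0, 256, size=3)        # tweak one random pixel
      d = int(np.count_nonzero(dhash_bits(cand) != target))
      if d <= dist:                                    # accept sideways moves to escape plateaus
          noise, dist = cand, d

  print("bits still differing:", dist)                 # 0 means the noise now collides with the target's dHash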


Yes, I can. This is just one possible strategy: there are many others that do different things, or do them in a different order.

You use the collider [1] and one of the many scaling attacks ([2] [3] [4], just the ones linked in this thread) to create an image that matches the hash of a reasonably fresh CSAM image currently circulating on the Internet, and resizes to some legal sexual or violent image. Note that knowing such a hash and having such an image are both perfectly legal. Moreover, since the resizing (the creation of the visual derivative) is done on the client, you can tailor your scaling attack to the specific resampling algorithm.

Eventually, someone will make a CyberTipline report about the actual CSAM image whose hash you used, and the image (being a genuine CSAM image) will make its way into the NCMEC hash database. You will even be able to tell precisely when this happens, since you have the client-side half of the PSI database and you can execute the NeuralHash algorithm.

You can start circulating the meme before or after this step. Repeat until you have circulated enough photos to make sure that many people in the targeted group have exceeded the threshold.

Note that the memes will trigger automated CSAM matches, and pass the Apple employee's visual inspection: due to the safety voucher system, Apple will not inspect the full-size images at all, and they will have no way of telling that the NeuralHash is a false positive.

[1] https://github.com/anishathalye/neural-hash-collider

[2] https://embracethered.com/blog/posts/2020/husky-ai-image-res...

[3] https://bdtechtalks.com/2020/08/03/machine-learning-adversar...

[4] https://graphicdesign.stackexchange.com/questions/106260/ima...
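A hedged sketch of the local sanity check implied by the strategy above: before circulating such a meme, confirm the crafted file still produces the target hash and that its downscaled "visual derivative" shows the decoy. The neuralhash argument stands in for running the extracted on-device model (as the collider in [1] does), and the derivative size and resampler here are assumptions, since Apple does not document them.

  from PIL import Image

  def check_meme(path, neuralhash, target_hash, derivative_size=(360, 360)):
      img = Image.open(path).convert("RGB")
      hash_ok = neuralhash(img) == target_hash          # what the on-device scan would compare
      img.resize(derivative_size, Image.BILINEAR).save("derivative_preview.png")
      return hash_ok                                    # then inspect the preview by eye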


> the perceptual hash used is secret

Yes, although I'm sure a sufficiently motivated attacker can obtain some CSAM that they are reasonably sure is present in the database, and generate the NeuralHash themselves.

> At that point, you are reduced to the threat present in every other existing CSAM detection system.

A difference could be that server-side CSAM detection will verify the entire image, and not just the image derivative, before notifying the authorities.


I think the state of the art has progressed past that? A non-trivial reversal of a perceptual hash would mean that every cloud provider maintaining a CSAM scan list violates CSAM laws - if they could reverse the hash to get too close to the original image, the hash is just a lossy storage format.

No, I'm sure people would still make a fuss. Perceptual hashes are required to prevent criminals from slightly changing pixels within CSAM photos to avoid detection.

Also, can we create images that are not CSAM but whose hashes match known CSAM?

https://www.theverge.com/2017/4/12/15271874/ai-adversarial-i...

If we can, then hypothetically we just need to get such non-CSAM images onto important people's iPhones so they get arrested, jailed for years, and have their lives ruined by Apple.

Disclaimer: I buy Apple products.


CSAM in the database is confidential, so the human reviewer just needs to be plausibly confident that they have CSAM in front of them. However, it’s not clear to me that you can pull off all three simultaneously. Furthermore, the attack doesn’t specify how they would adversarially generate an image derivative for a secret perceptual hash that they can’t run gradient descent on.

If someone wanted to plant CSAM and had control of an iCloud account, it seems far easier to send some emails with those images since iCloud Mail is actively scanned and nobody checks their iCloud Mail account, especially not the sent folder.


You are glossing over how an adversary can generate an image that meets the following requirements:

  a) hashes to the same value as known CSAM image A with the public NeuralHash algorithm, and

  b) has a derivative (e.g. a lower-res thumbnail) that, when processed with a _private_ perceptual hash algorithm, also matches known CSAM image A.

What is your proposal for solving (b)? For (a), it's possible to iteratively generate NeuralHashes that get closer and closer to the value you are trying to hit; that isn't possible for step (b).
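For reference, the iterative search for (a) looks roughly like the sketch below, in the spirit of the neural-hash-collider repo linked elsewhere in the thread. It assumes you have extracted the on-device model into a differentiable form; `model`, `start_image` and `target_bits` are placeholders, not Apple's API, and this says nothing about the private server-side hash in (b).

  import torch

  def find_collision(model, start_image, target_bits, steps=2000, lr=0.01):
      # model(x) is assumed to return the hash logits before binarisation;
      # target_bits is a tensor of 0/1 values for the hash we want to hit
      x = start_image.clone().requires_grad_(True)
      opt = torch.optim.Adam([x], lr=lr)
      signs = target_bits * 2.0 - 1.0               # push each logit toward the wanted sign
      for _ in range(steps):
          opt.zero_grad()
          loss = torch.relu(0.1 - signs * model(x)).sum()   # hinge: stop pushing once a bit is safely set
          loss.backward()
          opt.step()
          with torch.no_grad():
              x.clamp_(0.0, 1.0)                    # keep a valid image
          if loss.item() == 0.0:
              break
      return x.detach()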

Reminder that these are perceptual hashes, not cryptographic hashes, so it's enough that images have sufficient visual resemblance according to the model. Natural collisions have been observed, and generating collisions for planting on someone else's device is trivial.

Apple also only receives a DB of hashes and so has no way to verify that they're only scanning for CSAM and not other "undesirable" content.

https://github.com/roboflow-ai/neuralhash-collisions

https://github.com/anishathalye/neural-hash-collider


This reads like a failure of the NCMEC, and the legal system surrounding it.

It is insane that using perceptual hashes is likely illegal: the hashes are actually somewhat reversible, so possession of the hash is itself a criminal offence. It just shows how twisted up in itself the law is in this area.

One independent image analysis service should not be beating reporting rates of major service providers. And NCMEC should not be acting like detection is a trade secret. Wider detection and reporting is the goal.

And the law as set up prevents developing detection methods. You cannot legally check the results of your detection (which is what Apple is doing), as that involves transmitting the content to someone other than NCMEC!


This doesn’t work, for two reasons: 1) There’s no way to know the value of Apple’s private perceptual hash that is run server-side on the derivative of the image to verify a hit really is CSAM. So while you could cause a collision with the on-device NeuralHash if you possessed illegal content, you wouldn’t know whether you had successfully faked Apple’s private hash as well. 2) An Apple reviewer must verify the image is illegal before it’s passed along to law enforcement.
