
While they did use their own simple DNN classifier (because they couldn't find one to use), it was treated as a black box for purposes of the attack. A more robust classifier (probably one combining multiple methods) would be more resistant, but the attack would still work; the required changes would just be more obvious. At some point you'd hit "human-recognizable," but it's unclear where that point is.



The fact that a human isn't fooled by the attack (we can still recognize the 32x32 images for what they are) points to an interesting gap in the abilities of conventional convolutional neural nets.

Usually the attack requires the source code (or the weights of the neural network); I'd be surprised if they were able to actually attack these systems.

The difference is that you can calculate an adversarial example for our classifiers, but it's too slow to calculate on a human.

Even if you could, the result would be specific to that particular person, so it won't work as well on others. And these bastards learn while you're constructing the example (which isn't fair at all to a helpless classifier that's just sitting there and doesn't change).


When we do that (and we usually do it for accuracy improvements, not resistance to attack; the two happen at the same time for the same reasons, since adversarial examples are a misclassification problem), we change the exact attack that breaks the system, but there is no 100% accurate system. And since the machine's failure modes are completely foreign to us, the examples it misclassifies would likely not confuse a human.

The issue is that classification currently relies on a very small embedding of the data, which is pattern-matched with no semantics. It has no way of telling that the difference between a dog and an elephant ISN'T that noise gradient, at least some of the time!


Very interesting. First of all, here's a YouTube video associated with the paper -- https://www.youtube.com/watch?v=M2IebCN9Ht4 . Second, some here have posted about the Szegedy, Goodfellow, and Shlens paper http://arxiv.org/abs/1412.6572 which discusses the opposite effect. The Szegedy research is mentioned in the Nguyen paper and shows that, given an image that is correctly classified by a DNN, you can alter that image in a way imperceptible to a human to create a new image that will be INCORRECTLY classified. The Nguyen, Yosinski, et al. work that's the subject of this post shows that, given a DNN that correctly classifies images of a particular class, you can construct a gibberish image that the DNN will confidently assign to that same class.

Both results are interesting from the point of view of DNN construction, and there have been some papers suggesting ways to counter the effects described in the Szegedy research. In practice (as others have mentioned), in order to construct an exploit similar to the one described in this post, you'd need a lot of knowledge about the DNN (e.g. its weights) that an external attacker wouldn't have.

What this does leave open, though, is a disturbing way for someone with internal access to a DNN doing important work (e.g. object recognition in a self-driving car) to cause significant damage.
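
For reference, a minimal sketch of the fast-gradient-sign idea from the Goodfellow/Shlens/Szegedy paper mentioned above (PyTorch; the model, labels, epsilon, and [0, 1] pixel range here are placeholder assumptions, not anything from either paper's code):

    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, y, eps=0.01):
        # One gradient step in input space, in the direction that increases the
        # loss; eps bounds the per-pixel change so the perturbation stays
        # imperceptible to a human.
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()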


Can't you just "machine learn" the attack?

Anyone else find it weird that they used an ML classifier to mask people's identities? It doesn't do a good job of consistently masking any of the bystanders, whose identities are probably more important to protect than the researchers'.

I totally thought this was going to be about something completely different. But I don't expect these kinds of methods to work in the long run; honestly, maybe someone can convince me otherwise. These kinds of attacks are always going to be extremely difficult: your attack (the sweatshirt) changes as you move and walk through different lighting, etc. It just doesn't seem like a good direction for real protection. Plus, a model can always be tuned to correct for the attack. It seems like the research is really about robustness, with a flashy presentation on top. I can get that tbh, but these always seem to be presented as ways people can attack these systems in the real world (I don't buy this).

So privacy ML people, can you explain to a CV researcher why this is a useful direction?


> On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation.

Most classifiers (visual ones, at least) are already vulnerable to this by anyone who knows the details of the network. Is there something extra going on here?
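
The classic way to plant such a backdoor is plain data poisoning rather than anything exotic; a rough sketch below (the trigger patch, poison fraction, and target class are arbitrary illustrative choices, and the paper quoted above describes a more subtle, cryptographically hidden construction):

    import torch

    def poison(images, labels, target_class=0, frac=0.05):
        # Stamp a small trigger patch on a fraction of the training images and
        # relabel them to the target class. A model trained on this data behaves
        # normally on clean inputs, but any input carrying the patch is steered
        # toward target_class.
        n = int(frac * images.size(0))
        images, labels = images.clone(), labels.clone()
        images[:n, :, -4:, -4:] = 1.0   # 4x4 white patch in one corner
        labels[:n] = target_class
        return images, labels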


Has anyone tried the same adversarial examples against many different DNNs? I would think these are fairly brittle attacks in reality and only effective with some amount of inside knowledge.

Yes, I'm aware of differential attacks on neural networks. That doesn't falsify the hypothesis. There are instances where you will fail to recognize a human and those NNs will not.

Author here. Some of them are black-box attacks (like the one where they get the training data out of the model), and it was done on Amazon's cloud classifier, which big companies regularly use. So I wouldn't say that these attacks are entirely impractical and purely a research endeavour.

What I hope they do is use many instances of the system trained with slightly different sets of training data; otherwise, you just need to learn how to trick one version (and it will be possible) and you can trick them all.

When attacking human-managed systems, they may be slow to respond, weak, and unable to detect everything. But they are also weak in very different ways and will respond in different ways.
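
A toy sketch of that kind of ensemble (scikit-learn on synthetic data; the hyperparameters are arbitrary). Note that the transfer results discussed elsewhere in the thread suggest adversarial examples often carry over between independently trained members, so this helps less than one might hope:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Train several copies on different bootstrap samples of the data.
    rng = np.random.default_rng(0)
    models = []
    for seed in range(5):
        idx = rng.choice(len(X), size=len(X), replace=True)
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
        models.append(clf.fit(X[idx], y[idx]))

    # Majority vote: an input crafted against one member must also fool
    # most of the others to flip the ensemble's answer.
    votes = np.stack([m.predict(X) for m in models])
    prediction = (votes.mean(axis=0) > 0.5).astype(int)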


Slick! 84% success rate in the real world, and a simple, clever technique. Basically, use another DNN to reverse-engineer the target, find weaknesses using the substitute, and then use those examples as attack vectors. Without a robust mathematical framework for understanding why a DNN behaves as it does, this is almost impossible to guard against.

From the abstract: "Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder."
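
A minimal sketch of the substitute step described in the abstract (PyTorch; query_target stands in for the remote API and returns labels only, substitute and synthetic_x are assumed to exist, and the paper's Jacobian-based augmentation of the synthetic set is omitted):

    import torch
    import torch.nn.functional as F

    def train_substitute(query_target, substitute, synthetic_x, epochs=10):
        # The only access to the target is query_target(x) -> predicted labels.
        labels = query_target(synthetic_x)
        opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(substitute(synthetic_x), labels).backward()
            opt.step()
        return substitute

    # Adversarial examples are then crafted against the local substitute
    # (e.g. with the FGSM step sketched earlier in the thread) and submitted
    # to the remote model, relying on transferability.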


Agree, it seems to me that there's a sense in which this is a trivial issue; just throw more adversarial examples into the training set, and turn the crank until the classifier understands how to correctly handle those cases too.

There's also a sense in which this is extremely non-trivial; any agent (human or machine) can be subjected to adversarial attacks. They just look different right now, and our current systems are vulnerable to very simple ones.

It seems to me that improving an algorithm's resistance to adversarial attacks is much more feasible than improving a human's resistance to their own class of adversarial attacks.
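
A bare-bones sketch of the "throw more adversarial examples into the training set and turn the crank" loop mentioned above (PyTorch; the FGSM inner step, epsilon, and equal loss weighting are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, opt, x, y, eps=0.03):
        # Craft perturbed copies of the batch on the fly...
        x_p = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_p), y).backward()
        x_adv = (x_p + eps * x_p.grad.sign()).clamp(0, 1).detach()

        # ...then train on clean and perturbed inputs together.
        opt.zero_grad()
        loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
        loss.backward()
        opt.step()
        return loss.item()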


The whole idea is to take the algorithm apart, synthesize an unnatural image that triggers the required responses, and pass it to the classifier. There is no direct link to security risks here.
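
One simple way to synthesize such an image is gradient ascent on the input itself (the Nguyen et al. paper also uses evolutionary search; the step count, learning rate, and input shape below are arbitrary):

    import torch

    def fooling_image(model, target_class, steps=200, lr=0.1, shape=(1, 3, 32, 32)):
        # Start from noise and repeatedly push the target class score up; the
        # result is confidently classified, but looks like static to a human.
        x = torch.rand(shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            (-model(x)[0, target_class]).backward()   # ascend on the class score
            opt.step()
            x.data.clamp_(0, 1)                       # keep a valid image
        return x.detach()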

So, I might be wrong, but my understanding of these attacks is that they require you to know what the classifier's model is.

If my understanding is correct, I guess my question is: how general are these attacks? Can we ever say, "oh yeah, don't try to classify this penguin with your off-the-shelf model, it'll come up as an iguana 9 times out of 10"?


> Your "highly doubt" is baseless. Black box attacks (where you create adversarial examples only using some inputs and outputs, but not the model) on machine learning models are not new. They have been demonstrated countless times [1]. You don't need to know the network at all.

This is not a machine learning model as such, though, and is used differently than they are.

> This is not the case, since regular, unprivileged Apple employees can and will look at the inputs and outputs of the model

Can they?


There is a similar attack against image classifiers.

So is this just a proof of concept or can this be exploited in the wild?

Based on my understanding, you need access to the network itself (weights, biases, activation functions, network topology) to pull off this kind of attack. It doesn't seem like this could easily be duplicated by an outside agent.


Training classifiers can also go off the rails under adversarial attack. This commonly showed up in our systems when people sent short emails that were more ambiguous. For example, this tends to cause problems when malevolent users adopt dogwhistles that co-opt the language of the attacked group. The attacked group commonly ends up being the one getting banned/blocked in these cases.
