
It's ok, they have "an ongoing effort to ensure those transcripts do not remain in any of Alexa’s other storage systems."



> It would cost more money than God in hardware to store every thing Alexa ever heard

It depends. First of all, storing compressed audio isn't that space-expensive, especially in long-term storage like S3. Additionally, they could be storing only the transcriptions, not the voice behind them, which would be far less data.
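
A quick back-of-envelope sketch (every figure here is my own assumption: Opus-style speech compression at ~16 kbps, a made-up fleet-wide query volume, and roughly Glacier-class S3 pricing, not anything Amazon has disclosed):

    # Rough cost sketch for storing every Alexa utterance as compressed audio.
    # All figures below are illustrative assumptions, not Amazon's numbers.

    BITRATE_KBPS = 16              # assumed speech-codec bitrate
    SECONDS_PER_QUERY = 5          # assumed average utterance length
    QUERIES_PER_DAY = 500e6        # assumed fleet-wide query volume
    COLD_USD_PER_GB_MONTH = 0.004  # roughly S3 Glacier-class pricing

    bytes_per_query = BITRATE_KBPS * 1000 / 8 * SECONDS_PER_QUERY
    gb_per_day = bytes_per_query * QUERIES_PER_DAY / 1e9
    monthly_cost = gb_per_day * 30 * COLD_USD_PER_GB_MONTH

    print(f"{bytes_per_query / 1e3:.0f} KB per query")        # ~10 KB
    print(f"{gb_per_day / 1e3:.1f} TB of new audio per day")  # ~5 TB
    print(f"~${monthly_cost:,.0f}/month to keep a month of queries")

Even with generous assumptions, that lands in the hundreds of dollars per month of retained queries, nowhere near "more money than God".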

We don't know, as Amazon hasn't been very forthcoming about the privacy aspects of Alexa. I personally suspect they are keeping some voice information so they can use it to improve their NLP. I hope they are doing so in a way that is detached from accounts/IDs, but you never know.

Additionally, you can indeed delete a record of a query from the app, but who knows whether the voice data, or even the query itself, is still stored after deletion, just no longer visible to us end users.



Ah, so that's where all the Alexa recordings went.

> having an Alexa in your home means you must accept that your (i.e. you, the registered user) voice commands are being stored indefinitely.

Is that actually true? That doesn't sound GDPR-compliant.

edit: I've just looked into the Alexa privacy settings on Amazon, and they allow me to delete all my recordings from their servers (I'm in Germany; no idea if this is region-specific).


No, they're talking about the recordings made when people say the wake word. People, especially children, probably don't know that these recordings are stored indefinitely: “At no point does Amazon warn unregistered users that it is creating persistent voice recordings of their Alexa interactions, let alone obtain their consent to do so.”

An important detail not directly addressed in the article (or maybe I missed it): do the audio recordings include conversations between Alexa commands, or were they specifically limited to the human issuing commands to the device? Either way, it's a huge issue that the data was leaked to the wrong customer.

I find it hard to believe this was a one-time mistake.


Well, now we know what's going to happen to all that Alexa voice data.

So we know that they have a record of every query that Alexa has detected. I'm guessing what you're questioning is whether they're storing the actual voices so that they can differentiate between the different people who have queried Alexa.

>There is apparently no way to do that with the Alexa app.

But certainly it's not impossible for this to be fixed.


We know they are streaming it to do all the NLP in the cloud. This is obvious from the computational cost of NLP alone.

The important question here is "What happens to the audio data streamed to the cloud, a few seconds AFTER the query has been identified (or not identified) and the results sent back to the Alexa device?"

The naive assumption is "all audio data is deleted after processing". The reality is that the data is still valuable to Amazon for a variety of uses, such as:

(1) further training their voice recognition software

(2) advertising data mining [how many people are in the room, things they are talking about; note, Facebook's mobile app infamously does this]

If they just store the query text, that is the 'best case' from a privacy perspective. If they store query text and query audio, that is less than ideal, but not too bad. If they store all audio ever recorded, for an indefinite period of time... that is what this police request could reveal: audio data stored for a non-keyword trigger, and for days or weeks after the fact. (A rough comparison of the data volumes involved is sketched below.)
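
To make those tiers concrete, here's a back-of-envelope comparison of daily data volume per household; all numbers are assumptions for illustration, not anything Amazon has disclosed:

    # Illustrative per-household daily data volume for each retention tier.
    # Every figure here is an assumption for comparison, not a disclosed number.

    QUERIES_PER_DAY = 20            # assumed household query count
    TEXT_BYTES_PER_QUERY = 100      # a short query transcript
    AUDIO_BYTES_PER_QUERY = 10_000  # ~5 s of speech at 16 kbps
    CONTINUOUS_BYTES_PER_DAY = 16_000 // 8 * 86_400  # 16 kbps, 24 h of audio

    tiers = {
        "query text only":       QUERIES_PER_DAY * TEXT_BYTES_PER_QUERY,
        "text + query audio":    QUERIES_PER_DAY * (TEXT_BYTES_PER_QUERY
                                                    + AUDIO_BYTES_PER_QUERY),
        "all audio, continuous": CONTINUOUS_BYTES_PER_DAY,
    }
    for name, daily_bytes in tiers.items():
        print(f"{name:>21}: {daily_bytes / 1e3:9,.0f} KB/day")

That's roughly 2 KB/day for text only, ~200 KB/day with query audio, and ~173,000 KB/day for continuous audio: three very different privacy (and storage) regimes.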


I work on Alexa and for whatever it’s worth, I can confirm that Amazon is telling the truth about how Alexa listens and about what is done with your data.

This is all publicly available info, and perhaps there’s no reason why you should trust me any more than you trust Amazon as a company, but as one privacy-conscious engineer to another, I promise that your ambient conversations are not being stored or sent to Amazon and that any data you delete in the app (either by specifying an auto-delete period or manually deleting it) is actually, really, truly deleted.

A process running locally on your Alexa device listens for the “wake word”.[0] This audio is only processed locally within a constantly-overwritten memory buffer, it is neither stored nor transmitted. Only once the wake word is detected does Alexa begin to transmit an utterance to the cloud for processing. I’ve worked with the device stack and it really isn’t transmitting your ambient conversations.
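
The rolling-buffer pattern looks roughly like this (a generic sketch, not Amazon's actual code; the detector and upload functions are hypothetical stand-ins):

    from collections import deque

    BUFFER_SECONDS = 2
    SAMPLE_RATE = 16_000  # assumed on-device sample rate

    # Fixed-size ring buffer: the oldest samples are overwritten in place,
    # so only the last BUFFER_SECONDS of audio ever exist, and none of it
    # is stored or transmitted.
    ring = deque(maxlen=BUFFER_SECONDS * SAMPLE_RATE)

    def detect_wake_word(buffer) -> bool:
        """Hypothetical stand-in for the local wake-word model."""
        return False  # a real detector would score the buffered audio

    def stream_to_cloud(samples) -> None:
        """Hypothetical stand-in for the post-wake-word upload path."""
        ...

    def on_audio_frame(samples) -> None:
        ring.extend(samples)  # older audio silently falls off the end
        if detect_wake_word(ring):
            # Only after a wake-word hit does any audio leave the device.
            stream_to_cloud(list(ring))
            ring.clear()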

Within the Alexa app[1], you can see and hear all of your stored data and can delete any of it. You can also control the duration after which it is auto-deleted. From working with ASR datasets, I can confirm that deleted audio (and the associated text transcript) is really deleted, not just hidden from your view.

I never owned an Alexa or other smart home device before I worked on it, and I’m not sure I’d buy another company’s device where I lack the same ability to “trust but verify”, so I’m not sure how much weight my word should carry. But I can give my word that Alexa is not transmitting your ambient conversations or just setting “deleted=true” in a database when you tell it to delete your data.

[0] see page 4 of https://d1.awsstatic.com/product-marketing/A4B/White%20Paper...

[1] https://www.amazon.com/b/?node=23608614011


Sentences spoken after the 'Alexa' wake word are still recorded as unidentified questions that you can view in the Alexa app. Source: bought mom one, have used it.

> No effect, I believe, since Alexa haven't been providing crawl data to the Internet Archive since January 2021.

Were they still providing value to the Wayback Machine at that point? Has their cessation had a significant impact on the Wayback Machine's crawling ability?


> And as you know Siri and Alexa are always listening to every word so they can respond to the wake word. I believe recently Amazon admitted to storing all these records IIRC (please correct me if I am mis-remembering it).

This is not entirely correct, as I understand it. While they're "listening" in the sense that the mic is active, the full speech processing doesn't happen until the trigger word. There's a reason Siri is called Siri (a distinctive sound pattern that's easy to pick up before applying a stricter check). The issue with the recordings was that the device thought it heard the trigger word, and the mismatched sample was still uploaded.
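
That cascade looks conceptually like this (a sketch with hypothetical scoring functions and thresholds, not any vendor's real pipeline):

    import random

    def score_cheap_pattern(frame) -> float:
        """Hypothetical low-power first stage: crude match on the
        distinctive 'Siri'/'Alexa' sound pattern."""
        return random.random()

    def score_full_model(frame) -> float:
        """Hypothetical heavier second stage, run only on candidate hits."""
        return random.random()

    def should_start_upload(frame) -> bool:
        # Cheap always-on check first; the stricter check runs only when
        # the first stage fires. A false positive that survives both
        # stages is exactly the "mismatched sample" upload described above.
        return score_cheap_pattern(frame) > 0.5 and score_full_model(frame) > 0.9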

What I don't believe is happening is deliberate background-conversation processing by the assistants (there will always be tech slip-ups), simply because the moment that's revealed, they'd get a regulatory ban hammer they really do not want. It would also chew through either your data or your battery and be easy to notice.

I don't put that much faith in TVs though for example...


> What about other stuff that gets uploaded prior to NLP & validation?

I would be very surprised if Amazon sent everything up to the cloud; it would be very expensive to do so. Listening for "Alexa" in the audio is done locally, and recording probably only starts once it's detected.


Summary of vague technical details (which may be all we hear about this):

> an Alexa engineer investigated ... they said 'our engineers went through your logs, and they saw exactly what you told us, they saw exactly what you said happened, and we're sorry.' He apologized like 15 times in a matter of 30 minutes and he said we really appreciate you bringing this to our attention, this is something we need to fix!"

> the engineer did not provide specifics about why it happened, or if it's a widespread issue.

> "He told us that the device just guessed what we were saying" The device did not audibly advise that it was preparing to send the recording, something it’s programmed to do.


Since it seems like quite a lot of people are surprised that Amazon stores these recordings...

PSA: you can go into the Alexa app and look at your Echo history and even listen to recordings of each interaction.


> Are you worried Alexa is listening to you as you rant to yourself?

Well, I mean, quite literally, it is. It's always listening to you, and that's how it knows when you say "Alexa". And IIUC all these audio recordings are sent right to AWS and stored indefinitely.


You can delete all of your voice recordings from Alexa using the companion app.
