Amazon stores Alexa transcripts indefinitely (www.iafrikan.com)
180 points by iafrikan | 2019-07-03 04:45:03 | 97 comments




Why not delete by default or after 30 days or something more user friendly...

Yeah, I wonder why they would not voluntarily delete all of that data?

Why delete something if there might be the slightest chance of it being useful in the future, and there is no downside (storage, reputation, legal)?

I assume storage of this data is nearly zero cost for them, reputation is not an issue as most people (>99%) simply don't seem to care about privacy, and there is probably no legal limit.


I was being overly sarcastic in my prev comment.

There is no reason for them to do it. In fact, if they did, it would be considered a loss of value. It's a valuable dataset.

Even deleting stuff in a 'big data environment' is generally considered too expensive compared with just un-indexing data points that are no longer relevant.


"no legal limit."

There is in the EU. Keeping data indefinitely is illegal. Expert opinion varies on how long you can keep it; many think you're safe with 12 months, most think 36 months is too long.


If it's personal data. A transcript in itself is not personal data unless an individual can be identified or it is linked to an individual.

So "Alexa, order more beer" in itself can be kept for as long as you want. The audio recording may be a different matter because of voice recognition.

As to how long personal data may be kept, well in the UK one can sue on a civil matter for up to 6 years so there is a good argument that anything related to the service you provide and in relation to which you might be sued should be kept for 6 years. I expect it's similar in other countries.


According to the article the transcript is directly associated with the users. It's some of the most personal PII there can be in my opinion.

One day it will all be compromised or misused, because that's what history teaches us.


> According to the article the transcript is directly associated with the users

They just need to destroy that link and can then keep the transcripts for as long as they want.


It's used for training.

You aren't the user.

... but the product.

Why do we believe this concept? It's a business or political decision to delete something after X days/years, not a technical limitation. It's similar to government data retention policies of X years. The concept was invented to calm the public, but technically there's no difference: the servers still keep everything. Contrast that with the old days, when a mainframe had to discard data because there was no space left, or because it was simply too expensive to keep everything.

To be fair, they make it easy to review and delete past recordings here: https://www.amazon.com/alexaprivacysettings

I really doubt the recording is actually deleted from all of their servers. I wouldn't be surprised if a few years from now there is another news item about how they accidentally kept these logs in backups or how they were shared with law enforcement or ad companies.

It's ok, they have "an ongoing effort to ensure those transcripts do not remain in any of Alexa’s other storage systems."

> kept these logs in backups

Is that unreasonable?

I mean, I cannot possibly expect myself to go through my backups and delete a folder from everything, down to the multi-year incremental.

Once you start tampering with a backup, it stops being a backup.


Well then "delete past recordings here" is probably not true.

This can be done by encrypting each backed-up item and forgetting the key for deleted items that shouldn't be restorable from backup. You still need to decide how long the keys should be kept, though.
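
A minimal sketch of that idea (often called crypto-shredding), assuming a hypothetical `ShreddableBackup` class and a toy SHAKE-256 keystream standing in for a real cipher. Nothing here reflects how Amazon actually handles backups; the point is only that the backup stays untouched and a delete removes nothing but the key.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR the data with a SHAKE-256 keystream derived
    # from key + nonce. Illustration only; a real system would use an
    # authenticated cipher from a vetted crypto library.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ShreddableBackup:
    """Hypothetical store: backups hold ciphertext, keys live elsewhere."""

    def __init__(self):
        self.backup = {}     # item_id -> (nonce, ciphertext); never rewritten
        self.key_store = {}  # item_id -> key; the only thing a delete touches

    def store(self, item_id: str, plaintext: bytes) -> None:
        key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        self.backup[item_id] = (nonce, _keystream_xor(key, nonce, plaintext))
        self.key_store[item_id] = key

    def restore(self, item_id: str) -> bytes:
        key = self.key_store.get(item_id)
        if key is None:
            raise KeyError(f"key for {item_id!r} was forgotten")
        nonce, ciphertext = self.backup[item_id]
        return _keystream_xor(key, nonce, ciphertext)

    def forget(self, item_id: str) -> None:
        # "Delete" without tampering with the backup: drop the key only.
        self.key_store.pop(item_id, None)
```

After `forget("rec-1")` the ciphertext is still in `backup`, but `restore("rec-1")` raises `KeyError`. The retention question then moves to the key store, which is small enough to delete from properly.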

This page was empty for me; I had to open the Alexa app and do it from there. It isn't clear why, as the random trash Alexa misinterpreted and added to our shopping cart was in the Alexa cart OK...

That's probably a "soft delete" and hidden. Really doubtful that it's actually deleted.
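
For anyone unfamiliar with the term, here is a small sketch of the difference between a soft delete and a hard delete, using an in-memory SQLite table. The `recordings` schema and the function names are made up for illustration and say nothing about what Amazon actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recordings ("
    "id INTEGER PRIMARY KEY, transcript TEXT, deleted_at TEXT)"
)
conn.execute("INSERT INTO recordings (transcript) VALUES ('order more beer')")


def soft_delete(conn, rec_id):
    # The row stays in the table; it is only flagged as deleted.
    conn.execute(
        "UPDATE recordings SET deleted_at = datetime('now') WHERE id = ?",
        (rec_id,),
    )


def hard_delete(conn, rec_id):
    # The row is actually removed from the table.
    conn.execute("DELETE FROM recordings WHERE id = ?", (rec_id,))


def visible(conn):
    # What a user-facing page would show: only rows not flagged as deleted.
    return conn.execute(
        "SELECT id FROM recordings WHERE deleted_at IS NULL"
    ).fetchall()


def stored(conn):
    # What is really still in the table.
    return conn.execute("SELECT COUNT(*) FROM recordings").fetchone()[0]
```

After `soft_delete(conn, 1)` the user-facing view is empty while `stored(conn)` still reports one row, which is exactly the situation the comment above suspects.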


It will be interesting to see how this plays out wrt GDPR

It's not that interesting unfortunately. You can physically delete all of your recordings at any time at https://www.amazon.com/alexaprivacysettings

All of that data is wiped when you initiate a delete request under GDPR.


What about privacy by default?

I'm tempted to feel that if you are dumb enough to put one of these things in your house, you deserve to have your privacy obliterated by a huge private corporation who is as likely to hand it over to a government as they are to sell it to advertisers or worse.

I don't believe that, but it is shocking to me that anyone would want one of these things.


How is a smart phone any different? It has a microphone that, for most people, is always listening.

It’s not shocking at all. I find my Echo to be incredibly useful and I’ve decided that it’s worth the trade-off. Just like I did with the microphone I have on my wrist and the one in my pocket.


It's very different. My expectation of my smart phone is that it's not listening unless I want it to (I don't). If I find out it is listening/recording when it's configured not to, then I will find an alternative.

The "smart speaker", on the other hand, is 100% useless if the microphone is disabled. It also has microphones designed to hear as far away as possible given its design.

Finally, the phone in your pocket and the watch on your wrist likely handle the audio data very differently than that speaker manufacturer does. You might find it useful now, but don't discount that you may one day find it intrusive.


>My expectation of my smart phone is that it's not listening unless I want it to (I don't). If I find out it is listening/recording when it's configured not to, then I will find an alternative.

>My expectation

This is the key phrase here. Just as you have your expectations, Alexa/Home owners have their expectations. And if Alexa owners find out that Amazon/Google is using their inputs in ways they don't want it to, they'll drop it or find an alternative.

Different people have different expectations. I was late into the whole smartphone thing, and refused to buy one for years because of the privacy leak it represents (heck, I refused to buy a dumb phone for years for the same reason). Back in those days, I would have been as unsympathetic for your rationale of owning a smartphone as you are of those owning smart speakers.

Do you not see that there are different levels of comfort, and not everyone has the same level as you do? Furthermore, do you not see that on that continuum, even you have made compromises?


Fair enough, and this is a great argument that the press are failing to sufficiently publicize that Amazon is collecting a repository of these private recordings.

Without being aware of what is going on, people can’t make these choices for themselves.



A smart speaker has a tiny CPU designed to listen for the wake word and doesn't even fully power up until that is heard. Sure it's voice but it's the voice equivalent of pressing a record button.

These are also available in some phones, as provided by my employer: https://statics.cirrus.com/pubs/proBulletin/CS47L35_201507.p...

Your phone microphone can be enabled remotely and the means to do so can be controlled through the baseband processor.

The Snowden leaks revealed that this had been an attack vector available and used for years.

The method they were using has since been patched, but one would be foolish to think that it has been impossible for interested parties to have found another way to do the same.


I don't think my point was well understood based on your counterargument. What I'm saying is the expectation of a smart phone is that voice assistance is an option. That was my point to OP. Smart speakers are useless sans listening. They are not one and the same in their use case or normal operation.

I'm also not disputing there are baseband controls a nefarious entity could abuse but I'm saying that people who choose to trust Amazon with always listening devices vs a smart phone are not comparing apples to apples.


What Smartphone is listening all the time and sending the voice data to some people to listen in?

I don't think Alexa nor the Google homes listen and send data to the respective "motherships" at all times.

Both do local processing for the keywords until they hear one, then send the following data (and I believe a few seconds prior) to the servers for processing.

My phone works the exact same way with the Google assistant. I don't know if other "assistants" (like Bixby) work the same way.
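
A rough sketch of how "local detection plus a few seconds prior" could work, assuming a small ring buffer of recent audio frames. The function `capture_after_wake`, its parameters, and the `is_wake_word` detector are all hypothetical; this is not taken from any vendor's firmware.

```python
from collections import deque


def capture_after_wake(frames, is_wake_word, preroll=3, max_upload=5):
    """Return the frames that would be uploaded for a stream of audio frames.

    frames: iterable of audio frames (plain strings in the example below)
    is_wake_word: hypothetical local detector, frame -> bool
    preroll: number of frames kept from just before the wake word
    max_upload: number of frames sent after the wake word
    """
    ring = deque(maxlen=preroll)  # older frames fall off and are discarded
    uploaded = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Still inside the post-wake window: this frame is sent.
            uploaded.append(frame)
            remaining -= 1
        elif is_wake_word(frame):
            uploaded.extend(ring)   # the "few seconds prior"
            uploaded.append(frame)
            ring.clear()
            remaining = max_upload
        else:
            # Not triggered: keep only the most recent frames locally.
            ring.append(frame)
    return uploaded


stream = ["a", "b", "c", "d", "alexa", "play", "music", "x", "y"]
sent = capture_after_wake(stream, lambda f: f == "alexa", preroll=2, max_upload=2)
```

With these toy inputs only `c`, `d`, `alexa`, `play` and `music` end up in `sent`; everything else either fell out of the ring buffer or arrived after the upload window closed.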


The smartphone is their; I mean, your GPS tracker for machine learning movement behavior. Voice (various project names), Text (SMS gateways) and Data (CarrierIQ) are just a value added bonus.

I enjoyed the conversation with Alex Stamos (former CSO at Yahoo/FB), where he pointed out that had George Orwell been told that people would willingly pay for a tracking device and keep it on them at all times and powered on at all times, he would not have believed it.

I should add, any time I say anything even remotely negative about cell phones, I get beat up here at HN pretty good. This is a very good psychological indication that people have cognitive dissonance and denial around this topic.


You would know if your phone was uploading loads of voice and room data real time.

There are of course more subtle ways to do this, but not for real-time monitoring. You can transcode the audio with an encoder that does advanced mathematical quantization and batch upload it. If all you care about is human voice, then you could upload an hour's worth of audio in a couple hundred kbytes, if even that much.
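
As a sanity check on that figure, the arithmetic is simple: bytes per hour = bitrate × 3600 / 8. The 64 kbit/s value below is standard telephone-grade PCM; the lower bitrates are illustrative assumptions for aggressive speech codecs, not measurements.

```python
def bytes_per_hour(bits_per_second: float) -> float:
    # One hour is 3600 seconds; 8 bits per byte.
    return bits_per_second * 3600 / 8


def bitrate_for_budget(byte_budget_per_hour: float) -> float:
    # Inverse: what bitrate fits a given hourly byte budget?
    return byte_budget_per_hour * 8 / 3600


for label, bps in [("64 kbit/s telephone PCM", 64_000),
                   ("8 kbit/s speech codec (assumed)", 8_000),
                   ("700 bit/s vocoder (assumed)", 700)]:
    print(f"{label}: {bytes_per_hour(bps) / 1000:.0f} kB per hour")

print(f"200 kB per hour needs about {bitrate_for_budget(200_000):.0f} bit/s")
```

So "a couple hundred kbytes" per hour corresponds to a codec running at a few hundred bits per second, which is far below ordinary voice-call bitrates and only plausible for intelligibility-only speech coding.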


I wouldn't be so sure you'd know. Modern phones have coprocessors that are dedicated to transcribing speech into text in a very power efficient manner (hell, my Pixel phone has a feature which allows it to identify songs playing at all times in a very power efficient manner, even when in "airplane mode", even when the screen is off).

Days' worth of text could be compressed and sent off in literally dozens of kilobytes. And that's not even accounting for side-channels (a hypothetical targeted attack could conceivably look for "target phrases" and send a single bit of information using a side channel to some server to alert that the phrase has been said).
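
A back-of-the-envelope version of the text estimate. Every default below is an assumption chosen for illustration (roughly 16,000 spoken words a day, about 6 bytes per word including the space, and a 3:1 ratio from a general-purpose compressor); change them and the result scales linearly.

```python
def transcript_bytes(days, words_per_day=16_000, bytes_per_word=6,
                     compression_ratio=3.0):
    # Raw size of the transcribed text, then divided by the assumed
    # compression ratio.
    raw = days * words_per_day * bytes_per_word
    return raw / compression_ratio


for d in (1, 3, 7):
    print(f"{d} day(s): about {transcript_bytes(d) / 1000:.0f} kB compressed")
```

Under these assumptions one day of speech is around 32 kB compressed, so a few days does land in the "dozens of kilobytes" range.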


I was not aware they have speech transcription co-processors now. That would certainly be a game changer.

Back when I used to help the FBI monitor cell phones, they were very slow processors and the user could tell when it was sending anything due to the heat. Best we could do was play with MWI on/off to get cell locations and use the mainframe (cell switch) to do audio monitoring, but that was very limited at the time.


I know it's ironic given the topic we're in, but the technology is actually really impressive!

The coprocessors are really just ASICs designed to run neural networks efficiently. The real cool shit is in the compression and simplification of the networks that allow constant recognition from a small power-sipping chip.

They're also doing a bunch of work with federated machine learning processes. So all of the training data can be kept on the device itself, but they can (I'm grossly oversimplifying this) basically "merge" a full neural net from somewhere else with data learned on-device.

It's that last part that I'm extremely interested in, as it means we can have all the benefits of machine learning systems, but also keep all of our privacy in that all the processing happens locally, our training data doesn't need to leave the device at all, and the algorithms can still adapt and improve for each person individually. It apparently also has side effects that the neural network on your device can adapt to your usage individually much faster and more accurately than a "master" neural network ever could.
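
To make the "merge" step concrete, here is a toy sketch in the style of federated averaging. It assumes a two-parameter linear model (y = w*x + b) instead of a neural net, and the function names are made up; real systems are far more involved. Each device trains on its own data and only the resulting weights are shared and averaged.

```python
def local_update(weights, examples, lr=0.1):
    """One pass of gradient descent on data that never leaves the device."""
    w, b = weights
    for x, y in examples:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)


def federated_average(updates):
    """Merge device models, weighting each by its number of examples.

    updates: list of (weights, num_examples) pairs
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(
        sum(wts[i] * n for wts, n in updates) / total for i in range(dim)
    )


global_w = (0.0, 0.0)
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # private per-device data
updates = [(local_update(global_w, data), len(data)) for data in devices]
global_w = federated_average(updates)  # server sees weights, not examples
```

The server-side function only ever receives `(weights, num_examples)` pairs, which is the property the comment above is excited about.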


Last I knew, the coprocessors that are running all the time are very, very small and incapable, and only capable of detecting the wake word, and that not even terribly reliably; it sent the wake word off to the main processor if it thought it heard it for further checks. This would be, say, two or three years ago.

Are there now co-processors running all the time that could do full speech recognition (not just keyword recognition, which is way easier, but full speech recognition) all the time without such significant battery drain that it would be noticed? I ask not because I consider this some sort of technical impossibility, but because if that is the case, I'd like to know so I can update my understanding of the situation.


[1] is Google's paper on their implementation of a setup similar to this for identifying songs entirely offline without using much power at all.

You are correct that the actual processors that are "always" listening are fairly limited (they describe it as a DSP, so that should give you an idea of its capabilities), but when layered the way they are in Pixel phones, it makes it possible for the DSP to detect that "something" interesting is happening, then wake up and pass the information to an "AI processor" (they call it the "Pixel Visual Core", but I find that a misnomer because it's a pretty capable TPU on its own) which would then do the transcribing, and go back to sleep.

I'm making some assumptions that I haven't researched myself, but I'm assuming there are patterns, frequencies, and other simply detected "signals" present in most speech that make it easy to detect "someone is speaking", and from there can wake up a more powerful (but still surprisingly efficient) processor to do more work.
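
The layering can be sketched as a cheap always-on gate in front of an expensive stage. The example assumes plain signal energy as the "someone is speaking" cue, which is the simplest possible stand-in; the names and the threshold are hypothetical and this is not how the Pixel pipeline is actually implemented.

```python
def frame_energy(samples):
    # Mean squared amplitude of one non-empty frame of audio samples.
    return sum(s * s for s in samples) / len(samples)


def cascade(frames, expensive_stage, threshold=0.01):
    """Run expensive_stage only on frames the cheap check lets through.

    Returns the expensive stage's results and how often it was woken up.
    """
    results = []
    wakeups = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:  # cheap, always-on check
            wakeups += 1
            results.append(expensive_stage(frame))  # costly, runs rarely
    return results, wakeups


frames = [[0.0] * 4, [0.5, -0.5, 0.5, -0.5], [0.001] * 4]
results, wakeups = cascade(frames, lambda frame: len(frame))
```

Here only the loud middle frame reaches the expensive stage, so `wakeups` is 1 out of 3 frames; the power argument rests on that ratio staying small in a mostly quiet room.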

And Google also has some simplified and compressed neural networks dedicated to transcribing speech which are designed to run on lower powered devices[2]. I don't know if they are actually using the PVC in the Pixel phones to do audio transcription or if they're just using the general CPU for that, but I have a feeling that a TPU would be able to scream through buffered audio very quickly and would go back to sleep very quickly leading to minimal power draw.

In fact I'd be willing to bet good money that an "always transcribing" system could be done today with current generation phones in a way that wouldn't impact battery life enough to be obviously noticeable or cause the user to start looking for what was draining it in most cases. And considering how often I'm personally actively speaking in a day, I bet it would have virtually no impact on my personal device if implemented this way.

(and man did it feel wrong to type that last paragraph out!)

[1] https://ai.google/research/pubs/pub46522

[2] https://ai.googleblog.com/2019/03/an-all-neural-on-device-sp...


Eventually someone is going to turn up the talking points used to generate these comments.

So does that mean that everyone "dumb enough" to have a smartphone deserves to have their privacy obliterated by the manufacturer of that phone?

And of course anyone within earshot of any of those devices also deserves to have their privacy obliterated, even if they themselves don't own one?

I really don't understand why people don't get the utility in these things. They're useful. Being able to control lights in the house, control the TV, set timers and reminders, listen to music and the news, manage grocery lists, and more are all super useful.

I have Google homes, and I (and my family) use them on a daily basis, they're useful. That doesn't mean we forgo all privacy, that doesn't mean we deserve any violations, that doesn't mean we are okay with information being collected against our will.

My family and I are comfortable with the tradeoffs that the device makes, we trust that the company won't break a ton of laws and won't be "listening" when they explicitly say it's not. We are able to delete information we don't want Google to keep, and we are even able to set the account to automatically delete all recordings after a specific period of time if we want. And we trust that the company isn't just lying when they very explicitly say that when you delete something it is actually deleted.

I also support more laws and regulations around this, what they can store, how long they can store it, what they can do with it, and mandated ways of controlling your information.

You may have different priorities, and that's okay, but I absolutely hate this idea that people who want the benefits of these devices deserve anything, and the implication that only those who don't care about privacy have them. And a token "I don't believe that" doesn't negate the sentiment you wrote right above it. I'm an adult, I'm able to make a conscientious decision to trade some of my privacy in a controlled way to get benefits from it, and that is no more wrong than you wanting to protect your own privacy.


After a year with several Echos we ended up just listening to Spotify and getting the time of day. Everything else looked like a gimmick. So we dropped the Echos; the tradeoff between people listening in on your conversations vs. using a light switch just wasn't worth it.

Yeah, I never understood what problem they were supposed to solve from the get-go.

Besides gathering a huge "spoken language" data-set for machine learning.


Seems to me like you understand perfectly.

>That doesn't mean we forgo all privacy, that doesn't mean we deserve any violations, that doesn't mean we are okay with information being collected against our will

You did forego all privacy and clearly you are okay with information being collected, otherwise you wouldn't have those devices.

That being said, you do not "deserve any violations".


>You did forego all privacy and clearly you are okay with information being collected

Absolutely not! I would not be okay with people being able to listen to the audio they record without my consent, I would not be okay with a transcript being publicly available, I would not be okay with the history of the commands I say to it being publicly available. I'm also not okay with audio being recorded without me triggering it to record.

Making limited tradeoffs with my privacy in specific situations is very different than not caring about privacy in general. And "privacy" isn't just one thing, there are tons of levels in many different areas.

For example, I am not okay with cameras that upload anywhere off site IN my house. That's a line that I'm uncomfortable crossing. Not because I think that there is someone viewing every second of uploaded footage, but because the risk of a security breach is high enough and the information that camera would collect "private" enough that I don't think the tradeoff is worth it. But cameras that record the OUTSIDE of my house and upload it somewhere? I'm fine with that. The expectation of privacy is different on the other side of that wall for me, and I'm more than comfortable with the tradeoffs and risks of that going public.

For you those lines might be different, but neither of us forego all privacy because of those decisions.


>I would not be okay with people being able to listen to the audio they record without my consent, I would not be okay with a transcript being publicly available, I would not be okay with the history of the commands I say to it being publicly available.

Something needn't be public to violate your privacy. Clearly you agreed to Google listening to and recording all sound being made in your house, leading to privacy violations. Also, a security breach is pretty much a question of when, not if.

>Making limited tradeoffs with my privacy in specific situations is very different than not caring about privacy in general. And "privacy" isn't just one thing, there are tons of levels in many different areas.

Agreed, however I didn't say you didn't care about privacy. Just that you traded it away for some (stupidly minuscule IMO) benefit.


> Just that you traded it away for some (stupidly minuscule IMO) benefit.

I think this is the disagreement. You are treating privacy as a binary thing. That once you trade any amount of privacy away, it is entirely gone.

That is not how most people approach privacy. Most people view privacy as a spectrum. Someone can choose to sacrifice some (but not all!) of their privacy to get a benefit.

What's key is they should be able to prescribe the limits as the poster above has described. We should be able to severely punish companies that lie about their behaviors, because that would make it impossible to make informed trade-offs about how much privacy we are truly giving up.


>We should be able to severely punish companies that lie about their behaviors

I can't agree with this more, and I really hope we can see some legislation in the US that creates very strict penalties for lying about this stuff, and possibly some regulations that can make storing user data more risky to the company doing it, which would hopefully put pressure on them to reduce or eliminate data storage where possible.


Privacy _is_ a binary thing, either something is private or it is not.

There is of course a difference between _everything_ and _nothing_ being private, but once _something_ ceases being private it ceases being private forever.

If you let some company, which will do who-knows-what, record voice in your house, you no longer can have a private conversation in your house, since the company will record them, possibly listen to them, and ultimately probably leak them. You completely lost that part of privacy, without for example losing privacy regarding visuals.

>What's key is they should be able to prescribe the limits as the poster above has described. We should be able to severely punish companies that lie about their behaviors, because that would make it impossible to make informed trade-offs about how much privacy we are truly giving up.

I do agree there need to be more regulations around this (yay GDPR), although even if you kill the company dead, you cannot un-ring a bell.


> Privacy _is_ a binary thing, either something is private or it is not.

I strongly disagree. If I tell something to my attorney it is still relatively private, but obviously not if I had never told anyone.

If I have a deep secret and I tell my significant other that secret asking for their confidence, it is still relatively private but not as private if I had never told anyone.

I agree that, if you absolutely need total control of information, then the only way to 100% guarantee that privacy is to tell no one ever.

But I think being absolutist when it comes to privacy expectations means you may have a very hard time convincing others of your opinions.

> If you let some company, which will do who-knows-what, record voice in your house

This is exactly the point we are arguing. We should be able to find out from the company what they will do, and if they deviate from promises they make they should be held accountable.

I should be able to let some company record my voice in my house, with strict conditions on what it will do.


> Clearly you agreed to Google listening to and recording all sound being made in your house, leading to privacy violations.

That's not how these devices work.


This comment brings zero new points to the discussion, and just reiterates the point that parent was responding to. It’s not very constructive or substantial in my opinion, so I downvoted it.

None of these capabilities have an existential requirement for a centralized networked infrastructure spanning the globe.

Stop and think how ludicrous it is that your being able to say to the room "pick a random song and play it loud" requires fiber on ocean floors and huge buildings running servers and the rest of it.

We're being socialized into thinking privacy is an impediment to progress.

It is just a very bad idea to create a turn-key surveillance society and hope and cross your fingers it never gets used.


>None of these capabilities have an existential requirement for a centralized networked infrastructure spanning the globe.

Yes, but I don't want to store every single movie, TV show, song, and news story locally and find a way to get them to myself without using the internet.

I'd love if more stuff happened locally on these devices, and I experiment with various other systems every now and then. I even have a fully self-hosted home automation system running on open source software using local-only radio systems (z-wave) to control things like lights, garage doors, and door locks.

There are things that do require some kind of internet access though, and other things that become much more reliable and easy to use when internet connected.

>Stop and think how ludicrous it is that your being able to...

I have, and so has my family. And we have made the decision that the tradeoffs are worth it for us.

Why can't people understand that I'm an adult, everyone in my household is an adult, and we are able to understand and comprehend the situation, and make a conscientious decision to accept the trade offs.

I'm not an idiot, I know there are risks, but the risks are low enough for me and my family to make it worth it. Just like how there are risks simply just getting in a car, and there are risks every time I turn my oven on, but they are all acceptable.

Maybe you should stop and think just how ludicrous it is that you seem to think it's impossible that someone else could find value in these devices, and that some are okay with limited tradeoffs with their privacy in return for devices and services which improve their lives. And maybe you should listen when someone tells you that they've weighed the pros and cons and decided it was worth it, rather than quickly jumping to the conclusion that they are wrong, didn't give it any thought, or are somehow being tricked.


> Maybe you should stop and think just how ludicrous it is that you seem to think it's impossible that someone else could find value in these devices

You are claiming I am saying this.

I am saying (a) we can get these capabilities without damaging society, so let's give it a try, and (b) your "choices" affect the rest of us.

It is like second hand smoking.


That's fair, and there are absolutely discussions to be had around the "second hand smoke" kind of impacts these technologies can have. (I really like that metaphor)

But I think it's wrong to throw the baby out with the bathwater here. I agree that there absolutely needs to be more regulation around this area. I'd personally like a law that mandates a tone or sound (and lights) when one of these devices is recording. I'd love some laws on the books in the US that give those who don't want to be in this ecosystem a quick and easy way of ensuring they won't end up in it.

I also think that there are a LOT of things that all of the big players in this space can do better (like the by-default recording and storing of all the voice data indefinitely). But I also think that there have to be compromises. Saying it's ludicrous that these devices use the internet is the pendulum swinging way too far in the other direction; the internet-connectedness of these is basically the biggest selling point.

But I'd be over the moon if they did all of the simple commands "locally", and I'd love if users were given more control over their information (something like a weekly/monthly/quarterly email asking if Google can use the following list of recordings to train their models would be awesome IMO).

There is so much hyperbole, and just straight-up wrong information gets thrown about so often with this stuff, and there's a lot of extremism when it comes to privacy, especially on this website (I know this is a loaded word, and I don't mean to include any additional connotations with it, I just don't know how else to describe this). I also don't want to enable turnkey surveillance, but at the same time I don't want to live as a luddite to prevent it. There has to be some middle ground, and if that means it makes dragnet surveillance easier while improving the lives of many people, then I might be okay with it.


> and there's a lot of extremism when it comes to privacy

For very sound and empirical reasons: historically, intrusive surveillance has always gone hand in hand with tyranny.


You are downvoted but I believe you are essentially correct. It is ridiculous to send voice data all over the globe to be interpreted and sent back to IoT devices. Unnecessary and certainly not elegant, since mic and device are most often in very close proximity.

Good for learning to analyze voice on the other hand, but your point is still valid. We will get a compressed voice service that easily can be held in a small device at some point, that makes this ridiculous data exchange unnecessary.


At I/O Google announced on-device transcription and NLP:

https://www.youtube.com/watch?v=b9LQX9cLnZk


Windows has had no-internet-required voice recognition built in since XP. It is available for any programmer to use for any purpose through importing a DLL, and isn't too complicated to use. With an unconstrained grammar defined, it's something like 85% accurate, but depending on how much you constrain the grammar, can get incredibly accurate.

Meanwhile my mother's flip phone in 2003ish also had on device voice control, and it worked pretty damn effectively


>And of course anyone within earshot of any of those devices also deserves to have their privacy obliterated, even if they themselves don't own one?

If you don't warn others about a listening device you may even be running afoul of wiretapping laws, depending on your location.


If the device isn't actively listening, does this even apply?

Good to know since the police in that same state may be carrying their personal cellphones on them at the same time they interact with people...

My mother is bed-ridden and partially blind. She cannot use a smartphone. Alexa helps her call me.

I do not deserve to have my privacy obliterated.


And you shouldn't need to be disabled to get excused from HN's mean-streak. Lots of maliciousness and bitterness in that comment towards people.

My girlfriend uses a wheelchair. She is physically incapable of turning on and off most of her lights, meaning if someone leaves a light on, she's stuck with that light left on until someone can turn it off. She can sort of use her thermostat, but not terribly effectively. Alexa has been an accessibility gamechanger for her, and I dislike kneejerk reactions like this one about as much as I dislike the fact that accessibility is so integrally tied into the Alexa ecosystem such that you've got this devil's bargain of increased accessibility and increased privacy loss. You can't have one without the other.

I'm blind and in a similar boat, but at least I can tie all my appliances into my Home Assistant instance, and never have to worry about being stuck out of reach of a keyboard or phone. Even so, as a blind person, finding accessible weather websites is an absolute pain in the ass. I can get daily forecasts, no problem, but sometimes I need an hourly forecast to plan bus commutes, and nothing beats "Alexa, what's the hourly forecast at 4 PM" for getting a quick answer without having to check half a dozen sites searching for actual textual information vs. a damned infographic or radar display.

I'd love to build her a Mycroft-based setup, but she doesn't need a cluster of RPis and USB microphones hanging around on her desk, and even so that ecosystem isn't ready yet. I say that as someone who literally spent days piecing together a HassOS Mycroft addon and hardware setup that barely hears his commands from a few feet away and triggers all sorts of false positives, and even so I don't think its weather skill does hourly forecasts. I own an Alexa as well, but it stays unplugged unless I need it.

TLDR: It sucks, but don't just classify all Alexa owners as dumb and deserving of victimhood.

Edit: Rereading your comment, I see that you don't really believe that. Apologies for the kneejerk reaction of my own, but I've seen this reaction enough from folks who do that I was finally tempted to speak out about it. I really wish there was an accessibility-oriented privacy-focused voice assistant, or at least, I wish I could pay more to Amazon to opt out of their more invasive practices.


I wish there was one too. I'm glad you and your girlfriend have access to these features, but you shouldn't have to treat with these info-megaopolies to get them.

You should check out Mycroft's Mark II when it's released later this year as it's designed with privacy as an important feature.

The Alexa chain from the device to your AWS skill to your IoT device is fun to develop. Never ever would I install anything remotely similar in my home. I am a guy who puts tape on the camera of notebooks. Compulsive behavior? Maybe, but I don't mind a bit...

You might be smart enough, but what about your roommates and family members?

> you deserve to have your privacy obliterated by a huge private corporation

You don't have a smartphone? Or a laptop with a mic and/or cam but no hardware switch?

I have one of these things precisely because it is not as easy to overlook as a smartphone's assistant app always listening in.


What's the news here? The same happens with Google and Microsoft. You can even check on their websites and see ALL your audio recordings from Google Assistant and Cortana. https://myactivity.google.com

Even your dictations on Windows are there too (Win+H). https://account.microsoft.com/privacy/activity-history?view=...


Also Discord. They brag about it a bit. [1] Don't say or type anything you don't want archived forever. Loose lips sink ships.

[1] - https://blog.discordapp.com/how-discord-stores-billions-of-m...


This is like criticizing Gmail for keeping your email forever by default. It's your account and you can delete your data when you don't want it anymore.

A more substantive critique would be that these services don't let you set a retention policy. Google Vault lets an organization automatically delete old email, but other than Snapchat, you don't see this for consumer accounts, so you have to do it manually.
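As a rough illustration of what a user-set retention policy could look like, here is a minimal Python sketch. The `Message` type, the `apply_retention` function, and the 90-day window in the usage note are all hypothetical and do not reflect any provider's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    # Hypothetical record type; a real service stores far more metadata.
    id: str
    created_at: datetime


def apply_retention(messages, max_age, now=None):
    """Split messages into (kept, expired) lists based on a maximum age."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    kept = [m for m in messages if m.created_at >= cutoff]
    expired = [m for m in messages if m.created_at < cutoff]
    return kept, expired
```

Calling `apply_retention(inbox, timedelta(days=90))` on a schedule would give the "delete after three months" behaviour; the expired list is what the service would then actually purge.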


I do criticize Google for many things, including their usage of email. But you are right, they can keep it forever as there is no mutually legally binding agreement and mutual NDA signed, witnessed and counter-signed by all parties.

When Gmail was first created, I was one of the first people to whine about them trawling through messages for URLs and crawling them. Some friends and I would link to files that didn't set content length and would feed them data forever. They eventually caught on to that.


> Some friends and I would link to files that didn't set content length and would feed them data forever.

That seems spectacularly unwise (or at least useless) for all but a minuscule portion of people. Either you stop feeding them data after a short while, and don't care, or you keep feeding them data, and I'm pretty sure Google cares less about (and pays less for) data transfer than you do.

Best case scenario you're trickling data to them and just keeping the connection open as long as possible. Even then, Google likely cares about one connection and the resources it takes far less than you do (even if you care little).

If it makes you feel better, it's maybe worth it, but it feels akin to carrying a marker with you so you can cross out Google's name in any newspaper you happen to be reading. It's an act of protest, but its impact leaves a lot to be desired.


It totally made me feel better and they stopped connecting to my web servers. It just felt good to get a few cents of my tax money put to use. But yeah, in the grand scheme of things, they have infinite resources. I have been to two of their data centers and they are massive.

Well, it's probably worth it then. It's hard to put a price on psychological well being, and if it helps you keep some sanity in this crazy world we're in, then power to you. And you get an interesting anecdote. :)

Google have just implemented the ability to auto-delete your activity data after three months. Quite handy since it's required to have history on if you want to use the voice assistant at all. I suspect this is GDPR-induced.

Can you delete your Discord user data without resorting to JavaScript console hacking? It's possible, but not accessible for the average user.

That's just your local cache. Other users will have your chat cached as well. You can extract it with PowerShell.

Their servers save the data forever using Cassandra.


This gets into retention policy for shared data (as all messages are shared between the sender and recipient).

You could do it like Snapchat and let the author decide the retention policy, or like email where each recipient separately decides what they want to keep, or like a mailing list archive where you're basically out of luck. What makes sense for a chat app?
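To make the three options concrete, here is a small Python sketch of who gets to delete which copy under each model. The enum names and the `can_delete` helper are made up for illustration and are not how Snapchat, email, or any mailing list software actually implements this.

```python
from enum import Enum


class RetentionModel(Enum):
    AUTHOR_DECIDES = "author"        # Snapchat-style: the sender controls every copy
    RECIPIENT_DECIDES = "recipient"  # email-style: each holder controls their own copy
    ARCHIVE = "archive"              # mailing-list-archive style: nobody can delete


def can_delete(model, requester, author, copy_owner):
    """Return True if `requester` may delete `copy_owner`'s copy of a message."""
    if model is RetentionModel.AUTHOR_DECIDES:
        return requester == author
    if model is RetentionModel.RECIPIENT_DECIDES:
        return requester == copy_owner
    return False  # ARCHIVE: the copy is permanent
```

Under the author-decides model, Alice can remove the copy Bob holds of her message; under the email model only Bob can; under the archive model neither can.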


FWIW you can turn off historical recording of e.g. voice activity on the Activity Controls section of the Google My Activity page, along with a bunch of other things.

Mr Prosser: But the plans were on display…

Arthur: On display? I eventually had to go down to the cellar to find them.

Mr Prosser: That’s the display department.

Arthur: With a torch.

Mr Prosser: The lights had probably gone out.

Arthur: So had the stairs.

Mr Prosser: But look, you found the notice, didn’t you?


After reading this thread I really think we need more people like Richard Stallman in the world. "Embrace and educate", you'll say. No, people don't care. Let them do what they will. I will not participate in the destruction of privacy. However this, to me, isn't even the scariest part. The real scary stuff is when you realize what this data is most certainly being used for.

Remember when the CIA gave unknowing citizens LSD? Remember the declassified docs that revealed their intentions to be able to basically warp a mind, and control it at their will? Well, today, we hand them the data for deep examination. Our emotions, intelligence, etc. will be revealed by a quick scan of our face. Just imagine the databases these government agencies have available. I imagine labs filled with scientists and engineers trying to crack the data.

It sounds conspiratorial, but come on. Imagine if you held humanity's brain within your hands. Would you not study it, trying to figure out how to make it work more to your personal needs? Or maybe you'll tell yourself it's to make society better, to make it safer.

Just a note: I'm not accusing the CIA exactly of doing this. There is no doubt in my mind, though, that somebody is out there doing this (the CIA seems most reasonable). Sorry for rambling, I've been awake for 24 hours (is there irony in that?)


Do you use any open source Alexa alternative? If yes, please share your experience.

I set up a https://snips.ai/ Raspberry Pi and got it to respond to its wake word.

Then I realised it was going to take 6+ hours (if not days) to get it to play music and set timers, so I gave up on it.

The ecosystem might be more mature now; when I looked, there weren't any good Spotify apps you could get for it.


Something something GDPR?
