Hacker Read

It doesn’t cost much at all for companies to have infrastructure to delete user data. In a relational database with foreign keys, that’s just a cascading delete. Poof, data gone in a single query. Sure, some systems are slightly more complex, but deleting data is one of the easier challenges for any company to solve.

What costs money is companies trying to figure out how to work around legal requirements, obfuscate this option from users, or forcing them to go through support-intensive processes to delete their data rather than just building this like any other core automated business function.
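
For anyone who wants to see the cascading delete mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `posts` tables and the `delete_user` helper are made up for illustration; a real schema would have many more tables, and the cascade only covers data that lives in the same database behind declared foreign keys.

```python
import sqlite3


def delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete one user; child rows follow via ON DELETE CASCADE."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO posts (user_id, body)
        VALUES (1, 'hello'), (1, 'again'), (2, 'other');
""")

delete_user(conn, 1)
remaining_users = conn.execute("SELECT id FROM users").fetchall()
remaining_posts = conn.execute("SELECT user_id, body FROM posts").fetchall()
```

Only user 2 and that user's post remain afterwards. Backups, logs, caches and third-party systems are outside what a cascade can reach.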




What they’ve done is much cheaper to develop and maintain than implementing more granular data deletion tools, especially when they really don’t need to store the data any longer than one billing period.

I have fixed lines of code that cost many multiples of this in compute cost.

At $previous_company, user-initiated account deletions were only soft deletes, and we were supposed to actually delete data 60 days after the request (in case people regretted their decision). Due to a bug in a SQL query we stopped deleting users (and their associated data) for over six months, racking up over half a million dollars in additional storage cost.
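
A rough sketch of the soft-delete-then-purge pattern described above, assuming a single `users` table with a nullable `deleted_at` column holding epoch seconds. The table, column and function names are hypothetical, not what that company actually ran. The purge returns the number of rows removed, so a scheduled job could alert when the count stays at zero for suspiciously long.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=60)  # grace period before the real delete


def soft_delete(conn: sqlite3.Connection, user_id: int, now: datetime) -> None:
    """Mark the account as deleted without removing any data yet."""
    with conn:
        conn.execute(
            "UPDATE users SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (int(now.timestamp()), user_id),
        )


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Hard-delete accounts whose grace period is over; return rows removed."""
    cutoff = int((now - RETENTION).timestamp())
    with conn:
        cur = conn.execute(
            "DELETE FROM users WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
            (cutoff,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at INTEGER)"
)
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
soft_delete(conn, 1, t0)                       # past the window at purge time
soft_delete(conn, 2, t0 + timedelta(days=30))  # still inside the window
purged = purge_expired(conn, t0 + timedelta(days=61))
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
```

Passing `now` in explicitly keeps the cutoff logic testable, which is the kind of check that might have caught a purge query that silently matched nothing.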


I am pretty sure that for the amount of savings most companies can make through removing data retention, working out what data to remove is far, far more costly.

I don't understand how this is a surprising feature. Shouldn't every decent system be able to delete all data of one user without affecting other data?

That companies haven't even thought of being able to delete user data makes the law even more important. It should be common business practice to delete data if a user asks, not something technically impossible.


This is how most services work at scale. It's much cheaper to set a flag than actually delete an entry in a database. The data can then be scrubbed by some periodic maintenance process.

Exactly. The ability to delete user data should have been something engineered from the ground up. They are paying the penalty of technical debt and hopefully learning some tough lessons.

If I were a tech lead and also wanted to advocate for hard deletion, I would ask this question: "What's the cost of keeping all this data unnecessarily, modifying most queries to filter out deleted data, and dealing with the various other consequences of soft deletion? How does that compare with the cost of building a bespoke tool to restore deleted data at a customer's request within a certain time frame, and with the benefit of a customer being able to restore their data at the 'click of a button'?"

Having dealt with systems that have hundreds of millions of records or more, many of which reference deleted data and are therefore useless, I lean towards hard deletion more and more. On the off chance that deleted data really needs to be recovered, you build a separate system/infrastructure to support that, rather than building your _entire_ system around the small likelihood you really need to restore it.
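
One way the "hard delete plus a separate recovery path" idea could look, as a sketch only. It assumes a hypothetical `deleted_users_archive` table that stores a JSON snapshot of the row; none of these names come from a real system, and the archive itself would still need its own time-limited purge to count as real deletion.

```python
import json
import sqlite3


def hard_delete_with_archive(conn: sqlite3.Connection, user_id: int) -> bool:
    """Snapshot the row into the archive table, then remove it from users."""
    with conn:
        row = conn.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return False
        payload = json.dumps({"id": row[0], "email": row[1]})
        conn.execute(
            "INSERT INTO deleted_users_archive (user_id, payload) VALUES (?, ?)",
            (row[0], payload),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    return True


def restore_user(conn: sqlite3.Connection, user_id: int) -> bool:
    """Bring a user back from the archive, if a snapshot still exists."""
    with conn:
        row = conn.execute(
            "SELECT payload FROM deleted_users_archive WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if row is None:
            return False
        data = json.loads(row[0])
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?)",
            (data["id"], data["email"]),
        )
        conn.execute(
            "DELETE FROM deleted_users_archive WHERE user_id = ?", (user_id,)
        )
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE deleted_users_archive (
        user_id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'a@example.com');
""")

deleted = hard_delete_with_archive(conn, 1)
users_after_delete = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
restored = restore_user(conn, 1)
users_after_restore = conn.execute("SELECT id, email FROM users").fetchall()
```

The point of this shape is that ordinary queries never need a "not deleted" filter, because the main table only ever holds live rows.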


Just as hard as not deleting a production database, yet that still happens.

This is an edge case for which they probably didn't have an existing process which means they had to wing it.


And easier deletes if customer X wants all their data deleted!

And here I am dreaming of a world where companies do real deletions of data when a user requests (and also deleting older transactional data that has outlived its utility, including regulatory requirements) and storage prices being lower with a slightly smaller (steady) market for it from the major companies.

On the other hand, storage prices seem to be low enough for all these companies with bulk, long term contracts that developers wouldn’t bother doing real deletes of data.


Please give us more details of what you think the administrative burden is, because I think you have overestimated it.

Being able to provide a user with the data you have on them, and being able to delete it, should be basic requirements of any software company. And now they are, which is great.


This hasn't been my experience. Do you work in a large company? My experience has been that there are heaps and piles of data including (or potentially including, when unstructured) personal information. And there are lots of reasons why complete deletion isn't possible: because other information stored alongside the personal information is necessary for business purposes (like submitting invoices), or because the person requesting deletion only wants part of it deleted, not all, or because the database is structured such that deletion isn't feasible until next year when we roll onto a new technology, etc.

Manual data deletion without an automated process sounds like a recipe for disaster.

That’s not the reason. The reason why people don’t delete is because nobody wants to be left with inconsistent data relations. Deleting a customer is more a matter of deleting their PII (or scrambling it) and leaving everything else intact.

Or in the case of Silicon Valley, leave everything intact with a disabled flag and then spam you for the next decade or so.
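
To illustrate the "delete or scramble the PII, leave the rest intact" approach, here is a small sketch. The `customers` and `invoices` tables, the `anonymized` flag and the placeholder values are all assumptions for the example, and which fields count as PII, or must be retained, is a legal question this code does not answer.

```python
import sqlite3
import uuid


def anonymize_customer(conn: sqlite3.Connection, customer_id: int) -> bool:
    """Overwrite PII columns in place; return True if a row was changed."""
    # .invalid is a reserved TLD, so the placeholder can never receive mail.
    placeholder_email = f"{uuid.uuid4().hex}@deleted.invalid"
    with conn:
        cur = conn.execute(
            """UPDATE customers
               SET name = 'Deleted customer', email = ?, phone = NULL,
                   anonymized = 1
               WHERE id = ? AND anonymized = 0""",
            (placeholder_email, customer_id),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        phone TEXT,
        anonymized INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO customers (id, name, email, phone)
        VALUES (7, 'Ada Example', 'ada@example.com', '555-0100');
    INSERT INTO invoices VALUES (100, 7, 2500);
""")

changed = anonymize_customer(conn, 7)
customer = conn.execute(
    "SELECT name, phone, anonymized FROM customers WHERE id = 7"
).fetchone()
invoices = conn.execute(
    "SELECT id, amount_cents FROM invoices WHERE customer_id = 7"
).fetchall()
```

The invoice row still points at a valid customer id, so the relations stay consistent while the name, email and phone number are gone.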


I've definitely seen companies having different policies. First, the data is valuable, so keeping it when you are allowed to can be worth it for that. Second, doing a full deletion can be costly in hardware resource utilization, and sometimes even involves some manual labor. So minimizing it can be cost effective.

I'm not talking about corporations which have the means and the resources to engineer data pipelines that can be scrubbed and lawyers to deal with compliance. I'm speaking to the issue of you or me creating an MVP with a few months of dedicated hard work. How much extra time has to be spent on putting in place a process for data deletion?

I'm working on GDPR right now at my company, and it's not a small effort.


They don't provide tools for deleting data as easily as creating data, so users are bullied into buying storage instead of deleting trash.

The easy part is providing the information about how the data is used because it's do once, reuse always. Just refer to a document.

The hard part is the right to be forgotten which requires the company to remove all information that pertains to a person. The tech stack still has to implement some stuff here in order to reduce costs and make it easier.

Having to contact your database administrator because you can't delete something without leaving dangling information all over is bad tech implementation which will probably require a huge rework for some companies.

I wonder how you can send the information to the client. If you use GMail then GMail will also know the personal information (they used to read your emails.. good stuff).


Thanks for the link, it's very easy to read. I still have some questions about data deletion requests:

- How will this affect invoices that have to be kept for accounting purposes? Even if a customer wishes their data to be removed, we should not remove accounting information.

- How will this affect Internet archives and caches?

It seems removing all traces of a customer can be a very hard thing to do.
