Merely setting a delete flag is not compliant with the GDPR, which is why a cascading delete is necessary. Any programmer worth their salt knows that mass random deletes and updates are extremely inefficient.
To your post specifically: I think a cascade of "zero outs" or the like to blank out a user's data would be sufficient, would it not? It would happen at most once per user account, so it shouldn't be ruinously inefficient unless the system was already on the verge of collapse.
But on the topic in general, could someone explain to me what the real-world consequences of not complying are likely to be for a small business not based in the EU? If I've never cared where my users were as long as their payments cleared (oh, is that where they get you? the payment processor?), and I'm selling handcrafted bobbins online in Canada without letting people delete their email address, what is likely to happen if someone complains to EU authorities?
That would make it compliant, but there would still be efficiency problems.
Databases such as Cassandra are designed so that an update doesn't actually remove the old data until some time later, so frequent updates degrade performance and waste storage. Other databases that overwrite data immediately suffer from fragmentation, and thus declining performance and wasted storage, until you compact (basically recreating the entire database), which is not something you want to do all the time, especially on SSDs.
The problem isn't deleting one piece of data one time. The problem is different people demanding, every day, the deletion of a thousand or more rows spread randomly across your database.
Look at the cavalier attitude people have with their data until now. Do you really think starting today every one of them is going to start caring and requesting full deletes everywhere?
Maybe a percentage will be better educated and will actually request data deletion here and there, but I don't think anything is going to change massively in general customer behavior. The GDPR just gives the means to those who really want to control their data (means which were there before, by the way, just not really enforced; now that there's a number attached to the possible fine, everyone is paying attention).
The problem isn't the odd paranoid user submitting a delete request once a month; it's when some influential person publicly requests a delete over whatever outrage is going on that day and causes his 10k followers to do the same.
You're suggesting that a business should be able to ignore the privacy concerns of its users because they're inconvenient. That is decidedly worrying. If a startup can't afford to run ethically then it shouldn't really be in business.
Yeah, this sort of thing is like a pessimistic case for Cassandra and various databases that are designed to model data as an immutable set of facts and to model deletions as retractions or the like.
Apparently it defaults to 10 days for tombstone purging and recommends not going below 5 days. How bad is performance actually going to be at a nice slow several-day compaction rate?
The pessimistic case sounds like trying to remove things within hours.
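To make the tombstone behaviour discussed above a bit more concrete, here is a toy model in Python. It is only a sketch of the general idea (deletes are recorded as markers and physically purged later), not Cassandra's actual storage engine; `ToyStore` and its methods are made up for illustration, and the 10-day window is just the default quoted above.

```python
GC_GRACE_DAYS = 10  # purge window quoted in the thread; treated here as a plain constant


class ToyStore:
    """Append-only toy store: a delete adds a tombstone instead of removing data."""

    def __init__(self):
        # Each entry is (key, value, day_written); value None marks a tombstone.
        self.entries = []

    def put(self, key, value, day):
        self.entries.append((key, value, day))

    def delete(self, key, day):
        self.entries.append((key, None, day))

    def compact(self, today):
        """Keep the newest entry per key; drop tombstones older than the grace period."""
        latest = {}
        for key, value, day in self.entries:  # later entries overwrite earlier ones
            latest[key] = (key, value, day)
        self.entries = [
            entry
            for entry in latest.values()
            if not (entry[1] is None and today - entry[2] >= GC_GRACE_DAYS)
        ]


store = ToyStore()
store.put("user:1", "ada@example.com", day=0)
store.delete("user:1", day=1)
size_before = len(store.entries)      # old value and tombstone both still stored

store.compact(today=5)
size_within_grace = len(store.entries)  # old value gone, tombstone kept

store.compact(today=11)
size_after_grace = len(store.entries)   # tombstone finally purged
```

In this model a delete makes storage grow first and shrink only after compaction runs past the grace period, which is the cost being argued about here.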
All I can say is that not everyone's situation is the same. If you have a small forum where a few hundred people post a few dozen messages a day, it obviously won't be a big deal. There are situations where the amount of generated information is much larger than that. Webserver logs are one possible example.
It isn't an impossible problem to solve, but the GDPR is a significant time and money burden, and it will especially be an issue for small startups that don't have millions in venture funding to spend on this.
I encourage you to read my comment again, and point out where I mentioned merely setting a delete flag. Any reader worth their salt will point out that it’s not what I suggested at all.
I believe the confusion is around your statement "mark every sensitive field". I think you mean "overwrite every sensitive field", but that definitely took a re-reading to infer, and I'm still not 100% sure.
You ignored "and mark every sensitive field with a special value", which is the key part. As long as all sensitive data has been essentially zeroed out (for some value of zero), all is fine.
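For what it's worth, the "mark every sensitive field with a special value" idea could look roughly like this. It is a minimal sketch using Python's built-in `sqlite3`; the `users` table, its columns, the `[erased]` sentinel and the `erase_user` helper are all assumptions made up for the example, not anything from a real system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SENSITIVE_COLUMNS = ("name", "email", "address")
ERASED = "[erased]"  # the "special value"; any agreed-upon sentinel would do


def erase_user(conn, user_id):
    """Overwrite every sensitive column for one user and set the delete flag."""
    assignments = ", ".join(f"{col} = ?" for col in SENSITIVE_COLUMNS)
    params = [ERASED] * len(SENSITIVE_COLUMNS) + [user_id]
    conn.execute(
        f"UPDATE users SET {assignments}, deleted = 1 WHERE id = ?", params
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "address TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO users (id, name, email, address) "
    "VALUES (1, 'Ada', 'ada@example.com', '1 Main St')"
)

erase_user(conn, 1)
row = conn.execute(
    "SELECT name, email, address, deleted FROM users WHERE id = 1"
).fetchone()
```

The row and anything pointing at it by `id` stays in place; only the sensitive values are gone. A column with a uniqueness constraint would need a per-row sentinel or NULL instead of one shared value.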
The thing is, affected persons can request not only deletion of their data but also a pause in its processing. In that case you are not allowed to delete the data, but it must no longer be used, which is essentially a soft delete. So to be compliant, you’d have to implement both a soft and a hard delete.
Wouldn't it be possible to just delete the 'identifiable' parts in the database in order to be GDPR compliant?
If, for instance, you save all the user data such as user preferences under a random userId, and then delete the personal data (email address, name, etc.) associated with that userId, I would expect this to be GDPR compliant without having to do a cascading delete.
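Having both paths side by side might look something like the following. This is a sketch under assumptions: the `users` table, the `restricted` column and the three helper functions are hypothetical names, and a real system would also have to cover backups, logs and downstream copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
    "restricted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)


def restrict_processing(conn, user_id):
    """Soft path: keep the row, but flag it so processing code skips it."""
    conn.execute("UPDATE users SET restricted = 1 WHERE id = ?", (user_id,))


def erase(conn, user_id):
    """Hard path: remove the row entirely."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def processable_emails(conn):
    """Every processing query has to filter on the restriction flag."""
    rows = conn.execute(
        "SELECT email FROM users WHERE restricted = 0 ORDER BY id"
    )
    return [email for (email,) in rows]


restrict_processing(conn, 1)  # user 1 asked for processing to be paused
erase(conn, 2)                # user 2 asked for deletion

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM users ORDER BY id")]
```

User 1 is still stored but no longer shows up in `processable_emails`, while user 2 is gone altogether.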
If you're absolutely certain that the user's identity cannot be reconstructed from the remaining data points, then yes, a full anonymization is enough. You are, after all, removing personally identifiable information, even if the record structure remains in your database.
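The random-userId layout described above could be sketched like this, again with Python's `sqlite3`. The `identities` and `preferences` tables and the `forget` helper are invented for the example, and it only holds up under the caveat just stated: the rows left behind must not allow the person to be re-identified.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Personal data lives in exactly one table, keyed by a random id.
    CREATE TABLE identities (user_id TEXT PRIMARY KEY, name TEXT, email TEXT);
    -- Everything else references only the random id.
    CREATE TABLE preferences (user_id TEXT, key TEXT, value TEXT);
    """
)

user_id = uuid.uuid4().hex  # random, carries no meaning on its own
conn.execute(
    "INSERT INTO identities VALUES (?, ?, ?)", (user_id, "Ada", "ada@example.com")
)
conn.execute(
    "INSERT INTO preferences VALUES (?, ?, ?)", (user_id, "theme", "dark")
)


def forget(conn, user_id):
    """Delete only the identifying row; the other tables are left untouched."""
    conn.execute("DELETE FROM identities WHERE user_id = ?", (user_id,))


forget(conn, user_id)
identity_rows = conn.execute("SELECT COUNT(*) FROM identities").fetchone()[0]
preference_rows = conn.execute("SELECT key, value FROM preferences").fetchall()
```

The erasure touches a single row in a single table, so there is nothing to cascade.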
It's a law, not a technical constraint. No one gives a fuck about some foreign key relations, they care that personal data cannot be accessed, or somehow reconstructed.
I read a lot about cascading deletes, which I interpret as holding personally identifiable data redundantly.
I can see two reasons why this would be a problem:
You have a really shitty un-normalized database design. Granted, you may have to denormalize specific columns for performance reasons, but why that would be the case for, say, names, phone numbers or sexual preferences totally escapes me.
Or, you're referring to actual cascading deletes, meaning that you need to get rid of child relations, based on deletion of the parent relation. If this poses a problem then I'd argue that you're guilty of a shitty database implementation, arguably with criminally bad definition of your primary / foreign key pairs.
I really don't see a problem here, unless the database schema is implemented in a totally incompetent manner.
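For the second case, an "actual" cascading delete with properly declared keys is something the database does for you. A small sketch with Python's `sqlite3` follows; the `users` and `posts` tables are hypothetical, and note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body TEXT
    );
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 1, 'again'), (12, 2, 'hi');
    """
)

# Deleting the parent row removes its child rows through the declared key.
conn.execute("DELETE FROM users WHERE id = 1")
remaining_posts = conn.execute("SELECT id FROM posts ORDER BY id").fetchall()
```

Only user 2's post is left afterwards. Whether that is cheap at scale is the separate performance argument made elsewhere in this thread.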
A cascading delete is not necessary. You need only to remove personal information, not all information. Now if you are producing an application that only contains personal information like a chat app, then sure, you might need to remove everything.
But often all you need to do is overwrite the name, address, or similar bits of information, and you can then leave the rest of the data intact and set your delete flag.
That is insufficient. You can still infer identities through metadata and behavioural analysis. For instance, purchase history and geolocation are often enough to identify some individuals.
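A crude way to see the re-identification risk is to count how many of the remaining records are unique on a combination of "harmless" attributes. The sketch below uses made-up records and a hypothetical `unique_combinations` helper; real re-identification analysis is far more involved than this.

```python
from collections import Counter

# Made-up records: names are already gone, only "harmless" attributes remain.
records = [
    {"postcode": "K1A", "last_purchase": "bobbins"},
    {"postcode": "K1A", "last_purchase": "bobbins"},
    {"postcode": "K1A", "last_purchase": "thread"},
    {"postcode": "V6B", "last_purchase": "bobbins"},
]


def unique_combinations(records, keys):
    """Return the attribute combinations that match exactly one record."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return sorted(combo for combo, n in counts.items() if n == 1)


singled_out = unique_combinations(records, ("postcode", "last_purchase"))
```

Any combination that shows up in `singled_out` points at exactly one person, so anyone who knows those two facts about them can pick out their row even though the name was overwritten.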