Merely setting a delete flag is not compliant with the GDPR, that's why a cascad...

dboat · on May 26, 2018

To your post specifically, I think a cascade of "zero outs" or the like to blank out a user's data would be sufficient, would it not? It could happen at most once for each user account, so it shouldn't be ruinously inefficient unless a system was already on the verge of collapse.

But on the topic in general, could someone explain to me what the real world consequences are likely to be for a small business not based in the EU, of not complying? If I've never cared where my users were as long as their payments cleared (oh, is that where they get you? the payment processor?), and I'm selling handcrafted bobbins online in Canada without letting people delete their email address, what is likely to happen if someone complains to EU authorities?

codexon · on May 26, 2018

That would make it compliant but there will still be efficiency problems.

Databases such as Cassandra are designed so that an update doesn't actually remove the old data until some time later, so frequent updates degrade performance and waste storage. Other databases that allow immediately overwriting the data suffer fragmentation, and thus performance decline and wasted storage, until you compact (basically recreating the entire database), which is not something you want to do all the time, especially on SSDs.

yzmtf2008 · on May 26, 2018

I mean, come on.

1. GDPR gives you a month to respond (extendable in some cases). You don’t have to run VACUUM every day.

2. The entire point of my post was acknowledging that there are costs to being GDPR compliant, and why it’s responsible to have that cost.

Dylan16807 · on May 26, 2018

It's not frequent updates to delete a piece of data once in its lifetime.

If it takes a week to garbage collect that's fine, it just can't stick around forever.

codexon · on May 26, 2018

The problem isn't deleting one piece of data one time. The problem is different people demanding that a thousand-plus rows, randomly spread out in your database, be deleted every day.

pilsetnieks · on May 26, 2018

Look at the cavalier attitude people have with their data until now. Do you really think starting today every one of them is going to start caring and requesting full deletes everywhere?

Maybe a percentage will be better educated and actually request data deletion here and there, sometimes, but I don't think anything is going to massively change in general customer behavior. The GDPR just gives the means to those who really want to control their data (which were there before, by the way, just not really enforced. Now that there's a number figure attached to the possible fine, everyone is paying attention.)

tomatotomato37 · on May 26, 2018

The problem isn't the odd paranoid submitting a delete request once a month, it's when some influential person publicly requests a delete for whatever outrage is going on that day and causes his 10k followers to do the same

onion2k · on May 26, 2018

You're suggesting that a business should be able to ignore the privacy concerns of its users because they're inconvenient. That is decidedly worrying. If a startup can't afford to run ethically then it shouldn't really be in business.

Dylan16807 · on May 26, 2018

Is deleting an arbitrary set of rows every fortnight such a problem?
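
To make the question concrete, here is a minimal sketch of queueing erasure requests and clearing them in one scheduled pass instead of touching storage on every request. It uses an in-memory SQLite database purely for illustration; the `users` and `erasure_queue` tables and both function names are hypothetical, and the batch interval would have to fit whatever deadline applies.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE erasure_queue (user_id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 6)],
)

def request_erasure(conn, user_id):
    """Record the request; duplicate requests for the same user are ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO erasure_queue (user_id) VALUES (?)",
            (user_id,),
        )

def run_batch(conn):
    """Delete every queued user in one transaction and empty the queue."""
    with conn:
        cur = conn.execute(
            "DELETE FROM users WHERE id IN (SELECT user_id FROM erasure_queue)"
        )
        deleted = cur.rowcount
        conn.execute("DELETE FROM erasure_queue")
    return deleted

request_erasure(conn, 2)
request_erasure(conn, 4)
request_erasure(conn, 2)  # repeated request, queued only once
deleted = run_batch(conn)
remaining = [r[0] for r in conn.execute("SELECT id FROM users ORDER BY id")]
```

The storage engine then sees one burst of deletes per batch, so any compaction or vacuuming can be scheduled right after it.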

fiddlerwoaroof · on May 26, 2018

Yeah, this sort of thing is like a pessimistic case for Cassandra and various databases that are designed to model data as an immutable set of facts and to model deletions as retractions or the like.

Dylan16807 · on May 26, 2018

Apparently it defaults to 10 days for tombstone purging and recommends not going below 5 days. How bad is performance actually going to be at a nice slow several-day compaction rate?
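
For a rough sense of the timing: a tombstone can only be dropped by a compaction that runs after the grace period has elapsed, so the 10 days is a lower bound on when the data is physically gone. A minimal sketch of that arithmetic, assuming the 10-day default quoted above (864000 seconds) and a 30-day response window picked purely for illustration; the function names are made up.

```python
from datetime import datetime, timedelta

GC_GRACE = timedelta(seconds=864_000)   # 10 days, the default quoted above
RESPONSE_WINDOW = timedelta(days=30)    # assumption, for illustration only

def earliest_purge(deleted_at):
    """Earliest moment a compaction could drop the tombstone."""
    return deleted_at + GC_GRACE

def fits_window(requested_at, deleted_at):
    """True if the earliest possible purge still falls inside the window."""
    return earliest_purge(deleted_at) <= requested_at + RESPONSE_WINDOW

requested = datetime(2018, 5, 26)
prompt_delete = fits_window(requested, datetime(2018, 5, 28))  # delete issued 2 days in
late_delete = fits_window(requested, datetime(2018, 6, 20))    # delete issued 25 days in
```

With these numbers the delete has to be issued within the first 20 days of the window for the earliest purge to land inside it, and the actual purge still depends on when compaction runs.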

The pessimistic case sounds like trying to remove things within hours.

codexon · on May 26, 2018

All I can say is that not everyone's situation is the same. If you have a small forum where a few hundred people post a few dozen messages a day, it obviously won't be a big deal. There are situations where the amount of generated information is much larger than that. Webserver logs are one possible example.

It isn't an impossible problem to solve, but the GDPR is a significant time and money burden that will especially be an issue for small startups that don't have millions in venture funding to spend on this.

yzmtf2008 · on May 26, 2018

I encourage you to read my comment again, and point out where I mentioned merely setting a delete flag. Any reader worth their salt will point out that it’s not what I suggested at all.

alexbecker · on May 26, 2018

I believe the confusion is around your statement "mark every sensitive field". I think you mean "overwrite every sensitive field", but that definitely took a re-reading to infer, and I'm still not 100% sure.

yzmtf2008 · on May 26, 2018

Thanks, the process reminded me of the “redaction” process, so I used “mark”. That’s definitely on me. Clarified.

codexon · on May 26, 2018

"you could easily not switch to a CASCADE, but instead set delete=1 and mark every sensitive field with a special value"

dbpatterson · on May 26, 2018

You ignored "and mark every sensitive field with a special value", which is the key part. As long as all sensitive data has been essentially zeroed out (for some value of zero), all is fine.

codexon · on May 26, 2018

Marking a field sounds to me like labeling and not zeroing it out.

jachee · on May 26, 2018

What if "a special value" == NULL?

namibj · on May 26, 2018

If you choose that value, and it's the only, or one of the few values that break your software, then it's your fault.

cygned · on May 26, 2018

The thing is, affected persons can not only request a data deletion, but also the pausing of data processing. In that case, you are not allowed to delete them but they must not be used any longer, which is essentially a soft delete. So to be compliant, you’d have to implement both a soft and a hard delete.
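
A minimal sketch of keeping both paths side by side, assuming a hypothetical `users` table with a `restricted` column; SQLite and all names here are for illustration only. Restricting keeps the row but excludes it from processing, while erasing removes it.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
    "restricted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)

def restrict_processing(conn, user_id):
    """Soft path: keep the row but mark it as off limits for processing."""
    with conn:
        conn.execute("UPDATE users SET restricted = 1 WHERE id = ?", (user_id,))

def erase(conn, user_id):
    """Hard path: remove the row entirely."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

def processable_emails(conn):
    """Every processing query has to filter on the restricted flag."""
    rows = conn.execute("SELECT email FROM users WHERE restricted = 0 ORDER BY id")
    return [r[0] for r in rows]

restrict_processing(conn, 1)
erase(conn, 2)
remaining_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
mailing_list = processable_emails(conn)
```

The fragile part of this design is that each query touching the table must remember the `restricted = 0` filter; a view that applies it by default is one way to reduce that risk.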

stemuk · on May 26, 2018

Wouldn't it be possible to just delete the 'identifiable' parts in the database in order to be GDPR compliant?

If, for instance, you save all the user data like user preferences under a random userId, and then delete the personal data (such as email address, name etc.) associated with the userId, I would expect this to be GDPR compliant without having to do a cascading delete.
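
Roughly what I have in mind, as a sketch only: personal fields live in one table keyed by a random id, everything else hangs off that id, and forgetting a user means deleting only the identity row. The `identities` and `preferences` tables and the function names are hypothetical, and whether the leftover rows are really anonymous depends on what they contain.

```python
import sqlite3
import uuid

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identities (user_id TEXT PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE preferences (user_id TEXT, key TEXT, value TEXT);
""")

def create_user(conn, name, email):
    """Store personal fields under a random id that is not derived from them."""
    user_id = uuid.uuid4().hex
    with conn:
        conn.execute(
            "INSERT INTO identities VALUES (?, ?, ?)", (user_id, name, email)
        )
    return user_id

def forget_identity(conn, user_id):
    """Delete only the identifying row; non-personal rows stay in place."""
    with conn:
        conn.execute("DELETE FROM identities WHERE user_id = ?", (user_id,))

uid = create_user(conn, "Ada", "ada@example.com")
with conn:
    conn.execute("INSERT INTO preferences VALUES (?, 'theme', 'dark')", (uid,))
forget_identity(conn, uid)

identity_rows = conn.execute("SELECT COUNT(*) FROM identities").fetchone()[0]
preference_rows = conn.execute(
    "SELECT COUNT(*) FROM preferences WHERE user_id = ?", (uid,)
).fetchone()[0]
```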

pilsetnieks · on May 26, 2018

If you're absolutely certain that the user's identity cannot be reconstructed from the remaining data points, then yes, a full anonymization is enough. You are, after all, removing personally identifiable information, even if the record structure remains in your database.

It's a law, not a technical constraint. No one gives a fuck about some foreign key relations, they care that personal data cannot be accessed, or somehow reconstructed.

wlll · on May 26, 2018

This may well be harder to get right than just deleting the data. As the saying goes (and this is a terrible paraphrase):

  "Anyone can design a lock that they themselves can't pick"

If you think you have anonymised data sufficiently, you may well not have done it sufficiently to prevent others from reconstructing it:

https://en.wikipedia.org/wiki/AOL_search_data_leak

ianamartin · on May 26, 2018

Yes, but this is actually more difficult than you think. It doesn't take very many data points to ID a user.

lagadu · on May 26, 2018

Anonymizing like that would be GDPR compliant, yes, as long as the remaining information absolutely cannot be used to identify the original subject.

sunir · on May 26, 2018

That it isn’t known with confidence is your answer to whether it is worth implying it is easy.

CaptainZapp · on May 26, 2018

I read a lot about cascading deletes, which I interpret as holding personally identifiable data redundantly.

I can see two reasons why this would be a problem:

You have a really shitty un-normalized database design. Granted that you may have to denormalize specific columns for performance reasons. But why that would be the case with, for example names, phone numbers or sexual preferences, totally escapes me.

Or, you're referring to actual cascading deletes, meaning that you need to get rid of child relations, based on deletion of the parent relation. If this poses a problem then I'd argue that you're guilty of a shitty database implementation, arguably with criminally bad definition of your primary / foreign key pairs.

I really don't see a problem here, unless the database schema is implemented in a totally incompetent manner.

Edit: Clarity

tensor · on May 26, 2018

A cascading delete is not necessary. You need only to remove personal information, not all information. Now if you are producing an application that only contains personal information like a chat app, then sure, you might need to remove everything.

But often all you need to do is overwrite the name, address, or similar bits of information, and you can then leave the rest of the data intact and set your delete flag.
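
A minimal sketch of that overwrite-and-flag approach, assuming a hypothetical `users` table whose personal columns are listed in `SENSITIVE_COLUMNS`; SQLite and every name here are for illustration only.

```python
import sqlite3

# Hypothetical schema: which columns count as personal is an assumption.
SENSITIVE_COLUMNS = ("name", "email", "address")

def erase_user(conn, user_id):
    """Overwrite the personal fields with NULL and set the delete flag."""
    assignments = ", ".join(f"{col} = NULL" for col in SENSITIVE_COLUMNS)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"UPDATE users SET {assignments}, deleted = 1 WHERE id = ?",
            (user_id,),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "address TEXT, plan TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO users (id, name, email, address, plan) "
    "VALUES (1, 'Ada', 'ada@example.com', '1 Main St', 'pro')"
)
erase_user(conn, 1)
row = conn.execute(
    "SELECT name, email, address, plan, deleted FROM users WHERE id = 1"
).fetchone()
```

The non-personal `plan` column survives and foreign keys pointing at the row keep working. The overwritten bytes may still linger in storage until the engine compacts, which is the concern raised earlier in the thread.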

yzmtf2008 · on May 26, 2018

HN won’t let me go deeper, so here it goes:

> "you could easily not switch to a CASCADE, but instead set delete=1 and mark every sensitive field with a special value"

Emphasis on the part after “and”

sunir · on May 26, 2018

That is insufficient. You can still infer identities through metadata and behavioural analysis. For instance, purchase history and geolocation are often enough to identify some individuals.
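
One crude way to see the problem is to count how many leftover records are unique on a handful of supposedly harmless fields. A minimal sketch with made-up records; the field names and the `unique_combinations` helper are hypothetical, and passing this check would not prove the data is anonymous.

```python
from collections import Counter

def unique_combinations(records, quasi_identifiers):
    """Return the quasi-identifier value combinations that occur exactly once."""
    counts = Counter(tuple(r[k] for k in quasi_identifiers) for r in records)
    return [combo for combo, n in counts.items() if n == 1]

# Made-up records left behind after names and emails were blanked out.
records = [
    {"user_id": "u1", "postcode": "V5K", "last_purchase": "bobbin-lace kit"},
    {"user_id": "u2", "postcode": "V5K", "last_purchase": "thread"},
    {"user_id": "u3", "postcode": "V5K", "last_purchase": "thread"},
    {"user_id": "u4", "postcode": "H2X", "last_purchase": "thread"},
]

at_risk = unique_combinations(records, ("postcode", "last_purchase"))
```

Here two of the four records are the only ones with their postcode and purchase pair, so anyone who knows those two facts about a person can single out their row.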