I've seen this same method used on multiple apps where I've requested an account deletion. It's super frustrating. Most companies either don't respond, say they deleted it when they merely disabled it, or update the account name to something else.
Disabling is understandable, because as a company you need a record of transactions or interactions such as TOS agreements for legal purposes. This requires keeping the records.
Changing object name data, though, is a terrible way to implement this.
It's a bit more complicated than that. You have a period of time to delete the data. And you can keep enough info to know what data you've deleted, so that if you restore from backups you are able to re-delete without having to go through your backups and delete everything. Probably more that I'm forgetting.
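A rough sketch of the "re-delete after a restore" idea, assuming the only thing you retain is a ledger of internal user IDs that were erased. The table layout, the `erase_user` and `reapply_erasures` helpers, and the in-memory set standing in for the ledger are all made up for illustration; a real ledger would have to live somewhere the restore does not overwrite.

```python
import sqlite3


def erase_user(db: sqlite3.Connection, ledger: set, user_id: int) -> None:
    """Delete the user's row and record only the internal ID in the ledger."""
    db.execute("DELETE FROM users WHERE id = ?", (user_id,))
    db.commit()
    ledger.add(user_id)


def reapply_erasures(db: sqlite3.Connection, ledger: set) -> None:
    """After a restore, delete again every user the ledger lists."""
    db.executemany("DELETE FROM users WHERE id = ?", [(uid,) for uid in sorted(ledger)])
    db.commit()


# Live database with two users.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
live.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)
live.commit()

# Backup taken before the erasure request arrives.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# User 1 asks to be erased.
ledger = set()
erase_user(live, ledger, 1)

# Later: restore from the old backup, then replay the ledger.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
reapply_erasures(restored, ledger)
remaining = [row[0] for row in restored.execute("SELECT id FROM users ORDER BY id")]
```

The backups themselves never have to be edited; the ledger just guarantees the erased rows do not survive a restore.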
You can keep contact info even after a GDPR deletion request, so long as you're not using it for business purposes.
Otherwise imagine how easy it would be to violate the deletion request if you're running a business and can't remember the names of the people you had deletion requests for. Their data could come up again through normal channels and you'd treat them no differently than another sales contact, thus violating GDPR.
Depends on the type of data and other laws. If you are a paying customer you can assume your data will stay in the database until it is no longer required for audits. GDPR allows for anything that is 'absolutely totally required for providing service'.
In most countries you are required to keep payment records for tax purposes for 5 years or more. As this is a business necessity this trumps the GDPR. And since most business involves some kind of payment it's likely most businesses will not actually fully delete the information they have on file for you.
I noticed a similar thing being done for Bird scooters a while back. I forget the suffix but they did the same and I noticed because I was still authed on my phone after requesting deletion. My token has expired since then though so for all I know they have fully deleted the account since.
Having known people who worked at Bird, I doubt it. They had a real culture of “hack it up and then move on”, going back and fixing stuff like that isn’t in their culture.
I once received an automated email to 'deleted@example.com'. Where example.com is my domain. I employ catchall, so that I can generate a new mail for each service. I contacted them and they apologised. The CTO personally explained that this was legacy they lost track of and thanked me for pointing out.
You catch a lot with catchall: data leaks, hacks, sneaky data sales, etc. When you suddenly receive, say, marketing mail for shirts on 'jeansonline@example.com', something fishy is going down.
I have been doing the same since 2001; the amount of crap the net catches is unfathomable. Apparently there is a company somewhere on this planet with the same name as my last name, "bemis.com", while my domain is "bemis.net", and I sometimes get invoices, CVs, PowerPoint presentations, emails from their external auditors... I am waiting for the day they ask to buy my domain (which I was using long before their company was created) (oh, and it is my last name, so good luck with that).
Having done enough security audits, "this was legacy" is a BS excuse. I will go ahead and assume that NYT has an audit department, and that the audit dept runs through the full audit universe every 4-5 years. Someone would have captured that a long time ago (1st, 2nd, 3rd lines)(external auditors)(any sales pitch: "we have 438264728 subscribers")
I call BS. They got busted and now they most likely change this from 1000 to 2000 and call it a day..
Ps: bemis is not my real last name.. but I am using a super cool name over here!!
Doing real deletes on user accounts is a surprisingly challenging problem and I'd be willing to bet very few companies do real deletes where all of your data is wiped permanently from the company. For legal and financial reasons, companies often need to keep track of historical user activity. If a company states in their investor quarterly report that they had 1M active users, they better be able to prove it in an audit.
And in a naive relational database implementation, deleting a user would cascade and delete activity associated with that user.
The easiest way around this is to do soft deletes where the data stays in the db, but the flag deactivates the user's account. Looks like the NYT just did a poor implementation of a soft-delete.
I used to work where (not a service for the general public) there was an "is deleted" flag for everything, but every now and then a client would insist that data be really deleted, and depending on who it was and how they asked, we might go and do it, which was a huge hassle and would cause no end of problems down the line.
On the other hand, "is deleted" flags end up causing issues when you forget to put "where not is_deleted" in your queries.
Lately I've faced kind of an inverse situation - I have a system that I can't control where things are permanently deleted once in a while for multiple reasons (rogue users, aging out of old versions) and so as I accumulate information in a little data warehouse for reporting, I decided to implement an "is deleted" flag there. Eventually though, deleting from the source was turned off because it's really not necessary.
I also worked at a similar place, and fantasized about rewriting everything so that soft-deletion wasn't a per-row technical detail, but was instead an explicit modeled business-flow. Perhaps stored as a flag on an Aggregate Root (like "Customer" or "Project") at a much coarser level of detail. Sure, your queries still need to account for it, but at least you don't have a potential patchwork of inconsistent flags.
P.S.: Random advice to anybody working on enterprisey stuff:
1. "Deletion" is too vague and broad. I strongly suggest you call it "deactivation" and some other word like "purging."
2. Deactivation is typically what a company actually wants, even if they don't know to ask for it. By phrasing it that way you also encourage stakeholders to think about "reactivation" before it becomes an architectural problem.
3. True purging is rare, and tends to be related to either disk-space issues or legal requirements. In the latter case, you'll want an audit-trail or tombstone of some sort, meaning it's still a real workflow and not just an easy SQL DELETE statement or something.
The discard gem for rails has put a lot of thought into this and works pretty well, all things considered.
It's not perfect, but I find that using a library (either directly or for inspiration) is often a shortcut to learning about a lot of edge cases you'd otherwise only discover by rolling your own.
Eh, the situation that I've been in, if I recall correctly, is that one has read/write access to all data for reporting, but not the ability to create views (or stored procedures etc) to share.
Where I am now, (as far as Oracle goes) you can't even create your own tables under your own schema. A view requires a meeting with a DBA and their manager and really special, compelling arguments.
Can you just have a general policy where every table that has an is_deleted flag also automatically gets a view, and clients must use the view unless they have a particular need to access deleted data?
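Roughly what that policy could look like, sketched with SQLite from Python. The names are assumptions for the example: the raw table is `users_all`, and the view gets the friendly name `users` so that forgetting the filter is the hard thing to do, not the easy one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Raw table: only code that really needs deleted rows should touch it.
    CREATE TABLE users_all (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Guard view: the filter lives here once, not in every query.
    CREATE VIEW users AS
        SELECT id, name FROM users_all WHERE is_deleted = 0;
    INSERT INTO users_all (id, name) VALUES (1, 'alice'), (2, 'bob');
""")

# Soft-delete bob on the raw table.
db.execute("UPDATE users_all SET is_deleted = 1 WHERE id = 2")
db.commit()

visible = [row[0] for row in db.execute("SELECT name FROM users ORDER BY id")]
everything = [row[0] for row in db.execute("SELECT name FROM users_all ORDER BY id")]
```

This only helps where you are allowed to create views at all, which per the comments above is not a given.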
Not sure about drawbacks but another solution would be to change the point where you do the queries instead. So you have a "user" object that is largely saved the same as most other "crud" objects, so you have a layer for those, add the flag there.
as far as I understand this would only work for users who don't have access to "deleted" rows. Are access rules a good way to handle this? Serious question. My solution would be to have views as "guards" for every table.
I don't understand what you mean re "access". RLS is what removes access to those rows, that's the point of RLS.
In postgresql, anyways, superusers aren't subject to RLS and the table owner, by default, isn't either. But RLS can be enforced for the table owner by a single alter statement.
I've been on both sides of this insisting, if a company annoyed me too much (e.g. headhunters mailing too frequently) I'd drop the "data privacy laws" (nowadays GDPR) bomb and ask for my data to be deleted.
On the other side, a customer got really pissed off by an online shop we maintained for a client, and asked for his data to be annihilated, we thought "What a douche.".
And something I had to learn the hard way and then teach quite a few people is that hard deletes don’t just turn your tables into Swiss cheese, they also can cause table scans.
When you delete a row, every inbound foreign key constraint has to be checked to look for any rows that refer to the deleted row, and most likely you didn’t set up an index for the foreign key, so now you have a table scan. Possibly several.
It’s not that much more work these days to set up a partial index on the table instead and add another WHERE clause, you have smaller problems with people accidentally deleting the wrong thing, and you’ve started down the path to audit trails.
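For anyone who hasn't used one, here is a small sketch of a partial index over only the live rows, using SQLite from Python. The table, column, and index names are invented for the example, and other databases have their own syntax and planner rules.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Index only the rows that are not soft-deleted.
    CREATE INDEX idx_users_live_email ON users (email) WHERE is_deleted = 0;
""")

# The query repeats the index's WHERE condition literally, which is what
# lets SQLite consider the partial index for it.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE is_deleted = 0 AND email = ?",
    ("a@example.com",),
).fetchall()

# The human-readable plan detail is the last column of each row.
plan_text = " ".join(row[-1] for row in plan)
```

Soft-deleted rows stay out of the index, so it stays small no matter how many tombstones pile up in the table.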
It's been standard practice at all the companies I've worked at to index all foreign key fields, no matter what they're used for, I've yet to run into a situation where it's been more harmful than helpful, but these companies all had <10TB data in SQL so idk if it's good general advice.
There is also a user-valuable reason to not do hard deletes. Doing a soft delete prevents another malicious user from immediately reclaiming your now-available ID and pretending to be you.
> keep track of deleted users so that their usernames can't be reused
This seems to violate GDPR, no? An attacker attempts to create an account (say: victim@gmail.com) on AshleyMadison and is prevented because the server tracked past users. The attacker could then demonstrate that victim@gmail.com was at one point a user on AshleyMadison.com.
As others have mentioned, that's an issue already. The solution is to never acknowledge if a user does or doesn't exist on register/sign-up/forgot-password pages and simply state that instructions have been emailed to you in all cases. The key is that you don't act differently if the user does or doesn't exist.
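A minimal sketch of the "don't act differently" rule, assuming accounts are keyed by a normalized email and that mail goes through some queue. `GENERIC_REPLY`, `request_password_reset`, and the list standing in for the outbox are hypothetical names, and this ignores timing differences between the two branches.

```python
GENERIC_REPLY = "If that address can be used here, instructions have been emailed to it."


def request_password_reset(email: str, accounts: set, outbox: list) -> str:
    """Return the same reply whether or not the account exists."""
    key = email.strip().lower()
    if key in accounts:
        # Only the real owner of the mailbox ever sees this message.
        outbox.append((key, "password reset link"))
    return GENERIC_REPLY


accounts = {"victim@example.com"}
outbox = []
reply_known = request_password_reset("Victim@example.com", accounts, outbox)
reply_unknown = request_password_reset("nobody@example.com", accounts, outbox)
```

The person probing sees identical replies in both cases; the only observable difference is inside a mailbox they do not control.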
In this case, where you're probing for user names or emails, you don't own the email, so you wouldn't receive the verification yourself, and thus wouldn't know if the account exists.
This is exactly why most password reset emails say "if you didn't request this, please let us know, as someone may be attempting to access your account".
You shouldn't use usernames in that scenario, just emails. After Signup, you just show a general message that a confirmation Email has been sent. If the account already exists, some policy to notify the account owner can be put in place.
Verifying the email keeps someone from hijacking the account without leaking that an account formerly existed. At least so long as their email isn't also compromised - in which case they have bigger problems.
That's not much different than not being able to create an account with victim@gmail.com because victim@gmail.com already has an account. Both instances leak information.
You don't have to track their emails unless you are reusing emails as usernames. Just tracking the username suffices.
This is also one of those situations where people often put too much shit in the user table. "we have to delete the user row" -- I mean, you have to delete some of the user row, yes.
I like to solve this by proper namespacing. Suppose you instead deliberately have an authUser table which just has what you need for auth -- a UUID to hook into the rest of the system, salts and password hashes for direct logins, maybe a nullable date "banned_until" if you want banning; assuming you use crypto bearer tokens rather than an auth tokens table, you also want a date column for "tokens last reset on"; etc. You can put the username in there just fine, that's needed for auth. Maybe you let people log in with email+password and thus you also put their email address in there, also fine.
As long as the authUser table does not grow to encompass all of your other business logic you are good. Other tables foreign key to authUser and you delete rows from them and that doesn't upset the foreign key. You leave the row in authUser to indicate that the username is taken.
An additional "deleted" field on authUser can be used to block logins and thus the username is taken but they can't log in. As for the email address, even if you insist on a UNIQUE and NOT NULL constraint for it (and I would find this surprising in an age where we log in a lot with social media) you can auto purge by setting it to CONCAT(id, "@purged.example") and then you have a valid email address which is nowhere else used in your auth flow, no personally-identifiable information at all. Heck then you don't even need the boolean flag if you would rather forbid the .example TLD from logging in.
So that has worked well for me in the past and it seems to solve those sorts of problems with only a little tweak. The key is that the PII need is to delete the "user row" but that does not have to be the authUser row -- if you separate the two rows out then you can leave the authUser row while still having a table appUser which lives in your application and contains all the cool stuff about this user using that app. It also naturally lends itself to you thinking about a sort of SSO for all of your different applications up-front.
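To make the split concrete, here is a small sketch using SQLite from Python. It assumes an integer id where the description above uses a UUID, and the column sets, the `purge_user` helper, and the `username_taken` check are illustrative guesses, not anyone's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    -- Only what auth needs; this row survives a purge.
    CREATE TABLE authUser (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL UNIQUE,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Everything the application knows about the person; this row is deleted.
    CREATE TABLE appUser (
        auth_id INTEGER PRIMARY KEY REFERENCES authUser(id),
        display_name TEXT,
        bio TEXT
    );
    INSERT INTO authUser (id, username, email) VALUES (1, 'alice', 'alice@example.com');
    INSERT INTO appUser VALUES (1, 'Alice A.', 'Likes bikes.');
""")


def purge_user(db: sqlite3.Connection, auth_id: int) -> None:
    """Delete the app-level row and overwrite the email, keeping the username reserved."""
    db.execute("DELETE FROM appUser WHERE auth_id = ?", (auth_id,))
    db.execute(
        "UPDATE authUser SET email = id || '@purged.example', deleted = 1 WHERE id = ?",
        (auth_id,),
    )
    db.commit()


def username_taken(db: sqlite3.Connection, username: str) -> bool:
    """True if the name is held by any auth row, purged or not."""
    row = db.execute("SELECT 1 FROM authUser WHERE username = ?", (username,)).fetchone()
    return row is not None


purge_user(db, 1)
```

Deleting from appUser never trips the foreign key because the references point toward authUser, which is the row that stays.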
the real GDPR problem is if the user has asked to delete data and you do this soft delete but keep all their old data as well, and then someone hacks your system and gets that data.
You're obligated by GDPR to disclose to affected parties that their data has been compromised, but you were also obligated to delete the data by GDPR.
Yeah, that is about the worst possible way to do it. If you can't do a hard delete for whatever reason, the right way to do it is to set a flag that prevents any activity on that account. They can keep the name in order to prevent anyone else from stealing it, but still delete all the profile data attached to the account.
Whatever for? Just delete it. Keep the account tombstoned so its name can't be reused. Keep what content you can and want. Delete the PII and any metadata you're contractually and/or legally required to.
Did a glance through that thread and it didn't seem like there was a strong consensus on how to respect GDPR while maintaining historical data for reporting purposes. Any best practices?
Having worked in a HIPAA regulated space, I can say that hashing the username, such as the email address, for login purposes can allow for account recovery if the credentials are retained. At the same time the cleartext username and other PII can be stored in an object that is both encrypted at rest for its lifetime, and on top of that has its sensitive fields overwritten upon logical deletion. Account recovery cannot recover non-credential derived PII but that is a small annoyance to the user in order to be compliant and trustworthy. The internal user ID should be used throughout downstream reporting rather than actual PII for the sake of continuity and privacy.
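A sketch of the lookup-key part of this, with one substitution worth flagging: it uses a keyed hash (HMAC-SHA256) rather than a bare hash, because a bare hash of an email address is easy to check against a list of guesses. That choice, the `login_key` name, and the example secret are mine, not the parent's.

```python
import hashlib
import hmac


def login_key(email: str, secret: bytes) -> str:
    """Derive a stable lookup key from an email without storing the address itself."""
    normalized = email.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical server-side secret; it would be kept apart from the user table.
SECRET = b"example-only-secret"

key_a = login_key("Alice@Example.com", SECRET)
key_b = login_key("  alice@example.com ", SECRET)
key_c = login_key("bob@example.com", SECRET)
```

The key can stay in the credentials record after the cleartext fields are overwritten, so a returning user typing the same address still maps to the same internal user ID.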
the easier way, assuming neither is a primary key, is to convert the field values to UUIDs, which has the added advantage of anonymizing the data. that's disadvantageous if you want to prevent re-signups though, unless you take other measures.
In general you don't want to delete absolutely everything. For example, usernames should not be reused, so you can't "delete" them -- you can tombstone them though, and you should. Besides tombstoning to prevent reuse, you can and should delete as much associated metadata as you're willing to / contractually or legally required, naturally.
Even what you can delete can (and will) survive in logs and backups, web archives, screenshots, etc. Deleting things on the Internet is just difficult.
>If a company states in their investor quarterly report that they had 1M active users, they better be able to prove it in an audit.
Is this even legal? I've never heard of a company letting an outside firm go through their database to confirm any sort of statistic like that. Who is doing this auditing?
It does if they take subscribers from the EU (or California) and apply this process to them. It's incredibly straightforward. If you do business in some jurisdiction, then that business is subject to the jurisdiction's laws.
Hmm, it seems I was wrong then. I had thought that they didn’t localize to any EU countries, but I guess they have more of a global market than the other papers I am more familiar with.
Some violators might successfully keep their assets out of the reach of EU enforcement, but that's going to be really tough to do for any large business with global operations.
Are soft deletes even legal in the context of privacy laws like GDPR? If I’m writing in to delete my data, I don’t really give a crap how hard it is. I want that permanently wiped, so that even if you wanted to you can’t find it again.
How that messes up your technical implementation is your problem
Hah, Uber did this to me once. Someone signed up with my email and somehow the verification failed. So they started sending me details about someone else's trips! When I complained, instead of deactivating the account or trying to contact the user to find out their actual email, they changed the address on the account to the same thing but with "void" prepended. I have a gmail account and was pretty sure that email didn't exist... Sure enough, I tried registering the new email address with Google and got nothing but Uber spam. Oh well at least they're not sending me trip details anymore.
Note that "anonymization" has been legally found (DSB-D123.270/0009) to be acceptable to meet GDPR "erasure" requirements. However, this requires irrevocable overwriting of PII rather than just slapping 1000 on the end ;-) If they'd changed the username and email address to some random string, however, they would most likely be compliant.
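A sketch of the difference, assuming a user record is just a dict with `id`, `username`, and `email` fields (field names invented here). The replacement values come from a random token, so nothing about the original can be derived from them, and the address sits under the reserved `.invalid` top-level domain so it can never receive mail. Whether a given regulator would accept any particular scheme is not something this snippet can promise.

```python
import secrets


def anonymize(record: dict) -> dict:
    """Return a copy with username and email overwritten by random, unrelated values."""
    token = secrets.token_hex(16)
    scrubbed = dict(record)
    scrubbed["username"] = f"erased-{token}"
    # .invalid is a reserved top-level domain, so this can never be registered.
    scrubbed["email"] = f"{token}@erased.invalid"
    return scrubbed


original = {"id": 42, "username": "foo", "email": "foo@example.com"}
first = anonymize(original)
second = anonymize(original)
```

Contrast with appending 1000: there the old value is trivially recoverable from the new one, and the new address may be one that somebody can actually own.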
But they'd need to get rid of it in the 11th year, right? And even before then, they'd need to delete some of the earliest records for someone who has subscribed for more than 10 years. Lots of compliance traps remain.
I think something similar happens in the UK. It's generally believed here that laws telling you to retain data take precedence over the GDPR telling you to delete it. (I am very much not a lawyer, as you can doubtless tell.)
Yes. For example, HMRC requires that you keep various business records for 6 years (or longer, circumstance-specific) after the end of the company's financial year.
Generally, the rule is "Delete the data unless there's a law that requires you not to" — and the UK's implementation of the GDPR (the Data Protection Act 2018) makes various explicit exemptions for this.
NYT is notoriously the worst at customer service and account handling. I tried to get a previous invoice from them previously and after 1 week of calling customer support and being passed around, I still wasn't able to get it.
I had this chat with their customer service department asking to "cancel" my account so that I don't incur any charges and they insisted that it wasn't possible without losing immediate access and getting a pro-rated refund. I thought that was stupid, but ok...
1 month later, still no refund. Account is still scheduled to be auto-renewed, talk to CS and they're basically ignoring me at this point (I'm using text messaging support on a secondary number), so I issue a chargeback with my credit card...
1 month later, I get a refund, no email, no text message explaining the delay and now I have to deal with that or else they'll probably put my account in collections. /facepalm
My credit card number changed and they sent my account to collections with no notice. The lesson I learned was never to subscribe to something on the company's own website; go through Apple.
Honestly, the stories I've read about how difficult it is to cancel your NYT account are a contributing factor to why I don't subscribe. I love NYT, but if it's not as easy to cancel as it is to sign up then count me out.
I just subscribe to the local paper (LA times) and get my national coverage from their reporting. By the time you subscribe to the wapo, nyt, economist, wsj, atlantic, you are paying a huge sum a year on redundant coverage. Better to focus on local issues that are more likely to affect my life than the national soap opera anyway.
Still love the story of the Pinboard guy’s approach to invoices. If you ask for an invoice, he sends you a blank one and tells you to just fill it out however you want.
Same in the UK, but for instances where you can't get a fully valid invoice, you can either "self invoice" (so the "fill in your own invoice" approach) or just use whatever receipt you get, as long as you feel happy defending it to a tax inspector later on.
If you use US companies for services in a European business, you are almost certainly going to have invoices every month that don't meet EU regulations at all and you just have to make it work.
He said in his tweet about this that when he does it to Europeans (specifically Germans, I think) they just get even madder.
I mean great if there are legal requirements for invoices, but who enforces them, how likely is enforcement, and what’s the end result for a US-based company with no physical presence in The Netherlands?
The real issue is with the local (European) company when they claim the expense against profits and the tax inspector turns their nose up at the invoice/receipt.
Being in the UK, we tend to work on a system where things are taken in context and you can defend such decisions. Maybe other tax regimes are more restrictive, but the British way is always that you can have a debate with authorities and usually they will see sense in your reasoning if you're not trying to defraud them.
Just to make this kind of confusing story less confusing—the Pinboard guy prints valid invoices (in order to be legally compliant, and because he's "not a totally evil guy"). Someone (from Germany) asked him to add Company Name to the invoice, and he replied by saying "just edit the HTML to add whatever you need".
Ha, the tweets I remember are from years and years ago. Maybe 2015? Really funny that the invoice thing has been such a consistent part of the Pinboard Experience for so long.
Since this predates threads, I can’t find all the tweets but this is one of them:
I’d bet (a small amount of) money they have no ability to delete accounts at all, and it goes all the way down to foreign key constraints introduced by a well-meaning but inexperienced developer that unnecessarily couple the accounts table to many other records.
Regardless of the constraint on the key, the design fact remains that deleting a user record that might, for example, have associated transaction data (like subscription payments) is a little complex.
You don't want to cascade that deletion to a record of credit card charges, but you also need to make sure that all queries respect that the user record might now be deleted - ie make it an outer-join.
It's far more robust to add an active/inactive field.
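A tiny illustration of the join problem described above, using SQLite from Python with made-up `users` and `charges` tables. One charge belongs to a user whose row has been hard-deleted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE charges (id INTEGER PRIMARY KEY, user_id INTEGER, cents INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    -- Charge 11 refers to user 2, whose row no longer exists.
    INSERT INTO charges VALUES (10, 1, 500), (11, 2, 700);
""")

# Inner join: the orphaned charge silently drops out of the report.
inner_rows = db.execute(
    "SELECT c.id FROM charges c JOIN users u ON u.id = c.user_id ORDER BY c.id"
).fetchall()

# Outer join: the charge is kept, with NULL where the user used to be.
outer_rows = db.execute(
    "SELECT c.id, u.name FROM charges c LEFT JOIN users u ON u.id = c.user_id ORDER BY c.id"
).fetchall()
```

Every existing inner join against the user table would need this treatment after a hard delete, which is the audit-every-query cost that makes an active/inactive field look cheap.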
The longer an organization has been around, the more interconnected the database is, and the more consequent changes are needed to accommodate a core database change.
I wouldn't be surprised if many, many organizations have some hack like this under the covers.
I'm learning about database design, and I'm learning that this might not be that easy if the relationships between users and other data on the site are ill-defined. There may be multiple tables in their design that will require knowledge of that flag, and it might legitimately be way easier to just add junk to the end of the user account than it is to introduce a new flag.
> it goes all the way down to foreign key constraints introduced by a well-meaning but inexperienced developer that unnecessarily couple the accounts table to many other records.
You make it sound like developers doing things the wrong way is the exception instead of the norm.
Good developers don't, but every place I've worked at has a few chunks of the software by people who didn't know or care enough to do things the right way.
> I’d bet (a small amount of) money they have no ability to delete accounts at all
Probably not. I bet that goes for so very many organizations. It is far easier to reinstate an accidentally (user) deleted account if it is merely marked as deleted than if it were truly deleted.
Now, how NYT is handling this is ... well, it's bad.
probably the fact that a good portion of account deletes are users doing something they want to undo in the next few days. Soft deletes make this much easier, then going back and deleting anything older than X days is simpler than expensive customer support tickets of "how do i login it says my account is deleted"
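A sketch of that two-step flow with SQLite from Python, assuming a nullable `deleted_at` column holding epoch seconds and a 30-day grace period. The column name, the `purge_expired` helper, and the 30 days are assumptions picked for the example.

```python
import sqlite3

DAY = 86400  # seconds


def purge_expired(db: sqlite3.Connection, now: int, grace_days: int = 30) -> int:
    """Hard-delete rows that were soft-deleted more than grace_days ago."""
    cutoff = now - grace_days * DAY
    cur = db.execute(
        "DELETE FROM users WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    )
    db.commit()
    return cur.rowcount


now = 1_700_000_000  # fixed timestamp so the example is reproducible
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, deleted_at INTEGER)")
db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, None),            # active account
        (2, now - 40 * DAY),  # soft-deleted 40 days ago
        (3, now - 5 * DAY),   # soft-deleted 5 days ago, can still be undone
    ],
)
db.commit()

purged = purge_expired(db, now)
remaining = [row[0] for row in db.execute("SELECT id FROM users ORDER BY id")]
```

Undoing a deletion inside the window is just setting `deleted_at` back to NULL.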
I have used that pattern in my apps and found it works well. But then I've watched videos from respected DB experts, whom I learn a great deal from, where they practically beg you to stop using nullable columns.
So I'm torn, because I think there may just be a major problem I've not yet grown my apps big enough to suffer. Anyone have thoughts either way?
A lot of companies do this. My buddy found out that he had two EA accounts, so he asked nicely for them to merge them into one and they "did". Well what they actually did was grant the games to the new account and "delete" the old account.
Of course "delete" meant rename it from username@custom.tld to usernameDELETED@custom.tld. He owns the entire domain (and has catchall) so he got the notification of the changed email to the new email address.
Now he has two EA accounts with all the games on both!
It’s odious that it’s impossible to delete accounts except in rare circumstances. Ever try? All anybody does is temporarily disable them unless you go through an hour with their tech support. Dark pattern at best, holding on to your data forever to continue selling it at worst
Nowadays it is a dark pattern. But in the days before "sell everything you can about your users" became the rule, businesses optimized for accidental deletion by users. They could easily reinstate you and your data would still be there. It used to be considered good customer service. Times change.
You know HN doesn't let you delete your account, right? Even if you send them an email, apparently they're "too backed up" to handle account deletion requests.
From the Twitter replies:
"the number was appended to local-part, not the domain. I found out by going back to a tab where my session was still valid but the account dropdown had updated with the new name. Profile settings revealed the email."
https://twitter.com/bicycult/status/1255122953798328320
One possible way I can think of is they're hosting their own mail server and route messages for all unknown addresses to some mailbox (not that uncommon as far as I know - people do that to avoid bouncing mails with a typo in the address).
With such a setup it would be possible to notice that NYT mail started coming to foo1000@... instead of foo@... after requesting account deletion. Username change could also be evident directly from the mail, or it was a trivial guess.
Just have a friend working at NYT who tells you how NYT actually manages account deletions while having a pint; then try to delete your account and check whether the emails come to a new email. Write an article about it.
They are not alone in this practice. I still receive emails from sitepoint to my email (As recently as March 10th), and the user name finishes in _DELETED (Not kidding).
Netflix does that too. You can’t delete your account so they just append a string like “csr_morgan” in the domain so that your account is “deleted” (you can’t login anymore, because your email address technically doesn’t have an account anymore) and you can re-register with your email later if you wish.
But if you use the altered email and the same password, everything is still there.
Pretty sure this goes against GDPR but I was totally unsuccessful at getting my account deleted.
> Pretty sure this goes against GDPR but I was totally unsuccessful at getting my account deleted.
Did you report this to your local privacy regulator (the ICO in the UK for example)? Not saying they'll do anything (I guess the "4% of global turnover" fines aren't enough to motivate them) but at least there's a record of it, and if anything else, a proof of how useless the whole regulation is.
What happens if I create an account at both foo@gmail.com and foo1000@gmail.com (or even foo+@gmail.com and foo+1000@gmail.com), and then delete the first one?
"instead of actually deleting it, they simply appended '1000' to both the username and the email address. anyone could thus create an email address with that suffix and request a password request to access my info."
Most likely the 1000 is appended to the local-part of the email address, not the domain, as any tool they are using for changing details most likely validates emails somehow.
The email address doesn’t really need to be valid though. I have an old client that appended ‘|disabled’ after the email address (and torched the password) when “deleting” accounts because they needed them in the DB for audit logging.
Unless someone figures out how to register a domain ending in ‘.com|disabled’ I’m not sure how someone would be able to access those accounts.
Every week we see multiple articles about security researchers who abuse some part of the tech stack to do something weird that shows the danger in this sort of thinking.
I believe it's easy to spoof emails from the .com|disabled domain. Receiving messages, I agree, seems harder. Maybe spoof an unencrypted DNS response at the right moment? No need to actually register a domain when DNS is spoofable.[1]
If you really need to use a hack like that to disable an email, consider adding some code to your email sending logic that skips such email addresses (and always use that logic). Otherwise clever hackers have a foothold to try their tricks against.
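Something like the following, as a sketch. It assumes the '|disabled' suffix from the parent comment and a single choke point that all outgoing mail passes through; `deliverable` and `send_all` are hypothetical names, and `send` is whatever function actually hands the message to your mail system.

```python
from typing import Callable, Iterable, List

DISABLED_SUFFIX = "|disabled"


def deliverable(address: str) -> bool:
    """False for addresses that were munged to mark the account as disabled."""
    return not address.endswith(DISABLED_SUFFIX)


def send_all(addresses: Iterable[str], send: Callable[[str], None]) -> List[str]:
    """Send to every deliverable address and return the ones actually used."""
    sent = []
    for address in addresses:
        if not deliverable(address):
            # Never hand a munged address to the mail system at all.
            continue
        send(address)
        sent.append(address)
    return sent


delivered_log = []
used = send_all(
    ["a@example.com", "b@example.com|disabled", "c@example.com"],
    delivered_log.append,
)
```

The check only helps if nothing can send mail without going through it, hence "always use that logic".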
My guess would be that they don't want to have that email accidentally used, but they would have a check in the codepath anyway, because no one wants to see their logs spammed with myriad DNS errors when this can be avoided. And in fact, if a DNS error shows up in the logs, devs would know that somehow their code path is not completely safe, so that change on the email is perhaps a way for them to ensure that the disabled account is indeed seen as disabled by their code in every situation.
A lot of people are complaining about the NYT approach, but perhaps their only fault - if one considers that not deleting an account for good is not an issue, and it seems to be a common practice in the industry - is to not use a transaction when disabling user accounts (disable email -> disable account), which is perhaps difficult with NoSQL setups?
"the number was appended to local-part, not the domain. I found out by going back to a tab where my session was still valid but the account dropdown had updated with the new name. Profile settings revealed the email."
Completely wrong, don't just munge someone's email and hope it won't work, we literally have domains for this kind of stuff.
https://www.iana.org/domains/reserved
Is it also possible that since they have a subscription model, they have to build their system around people leaving and coming back?
I mean imagine if HBO had deleted accounts during GoT and Westworld instead of suspending them. How many people had a 9 month subscription per year for years at a time?
Yes especially in New York. I once asked the Curb cab app to delete my account. They replied with an email that they did and I checked the app. My session was still valid and I could see my account details. All they did was change my email address domain to @aol.com and 555’d my phone number.
XBox live used to have a similar thing, may have changed since GDPR. When I asked for my account to be deleted they sent me instructions that said basically unfriend everyone i know and change my name to deleted - this was after several days of them looking into it for me :/
Their system probably doesn't implement a function to delete account, or it is not easily discoverable in the UI, or it is known to be bugged. So the employee who did that thought that renaming an account was a good idea to make it appear "missing". A big numeric suffix is an obvious idea to avoid collisions with the existing and future accounts.
But when people are asked to choose a large number, numbers around 1000 and its multiples are chosen particularly often. So 999, 1000 and 1001 are very likely numbers to be picked "randomly". I can't find the reference for that, but I suppose we all have enough anecdotal evidence. Just recall the common port numbers of various programs. X*1000 + Y is a very common formula, where |Y| < 100.
I really doubt someone who thought the solution to being unable to delete an account is to change the account name put this much thought into what to change the account name to.