Hacker News new | past | comments | ask | show | jobs | submit login

I think you are referring to tombstoning. That's usually a temporary process that may immediately delete the underlying data, keeping a tombstone to ensure the deletion propagates to all storage nodes. A compaction process purges the underlying data (if still present) and the tombstones after a suitable delay. It's a fancy delete that takes some time to process, but the data is eventually gone. You could turn off the compaction, if you wanted.

I believe Kafka make deletion difficult, since it's an append-only log, but Kafka doesn't work well with laws that require deletion of data, so I don't believe it's a popular choice any longer (I.E. isn't modern).




If you run a DELETE FROM in any modern sql engine, which is the absolute best you could expect when asking for a delete in the UI^, the data is nowhere near gone. It’s still in all the backups, all the WALs, all the transactions that started before yours, etc. It’s marked for eventual removal, and that’s it. Just as the definition of delete I provided says.

^ (more likely they’ll just update the table to set a deleted flag)


> eventual removal

To me, the idea that the deletion takes time to complete doesn't negate the idea that the data will be gone once the process completes.

WAL archive and backups are external systems. You could argue that nothing supports deletion because an external backup could exist, but that's not a useful conversation.


Going back to the point of the the thread, we agree the deleted data is not erased. The user is unable to access it through normal mechanisms, but the existence of side channels that could reveal it does not negate the idea that it has truly been “deleted”, especially when one looks at the historical context surrounding that word.


What? I don't agree with that.

Can you point to an example of a modern database that "supports deletion" but keeps the data around forever? Maybe I've just used different tools than you. Knowing modern data retention concerns I'd be surprised if such a thing existed.


Who said anything about that? We’re talking about side channels and eventual^TM deletion. Given enough time no information will remain anywhere, sure. But that’s not very relevant.


I think we are trying to define the word "delete". You found an archaic definition and are trying to use it in a modern technical setting. You've claimed that modern databases delete without actually removing data but haven't pointed to which systems you are talking about. I'm familiar with tombstoning, either as a "soft-delete" or as part of an eventual deletion process. But I've never seen that called deletion as that would be very confusing.

Pointing to which database you are talking about should clear this up quickly.

I don't think it's reasonable to talk about backups here. A backup is external to the database so it inherently cannot delete it. Similar to how a piece of paper cannot destroy a photograph of the paper, but burning the paper destroys it.


I used the first definition of delete I found which, while arguably “archaic”, matches the modern technical term almost exactly. We’d typically call that a well known word with a clear meaning.

And sure, the DELETE FROM statement in postgres - or any other standards compliment sql db I know.


In technical writing you often don't want to use the dictionary for definitions, similar to how words in a contract can have unexpected meaning in a legal setting.

For Postgres you've got to consider vacuum. Auto vacuum is enabled by default. Deleted rows are removed unless you go out of your way to make it do something different.


Imagine the data that was deleted is of the highest level of illegality you can imagine. Under no circumstance can your service be associated with that content.

- What was your "definition of delete" again?

- You mentioned some of the convenient technical defaults your frameworks and tools provide out-of-the-box, can you think of ways to improve the situation?

(You might re-run delete requests after restoring a backup; transaction should resolve in a timely fashion, failed deletes can be communicated to the user quickly etc.)


We are missing the point here. The GP was claiming that delete meant something other than adding a mark to an item that you want to eventually be removed from the system. It doesn’t.


I understand that you describe the status quo in many systems today.

However, besides the technical aspect you talked about the "absolute best you could expect when asking for a delete in the UI^".

I think this where I, other posters in the thread, most people, and probably the GDPR and other legislature, would disagree. We expect significantly more effort to clean up deleted data.

This includes, for example, the ability to delete datasets from backups, as well as a general accountability of how often and where all the data is stored and if, and when a deletion process is complete.


> GDPR and other legislature

Nope. GDPR allows deleted data to be retained in backups so long as there is an expiration process in place. Doesn’t matter how long it is. But certainly nobody has a right to forcing a company to pull all of their backups from cold storage and trove through them all any time any deletion request takes place. That’d be the quickest path to Distributed Denial of Bank Account Funds imaginable. Even the GDPR isn’t that bone-headed.

But yes, it is part of the law that the provider should tell you that your data isn’t actually being erased and instead it will be kept around until they get around to erasing everything as part of their standard timelines. But that knowledge doesn’t do anyone much good.

> CNIL confirmed that you’ll have one month to answer to a removal request, and that you don’t need to delete a backup set in order to remove an individual from it.

https://blog.quantum.com/2018/01/26/backup-administrators-th...


But GitHub is keeping this stuff indefinitely. No long expiration, no probability of eventual disk overwriting, nothing. All they're doing is shutting the front door without shutting the side door.


Interesting point about the GDPR; I will soften my point to mean that lawmakers have started (late) to regulate data retention / deletion and the rights of users in general and that might be a trend for the future.

However I would like to avoid the impression that with the description of the technical status quo the topic is settled. To do so I would go back to my previous point: Imagine some truly illegal pictures are in that cold storage backup, and one day you might have to restore that data. (Since aparently the user's wish to delete data is not quite as respected as certain other hard legal requirements regarding content)

What solutions to mitigate the situation could a company, or backup tool/web framework etc. reasonably come up with? Maybe check the restored data against a list of hashes/IDs of to-be-deleted-data?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: