
What happens if you have some data that absolutely _must_ change or be deleted? For example, a record gets committed with something sensitive by mistake.



Or customers ask for their personal data to be deleted (GDPR, right to be forgotten, etc.).

I guess we must consider what can go into immutable storage and what must not.


There are data expiration and logical deletion features for exactly this use case.


Data expiration is easy enough if the expiry is a fixed term: you just 'chop off' a chunk of the internal DAG. But how would you implement 'logical deletion' under these circumstances?

To achieve "client does not need to trust the database engine", it would need to be possible for the client to independently walk the history of the database and verify that neither its mutations nor their order have been tampered with. For that, the actual data needs to somehow be taken into account in the signature, commit hash, whatever.

So when you logically delete data, how are you _not_ breaking that hash chain? The original data that produced a signature/hash is no longer available, and therefore not verifiable anymore. This means that the relevant commit cannot be trusted, and therefore neither can anything that comes before it.

Or are you just trusting that any 'redacted' commit is valid without actually verifying its hash? In that case you'd be compromising the trustless nature, because the database engine could autonomously decide to redact a commit.
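
To make the problem concrete, here is a minimal hash-chain sketch in Python (an illustration of the general technique, not this database's actual commit format): each entry's hash covers the previous hash plus the payload, so blanking the payload makes that entry, and everything chained on top of it, unverifiable.

    # Minimal hash-chained log; illustrative only, not any particular engine's format.
    import hashlib

    GENESIS = b"\x00" * 32

    def entry_hash(prev_hash: bytes, payload: bytes) -> bytes:
        return hashlib.sha256(prev_hash + payload).digest()

    def build_chain(payloads):
        chain, prev = [], GENESIS
        for p in payloads:
            h = entry_hash(prev, p)
            chain.append({"payload": p, "hash": h})
            prev = h
        return chain

    def verify(chain) -> bool:
        prev = GENESIS
        for e in chain:
            if entry_hash(prev, e["payload"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True

    chain = build_chain([b"alice=1", b"oops-something-sensitive", b"bob=2"])
    assert verify(chain)

    # 'Logical deletion' that blanks the stored payload breaks verification:
    chain[1]["payload"] = b"<redacted>"
    assert not verify(chain)

If instead each commit only hashed a digest of the payload, and that digest were kept when the payload is deleted, the chain would still verify, but then you are trusting that the redaction was legitimate rather than checking the data itself, which is exactly the trade-off described above.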


When is money laundering not covered by a GDPR right to be forgotten, or is the level of surveillance too great?

Generally, hardware is the main factor in performance, then the quality of the code and the functionality that exists. For example, in MS SQL, BULK INSERT (importing from a txt/csv file) is fastest, then batch inserts, then single-record inserts.
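
A rough sketch of those three paths via pyodbc (the connection string, the dbo.people table, and the csv path are placeholders; note BULK INSERT reads the file from the server's filesystem, not the client's):

    # Three insert paths into SQL Server, slowest to fastest; placeholders throughout.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=demo;Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cur = conn.cursor()
    rows = [(i, f"name-{i}") for i in range(10_000)]

    # 1. Slowest: one round trip per record.
    for r in rows:
        cur.execute("INSERT INTO dbo.people (id, name) VALUES (?, ?)", r)

    # 2. Faster: batched parameter arrays.
    cur.fast_executemany = True
    cur.executemany("INSERT INTO dbo.people (id, name) VALUES (?, ?)", rows)

    # 3. Fastest: BULK INSERT straight from a server-side txt/csv file.
    cur.execute(
        "BULK INSERT dbo.people FROM 'C:\\data\\people.csv' "
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')"
    )
    conn.commit()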

The next factor is how many records need to be inserted.

Companies like Experian have been using a custom ISAM (dBASE/Clipper-type) database where users can read the data Monday to Saturday and all the data is updated on Sunday.

It was the only way to serve country-wide numbers of users with '90s hardware.

It also meant the speed gains came from doing the work in RAM and then writing the data out to disk serially.

Things haven't changed that much in 30 years; RDBMSs are just another layer between the hardware and the end user's app!


You should be storing potentially GDPR-covered data encrypted with entity-specific keys, which are destroyed when necessary.
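
i.e. crypto-shredding. A minimal sketch of the idea in Python using the cryptography package's Fernet (the in-memory key store and record layout are placeholder assumptions; in practice the keys would live in a KMS/HSM):

    # Crypto-shredding sketch: one key per data subject; destroying the key
    # makes that subject's ciphertext permanently unreadable, even in backups.
    from cryptography.fernet import Fernet

    keys = {}      # per-entity keys (placeholder; really a KMS/HSM, not a dict)
    records = {}   # immutable/append-only store of ciphertext

    def store(entity_id: str, plaintext: bytes) -> None:
        key = keys.setdefault(entity_id, Fernet.generate_key())
        records.setdefault(entity_id, []).append(Fernet(key).encrypt(plaintext))

    def read(entity_id: str) -> list[bytes]:
        f = Fernet(keys[entity_id])
        return [f.decrypt(token) for token in records[entity_id]]

    def forget(entity_id: str) -> None:
        # 'Right to be forgotten': destroy only the key; the immutable
        # ciphertext can stay where it is but is now useless.
        del keys[entity_id]

    store("customer-42", b"email=jane@example.com")
    print(read("customer-42"))
    forget("customer-42")
    # read("customer-42") now fails: the plaintext is unrecoverable.

The nice property for an append-only store is that nothing in the ciphertext history has to change; any hash chain over the encrypted records stays intact while the data itself becomes unrecoverable.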


Right, regardless of the storage, but in the research computing circles I see, it's just not done. The promises of "data destruction" that get demanded are basically made with fingers crossed behind the back (is that an international gesture to "cover" for lying?), considering the filesystem, backup mechanisms, etc.





