immudb – world’s fastest immutable database, built on a zero trust model (github.com/codenotary)
226 points by dragonsh on Dec 27, 2021 | 103 comments



I find it very strange to have claims about the database being tamper-evident etc without a clear description of the threat/trust model, and how/for whom it works. For example, what data does the client need to store to be sure no tampering has occurred?


I agree 100% with the need for the threat model. I think the current system is, at best, a system with a tamper-evident audit log. That could still be interesting, but the authors' use of ill-defined terms makes it easy for people to think the software does something it doesn't.

What's supposed to happen is that the server will give the client a path in a https://en.wikipedia.org/wiki/Merkle_tree to the current state, to prove that the key-value pair is included in the Merkle tree.
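
Roughly, the client-side check is something like this (a generic Merkle-audit-path sketch in Go; not immudb's actual proof format or API):

  package main

  import (
    "bytes"
    "crypto/sha256"
    "fmt"
  )

  // pathStep is one sibling hash on the way from a leaf to the root, plus
  // whether that sibling sits on the left or the right.
  type pathStep struct {
    hash []byte
    left bool
  }

  // verifyInclusion recomputes the root from a leaf and an audit path and
  // compares it with a root the client already trusts.
  func verifyInclusion(leaf []byte, path []pathStep, trustedRoot []byte) bool {
    h := sha256.Sum256(leaf)
    cur := h[:]
    for _, step := range path {
      var next [32]byte
      if step.left {
        next = sha256.Sum256(append(append([]byte{}, step.hash...), cur...))
      } else {
        next = sha256.Sum256(append(append([]byte{}, cur...), step.hash...))
      }
      cur = next[:]
    }
    return bytes.Equal(cur, trustedRoot)
  }

  func main() {
    // Toy two-leaf tree: root = H(H(leafA) || H(leafB)).
    a, b := []byte("k=v1"), []byte("k=v2")
    ha, hb := sha256.Sum256(a), sha256.Sum256(b)
    root := sha256.Sum256(append(ha[:], hb[:]...))

    // Prove leaf b is included, given only b, its sibling's hash and the root.
    ok := verifyInclusion(b, []pathStep{{hash: ha[:], left: true}}, root[:])
    fmt.Println("leaf included:", ok) // true
  }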

There are, however, some subtle issues which can arise if you're not careful. In particular, what happens if I set the key k to value v1, and then set the key k to value v2? If I subsequently ask for the value of k, I ought to see v2, and a proof that k is v1 shouldn't check out. However, in order for this to work, it's not sufficient for the server to prove that (k, v) is in the audit log, since that would allow for the server to maliciously roll-back the state. What you really want to prove is that v is not just _a_ value that k was set to, but _the most recent value_.

It's unclear to me whether the code actually does this--there's no architecture guide which describes the cryptographic algorithms at play (or what the threat model is), and the code appears to be mostly devoid of comments. There is a reference to separate inclusion and consistency proofs, which might be this distinction. But it's really hard to tell from the digging that I've done.


This is a very interesting point. Rollbacks are protected against by consistency proofs, and with inclusion proofs it's possible to detect such a situation, but it may require scanning over the transactions. immudb does not only provide access by key or key prefix; it's also possible to fetch a particular tx by its unique identifier or to scan over transactions.

For sure it's not the optimal solution, some ideas to cover this scenario were discussed but not yet fully defined.


You'll find detailed information in the research paper: https://immudb.io/

Basically a hash value denotes the entire state of the database (including the entire history). This hash may be cryptographically signed and exported from the server. immudb SDKs keep track of the last verified state; each time a new one is received, it's cryptographically validated.

Whenever a particular entry or transaction is verified, the latest validated hash is also used. If the entry was tampered with, the hashes won't match.
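
The client-side bookkeeping is conceptually something like this (a rough sketch, not the actual SDK code; the file name and the consistency-check hook are made up):

  package main

  import (
    "encoding/hex"
    "errors"
    "fmt"
    "os"
  )

  // stateFile plays the role of the local file the SDKs use for the last
  // verified state (path made up for the example).
  const stateFile = "immudb.state"

  func loadTrustedRoot() ([]byte, error) {
    b, err := os.ReadFile(stateFile)
    if errors.Is(err, os.ErrNotExist) {
      return nil, nil // first run: nothing verified yet
    }
    if err != nil {
      return nil, err
    }
    return hex.DecodeString(string(b))
  }

  func saveTrustedRoot(root []byte) error {
    return os.WriteFile(stateFile, []byte(hex.EncodeToString(root)), 0o600)
  }

  // acceptNewRoot only advances the persisted state when a consistency proof
  // from the old root to the new one verifies; verifyConsistency stands in
  // for whatever proof check the SDK actually performs.
  func acceptNewRoot(newRoot []byte, verifyConsistency func(oldRoot, newRoot []byte) bool) error {
    old, err := loadTrustedRoot()
    if err != nil {
      return err
    }
    if old != nil && !verifyConsistency(old, newRoot) {
      return fmt.Errorf("server state is not an append-only extension of the last verified state")
    }
    return saveTrustedRoot(newRoot)
  }

  func main() {
    // Demo only: accept everything. A real check verifies a consistency proof.
    err := acceptNewRoot([]byte{0xab, 0xcd}, func(oldRoot, newRoot []byte) bool { return true })
    fmt.Println("state updated, err =", err)
  }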


I'd have a look at the research paper, if it weren't hidden behind an e-mail wall. I was initially happy to see a 'research paper' link at all, but that e-mail wall definitely compromises my first impression of immudb, because now it looks like I'll be getting a corporate brochure and not an actual technical paper.

> Basically a hash value denotes the entire state of the database (including the entire history). This hash may be cryptographically signed and exported from the server. immudb SDKs keep track of the last verified state; each time a new one is received, it's cryptographically validated.

Okay, but how do you persist that hash across e.g. client restarts? You obviously can't store it in the database. And this does not sound like "zero trust" to me - that's a much higher bar to meet, and would allow for e.g. untrusted writers.

As I understand it right now, immudb works more or less the same way that Git does; it's a DAG of database (instead of file) mutations, and you can persist the latest commit hash to ensure that someone hasn't messed with what a branch points to.

Which can be useful, don't get me wrong, but it's not "zero trust" and it's certainly a fairly niche security feature.

Edit: To be clear, I'm very much in favour of what immudb seem to be trying to do - getting enterprises away from ultimately dysfunctional "blockchains" by providing something more sound with nominally the same features/appeal. But it's always important to be very clear about what your tech does or doesn't provide, blockchain or not.


first of all, thanks for your feedback and discussion :)

if entering an email for downloading the paper is a concern, we'll consider it.

immudb should be used as a traditional database (log, key-value or even a relational store - with limitations of course), so it's up to your deployment to whom you give user credentials with read/write permissions. The key difference is the state being captured by a single hash value.

Since the hash value denoting the entire state can be signed and exported, it's out of the server's control how many copies exist or when a validation is going to be made. Currently, official SDKs store the latest validated hash in a local file, but it's perfectly possible to store the hash in remote storage, another database, etc. This ensures data is only added and never changed once written; please note that by "never changed" I mean any change is subject to detection when a proof is requested.

immudb does not pretend to provide a complete security solution, but rather a key component when you deal with sensitive data.


> You can add new versions of existing records, but never change or delete records. This lets you store critical data without fear of it being tampered.

> immudb can be used as a key-value store or relational data structure and supports both transactions and blobs, so there are no limits to the use cases.

This is game changing. Use it for, say, a secondary data store for high-value audit logs. I'll consider using it in the future.


What happens if you have some data that absolutely _must_ change or be deleted? For example, a record gets committed with something sensitive by mistake.


Or customers ask for their personal data to be deleted: GDPR, right to be forgotten, etc.

I guess we must consider what can go in an immutable storage and what must not.


There are data expiration and logical deletion features for exactly this use case.


Data expiration is easy enough if the expiry is a fixed term, you just 'chop off' a chunk of the internal DAG. But how would you implement 'logical deletion' under these circumstances?

To achieve "client does not need to trust the database engine", it would need to be possible for the client to independently walk the history of the database and verify that neither its mutations nor its order has been tampered with. For that, the actual data needs to somehow be taken into account in the signature, commit hash, whatever.

So when you logically delete data, how are you _not_ breaking that hash chain? The original data that produced a signature/hash is no longer available, and therefore not verifiable anymore. This means that the relevant commit cannot be trusted, and therefore neither can anything that comes before it.

Or are you just trusting that any 'redacted' commit is valid without actually verifying its hash? In that case you'd be compromising the trustless nature, because the database engine could autonomously decide to redact a commit.


When is Money Laundering not a GDPR right to be forgotten, or is the level of surveillance too great?

Generally HW is the main factor in performance, then quality of coding and the functionality that exists. For example, in MS SQL, BULK INSERT (i.e. importing from a txt/csv file) is fastest, then batch inserts, then single-record inserts.

Now the next factor is how many records need to be inserted?

Companies like Experian have been using a custom ISAM (dBase/Clipper-type) database where users can read the data Mon-Sat and all data is updated on Sunday.

It was the only way to serve country-wide levels of users with 90s HW.

It also meant the speed gains were found by doing stuff in RAM then writing out data in a serial manner to disk.

Things haven't changed that much in 30 years; RDBMS systems are just another layer between the HW and the end user's app!


You should be storing potentially GDPR-covered data encrypted with entity-specific keys, which are destroyed when necessary.
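
For the curious, that "crypto-shredding" pattern is roughly this (a sketch in Go with the standard library; key management details omitted, names made up):

  package main

  import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
  )

  // encryptForEntity seals a record with a key dedicated to one customer.
  // Destroying that key later renders every copy of the ciphertext useless,
  // which is the usual answer to erasure requests on immutable storage.
  func encryptForEntity(key, plaintext []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
      return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
      return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
      return nil, err
    }
    // Prepend the nonce so the stored record is self-contained.
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
  }

  func main() {
    // Hypothetical per-customer key; in practice it lives in a KMS or key
    // table *outside* the immutable store, so it can actually be destroyed.
    key := make([]byte, 32)
    if _, err := rand.Read(key); err != nil {
      panic(err)
    }
    ct, err := encryptForEntity(key, []byte("PII for customer 42"))
    if err != nil {
      panic(err)
    }
    fmt.Printf("store this in the immutable log: %x\n", ct)
    // On an erasure request: delete `key` from the key store; the ciphertext
    // stays in the append-only history but is no longer recoverable.
  }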


Right, regardless of the storage, but in the research computing circles I see, it's just not done. The promises of "data destruction" that get demanded are basically accompanied by fingers crossed behind the back (is that an international thing to "cover" for lying?) considering the filesystem and backup mechanisms etc.


I'm not sure I see it as game changing for audit logs; it's relatively easy to use privileges in PostgreSQL to make a table effectively insert-only[1], and MySQL/MariaDB and others almost certainly have something similar.

1: https://stackoverflow.com/questions/35919167/create-insert-o...
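
Concretely, something along these lines (a sketch; table and role names made up, and note a superuser or the table owner can still bypass it, which is the gap immudb is aiming at):

  package main

  import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // Postgres driver; any pq/pgx driver works
  )

  // makeInsertOnly revokes UPDATE/DELETE/TRUNCATE from an application role so
  // the audit table is effectively append-only at the privilege level.
  func makeInsertOnly(db *sql.DB) error {
    stmts := []string{
      `REVOKE UPDATE, DELETE, TRUNCATE ON audit_log FROM app_writer`,
      `GRANT INSERT, SELECT ON audit_log TO app_writer`,
    }
    for _, s := range stmts {
      if _, err := db.Exec(s); err != nil {
        return err
      }
    }
    return nil
  }

  func main() {
    db, err := sql.Open("postgres", "postgres://localhost/audit?sslmode=disable")
    if err != nil {
      log.Fatal(err)
    }
    defer db.Close()
    if err := makeInsertOnly(db); err != nil {
      log.Fatal(err)
    }
  }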


I don't totally understand the value of the second, but doesn't the first already exist in things like BigQuery?


Or a traditional database with read-only credentials and a function that adds "ORDER BY version DESC LIMIT 1".


but is a traditional database cryptographically secure? if a super user with write permissions (or, say, direct access to the physical data store) modifies records are users able to validate the integrity of the data?


Users just need to insert signed data into the database for tamper-proofing.
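
E.g. something like this (a minimal ed25519 sketch in Go; not tied to any particular database):

  package main

  import (
    "crypto/ed25519"
    "crypto/rand"
    "encoding/json"
    "fmt"
  )

  // signedRecord is what gets written to the (untrusted) database: the payload
  // plus a signature made with a key the server never sees.
  type signedRecord struct {
    Payload   []byte `json:"payload"`
    Signature []byte `json:"signature"`
  }

  func main() {
    pub, priv, err := ed25519.GenerateKey(rand.Reader)
    if err != nil {
      panic(err)
    }

    payload := []byte(`{"event":"login","user":"alice"}`)
    rec := signedRecord{Payload: payload, Signature: ed25519.Sign(priv, payload)}
    row, _ := json.Marshal(rec)
    fmt.Printf("insert this row: %s\n", row)

    // Any reader holding pub can detect an in-place modification of the payload.
    fmt.Println("verifies:", ed25519.Verify(pub, rec.Payload, rec.Signature))
  }

Signatures catch in-place edits, but on their own they don't reveal rows that were silently deleted or a rollback to an older snapshot.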


You could use permissions and stored procedures that ensure append only.


The difference is that client applications do not need to trust that proper "append-only" permissions were enforced on the server side; they will have the chance to detect any tampering, while in the former approach it won't be noticeable.


That's basically what I do in BigQuery sometimes, just with a time or timestamp column 'last_updated' and a UUID.


> doesn't the first already exist in things like BigQuery?

You can truncate a BQ table and reload it if you want to change things. Had to do this at a previous gig (twice a day!) because the data warehouse people would only take data from BQ but the main data was in Firebase (yes, it was an insane place.)


oh that sounds horrible !! i would be terrified to mess that up


The product is useless because it's a huge GDPR and CCPA violation waiting to happen. If you cannot comply with GDPR because of your technology choice, you are liable to large fines.

After working in the fintech space, nothing is truly immutable because of compliance.

Fintech -- the customer which should want this most -- can't use it.


data expiration features were mentioned in other comments.


You can comply with GDPR, you can set logical data expiration.


> Data stored in immudb is cryptographically coherent and verifiable. Unlike blockchains, immudb can handle millions of transactions per second, and can be used both as a lightweight service or embedded in your application as a library. immudb runs everywhere, on an IoT device, your notebook, a server, on-premise or in the cloud.

Seems pretty useful actually. Can anyone with a relevant background comment on when this would be a bad idea to use?


If you can trust your writers there's likely no need for this. A modern approach tends to have databases owned by a single service, which exposes the model via RPCs. So you generally don't have more than one writer, which means you're pretty much de-facto "zero trust" if that single writer follows a few rules (ie: mutual auth, logging, etc).

But in some cases you don't have that same constraint. For example, databases that store logs (Elastic, Splunk, etc) might have many readers and writers, including humans.

In that case enforced immutability might be a nice property to have. Attackers who get access to your Splunk/ES cluster certainly will have fun with it.


There are a few properties to be aware of. Although it might be a KV store, you're probably going to want sensible queries using other than the primary key. Eg time series or secondary keys. So in addition to the KV store, there is probably a need for an external index and query mechanism. Another issue is obtaining consistent hashing, where multiple documents might have the same content but vary by order or by date format. Finally, do you have to go to the beginning and hash everything to get a proof of one transaction, or is there some shortcut aggregation possible?
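
For the consistent-hashing point, the usual answer is to canonicalize before hashing; a toy sketch in Go (this handles JSON key order only; date formats, floats, etc. need their own normalization rules):

  package main

  import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
  )

  // canonicalHash decodes a JSON document and re-encodes it before hashing.
  // encoding/json writes map keys in sorted order, so two documents that
  // differ only in key order hash identically.
  func canonicalHash(doc []byte) ([32]byte, error) {
    var v interface{}
    if err := json.Unmarshal(doc, &v); err != nil {
      return [32]byte{}, err
    }
    canon, err := json.Marshal(v)
    if err != nil {
      return [32]byte{}, err
    }
    return sha256.Sum256(canon), nil
  }

  func main() {
    a := []byte(`{"amount": 10, "currency": "EUR"}`)
    b := []byte(`{"currency": "EUR", "amount": 10}`)
    ha, _ := canonicalHash(a)
    hb, _ := canonicalHash(b)
    fmt.Println("same content, same hash:", ha == hb) // true
  }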

We evaluated AWS QLDB for these things in our application as a financial ledger and were impressed at their progress with a novel data store. They invented some of the tech in house for this product instead of grabbing an off the shelf open product. Lockin would be a downside here.

Immudb looks promising because it's not locked to a cloud host.

https://aws.amazon.com/qldb/faqs/


> Eg time series or secondary keys

not an "all or nothing" question.. for example, a fast-enough "return the most recent in a time series" is not exactly time-series, but solves many use cases


note that it does KV and SQL


The data that is at risk of being changed with malicious intent is certainly not insignificant, but still just a fraction of all data. Changing to this adds a new and complicated system, replacing whatever you're currently using, which will have seen far better testing and is known by the people working with it.


Seems like it would still be vulnerable to rollback attacks. Signed rows would probably get you farther with less novel tech involved if you want immutability.


Has anyone tried immudb in production? What are some of immudb's performance characteristics? It'd be nice to know how it performs under various conditions: query per sec, database / table sizes, SQL join performance etc.

Also, what are the system requirements for immudb? What kind of machine would I need to run a medium to large website (say, 1TB of data, 5-25K qps, e.g. Wikipedia)?

The documentation mentions that it can use S3 as its storage. Are there performance implications if you do this?


I went on their website and tried to understand how immutability is enforced but I couldn't find anything.

I'm sceptical, but particularly because they make a deliberate comparison to blockchain that I doubt they'll be able to deliver.

The PoW immutability of e.g. BTC and ETH is strong as it yields the following guarantees for stored data:

- Immutability of the BTC blockchain is protected through all the cumulative work that has happened on a specific branch of the chain. Even if someone replayed BTC, it'd take millennia to recompute the work on an average machine

- The immutability isn't enforced at the file level, as I suspect it is with immudb. Immutability is enforced through the network, which has additionally shown itself to have conservative political views too. You can go sync a BTC node and change the underlying LevelDB; still, that won't change the network state. Immutability on a single system is physically impossible if e.g. you consider deleting the file a mutation.

- immudb says "it's immutable like a blockchain but less complicated", but Bitcoin isn't more complicated than some sophisticated enterprise db solution.

- I think immudb should be maximally upfront about what they mean by immutability: it seems they want to communicate that they're doing event sourcing - that's different from immutability

Finally, there's a rather esoteric argument. If you run an immutable database as an organization where one individual node cannot alter the network state, but you have (in)direct control over all nodes: isn't it always mutable, as you could e.g. choose to swap out the consensus?

So from a philosophical perspective, immutability can truly only occur if mutability is out of an individual's control.

Why do I have the authority to say this? Because I once worked on a database with blockchain characteristics called https://www.bigchaindb.com

Edit: The best solution that also has a theoretically unlimited throughput is this toy project: https://github.com/hoytech/quadrable

Conceptually, it computes a Merkle tree over all data and regularly commits to Ethereum. Through this commitment the data may still change locally, but then it would at least be provably tampered with. So I guess for databases, the attribute we can really implement is "tamper-proof".


I’d say the attribute is "tamper-proof history", not "tamper-proof data (current content)".


I may be missing something but I don't think this prevents rollbacks. Can a client prove that the server rolled back / lost data?


In the merkle tree commitment example, I don't think it allows identifying e.g. a fraudulent rollback just from looking at the hash.

If you wanted to identify a fraudulent rollback, I think you'd have to constantly replicate all transitions and recompute the on-chain merkle hash. If you found discrepancies between the on-chain hash and yours, it'd mean that one of the nodes ran out of sync for some unknown reason.

Then, if e.g. other nodes also replicated your state, you might be able to conclude that the wrong-hash node had a (byzantine) fault, etc.

But I think that at least in the case of the Ethereum Plasma architecture, just committing a state aggregate in the form of a hash wasn't enough because of a block withholding attack. From my understanding, hoytech's quadrable is just a chain commitment. Similar to Plasma.

Modern rollup architectures end up storing all transition state in Ethereum calldata state. This way proving fraudulent validation is possible and things become conceptually simpler as you can rely on the strong guarantees of e.g. data availability of the PoW chain.


Only by storing (a hash of the) previous state client-side and comparing it with the audit log for the current state, I believe.


Is that enough to prove a rollback happened? How do you validate that hash without the data?


You can think of it like a Git hash. If there was a rollback, the hash won’t be in the audit log (commit history in Git terms).


The hash is cryptographically signed by the server. So whatever state the database is in, there must be a proof from the signed state up to the current one. Otherwise, the client application should handle the situation with care, as the database may be compromised.


Those signatures don’t prevent or expose or otherwise impede server-side rollbacks. They are thus immaterial to the question raised.


It is out of the scope of immudb to prevent rollbacks if someone gains access to the filesystem. But even if that happens, immudb SDKs will detect the situation when requesting a proof.


The big question is if someone gets on your DB server and wants to change a record how does the software prevent them from altering a record and then recomputing the remainder of the chain?


That’s presumably out of scope. The scope of the guarantees is restricted to the operations made available by the database API. Just like the ACID guarantees of regular relational databases are.


Which makes any blockchain comparison extremely stupid since it doesn't even attempt to solve the same threat model..


> but Bitcoin isn't more complicated than some sophisticated enterprise db solution.

It is, however, hundreds of thousands of times slower as a database.


At least for money transmission use cases that's a rather weak argument considering that my local bank calls me in person to confirm transactions above 2k EUR.


Previous HN thread about immutable databases:

https://news.ycombinator.com/item?id=23290769


Is it possible to delete data for compliance reasons? Not as a frequent operation, but say on a monthly batch?


logical deletion is in place, physical deletion is already on the roadmap


How can physical deletion work with the Merkle Tree mechanics?

Do you just store the hash but not the underlying KV pair?


The hash tree, transaction headers, payloads and index are all independently stored. So it will be possible to physically delete payloads and still be able to build all the proofs (while not being able to provide the actual values).

Truncating transaction headers should be possible, but each physical deletion will be associated with a real loss of data, so proof generation may be affected for deleted data.
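
Conceptually something like this (a toy sketch, not immudb's actual storage layout): the tree only ever commits to H(payload), so the payload store can be truncated without breaking the hashes the proofs are built from.

  package main

  import (
    "crypto/sha256"
    "fmt"
  )

  type store struct {
    leafHashes [][32]byte     // committed into the hash tree, never deleted
    payloads   map[int][]byte // deletable value store, keyed by tx index
  }

  func (s *store) add(payload []byte) int {
    s.leafHashes = append(s.leafHashes, sha256.Sum256(payload))
    idx := len(s.leafHashes) - 1
    s.payloads[idx] = payload
    return idx
  }

  // shred removes the value but keeps the hash the proofs are built from.
  func (s *store) shred(idx int) { delete(s.payloads, idx) }

  func main() {
    s := &store{payloads: map[int][]byte{}}
    i := s.add([]byte("sensitive value"))
    s.shred(i)

    _, ok := s.payloads[i]
    fmt.Println("value still available:", ok)                        // false
    fmt.Printf("leaf hash still provable: %x...\n", s.leafHashes[i][:4]) // proofs unaffected
  }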


That’s a lot of words to say no


Can someone ELI5 how immutability applies to databases and what advantages it brings? Thank you!


> immutability ... which advantages it brings

Immutability brings a bunch of perf shortcuts which are usually impossible to build on a mutable store.

You'll find a lot of metric stores optimized for fast ingest to take advantage of the immutability as a core assumption, though they don't tend to do what immudb does with the cryptographic signatures to check for tampering.

Look at GE Historian or Apache Druid for most of what I'm talking about here.

You can build out a tiered storage system which pushes the data to a remote cold store and keep only immediate writes or recent reads locally.

You can run a filter condition once on an immutable block/tablet and then never run it again: something like a count(*) where rpm > X and plane_id = ? can be remembered as compressed bitsets for each column, rather than as final row-selection masks, and half of that can be reused when you change the plane_id = ? parameter.

The fact that the data will never be updated makes it incredibly fast to query as you stream more data constantly while refreshing the exact same dashboard every 3 seconds for a monitoring screen - every 3s, it will only actually process the data that arrived in those 3 seconds, not repeat the query over the last 24h all over again.

The moment you allow even a DELETE operation, all of this becomes a complex mess of figuring out how to adjust for changes (you can invalidate the bit-vectors of the updated cols etc, but it is harder).
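
A toy illustration of the per-block caching point (made-up names; real systems keep compressed bitsets rather than plain counts):

  package main

  import "fmt"

  // block is an immutable chunk of ingested rows; once sealed it never
  // changes, so anything computed from it can be cached forever.
  type block struct {
    id   int
    rpms []float64
  }

  type cacheKey struct {
    blockID   int
    predicate string // e.g. "rpm>5000", illustrative only
  }

  var matchCache = map[cacheKey]int{}

  // countHighRPM returns how many rows in the block satisfy rpm > threshold,
  // recomputing only on a cache miss. With mutable blocks this cache would
  // need invalidation; with immutable blocks it never does.
  func countHighRPM(b block, threshold float64) int {
    key := cacheKey{b.id, fmt.Sprintf("rpm>%v", threshold)}
    if n, ok := matchCache[key]; ok {
      return n
    }
    n := 0
    for _, r := range b.rpms {
      if r > threshold {
        n++
      }
    }
    matchCache[key] = n
    return n
  }

  func main() {
    sealed := block{id: 1, rpms: []float64{4800, 5200, 6100}}
    fmt.Println(countHighRPM(sealed, 5000)) // computed: 2
    fmt.Println(countHighRPM(sealed, 5000)) // served from cache: 2
  }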


If the data is being added or updated continually how do you prevent the database from growing without bound?


> you prevent the database from growing without bound?

The whole point of shipping cold storage off to S3 was to solve that kind of scale problem.

You can keep your ingest nodes scaled up to the incoming data (per-day, approx), replicate 3-way for HA and use SSDs for commit throughput.

The query nodes are scaled up to the working-set sizes, but auto-scale up/down based on the workload scan sizes (no need to keep them running 24x7, cheaper to throw away the cache after a workday - no need for replicas, just jitter them so that the entire cache doesn't go poof at the same time and hit S3 throttling on the next query).

And the S3 bucket is literally infinite storage (more like caps out when the metadata about the s3 backed items hits 8 Tb).

Run the equivalent of an fsck every quarter to check the checksums, then re-encrypt the blocks with a new key or recompress them by swapping them out (go from lz4 to zstd as they age out).

There is a mechanism to expire data after 7 years of storage (I guess it won't be queried anymore, so there'd be nothing "live" to expire?), but that might be longer than the current architecture lives on without a refactor.


You don't. You just keep throwing disks at it.


Immutability is probably the most powerful concept that applies to how modern technology can be used. Versioned, immutable, and cryptographically-signed artifacts do a bunch of things for you.

From an operational standpoint, it allows you to roll out a change in exactly the way you tested, confident that it will work the way it's intended. It also allows you to roll back or forward to any change with the same confidence. It also means you can restore a database immediately to the last known good state. Changes essentially cannot fail; no monkey-patching a schema or dataset, no "migrations" that have to be meticulously prepared and tested to make sure they won't accidentally break in production.

From a security and auditing standpoint, it ensures that a change is exactly what it's supposed to be. No random changes by who-knows-who at who-knows-when. You see a reliable history of all changes.

From a development standpoint, it allows you to see the full history of changes and verify the source or integrity of data, which is important in some fields like research.


There is also a performance advantage if you can build everything under these constraints. A pointer to something held in an immutable log will never become invalid or otherwise point to garbage data in the future. At worst, whatever is pointed to has since been updated or compensated for in some future transaction which is held further towards the end of the log. Being able to make these assumptions allows for all kinds of clever tricks.

The inability to mutate data pointed to in prior areas of the log does come with tradeoffs regarding other performance optimizations that expressly rely on mutability, but in my experience constraining the application to work with an immutable log (i.e. dealing with stale reads & compensating transactions) usually results in substantial performance uplift compared to solutions relying on mutability. One recent idea that furthers this difference is NAND storage, where there may be a substantial cost to be paid if one wants to rewrite prior blocks of data (depending on the type of controller/algorithm used by the device).
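
A trivial illustration of the stable-pointer property (toy Go sketch):

  package main

  import "fmt"

  // appendOnlyLog hands out offsets as permanent handles: because entries are
  // never rewritten or removed, an offset taken today still points at the
  // same bytes tomorrow.
  type appendOnlyLog struct {
    entries [][]byte
  }

  func (l *appendOnlyLog) add(e []byte) int {
    l.entries = append(l.entries, e)
    return len(l.entries) - 1 // the offset is a stable pointer
  }

  func (l *appendOnlyLog) read(offset int) []byte { return l.entries[offset] }

  func main() {
    var events appendOnlyLog
    ref := events.add([]byte("order created"))
    events.add([]byte("order shipped")) // later writes never move earlier entries

    fmt.Printf("entry %d: %s\n", ref, events.read(ref))
  }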


> A pointer to something held in an immutable log will never become invalid or otherwise point to garbage data in the future.

Now I'm wondering if we can have immutable versioned APIs


Very good observation! I believe that's the future of SaaS APIs (though I don't know if anyone has started working on it yet).

Know how they have this tool Terraform that "orchestrates infrastructure"? It's actually a configuration management tool for APIs; it expects the state to unexpectedly change under it, and it will attempt to "fix" it (in some cases). Because the APIs don't support immutable versioned calls, the result can (and does) randomly fail or result in different outcomes each time you run Terraform. There's really no way to know if a call will succeed until you call it, leading to regular situations where production is half-deployed, half-broken, until someone can manually fix it. Not only that, but multiple people can apply conflicting changes to different components one after the other, leading to an untested and possibly broken result. (Terraform makes an attempt to track its own changes in a state file, but the state file is not the same as the actual state of AWS, so often the state file actually prevents Terraform from fixing conflicts.)

The only way to avoid that whole mess of constantly-mutating unreliable changes is for the APIs to support immutable operations, so that all changes necessary can be applied at once, and you can safely revert to previous tested state if necessary.


What would be the benefit of an immutable API?

Semi-relatedly, Stripe has a great post[0] on how they handle backwards-compatibility with their API.

[0] https://stripe.com/blog/api-versioning


> It also means you can restore a database immediately to the last known good state.

How is this different from preserving a database UNDO/REDO log permanently?


They talk about performance a lot, but their benchmarks seem to always explicitly batch the inserts. I'm seeing <700 inserts per second in a simple loop. Quietly reporting batched inserts is a subtle lie.


Each transaction may have several entries; in order to gain more performance, it's better to include several entries per transaction. All the entries in the same transaction are atomically stored.

Submitting several transactions at the same time is also possible and convenient; processing is done concurrently until the last step, which requires strict serialization.

Benchmarking is a huge topic and we'd love to have contributions on this. It may cover using immudb embedded, as a stand-alone server, KV, SQL, etc.


Is this comparable to or different from these "cryptographically verifiable" ledger DB services?

https://docs.microsoft.com/en-us/azure/azure-sql/database/le...

https://aws.amazon.com/qldb/


> millions of transactions per second

I wonder, if I wanted to survey a landscape of all databases that claim such numbers, how could I possibly find them?


Does this have, or are there any plans for a change-feed? Has anyone used this as an event sourcing db?


I'm not sure that this is a _useful_ tool. Let's talk about the threat model or the attacks that this defends against.

If a Client is malicious, they might try to manipulate the data in the database in an untoward way. In a "normal" database, this might cause data loss, if the database isn't being continuously backed up. But immudb does continuous backups (effectively, since it's immutable) so, if a malicious client has been detected, it's possible to restore an older version of the database. The real problem is how would you know that a client has tampered with your database? Well, because this database is "tamper-proof," duh! But the issue lies in the definition of tamper-proof. From my reading of the source code and documentation, the "proof that no tampering has occurred" is a proof that the current state of the database can be reached by applying some database operations to a previous state. As a result, a malicious client could simply ask the database to "delete everything and insert this new data," to make the database look like whatever it wanted. This is a valid way to transition the state of the database from its old state to the new state, and so shouldn't be rejected by the tamper detection mechanism.

"Ah," but you say, "it would look super sus [as the kids say] to just delete the entire database. We'd know that something was up!" The problem with this solution is how are you going to automate "looking super sus?" You could enact a policy to flag any update that updates more than N records at a time, but that's not really a solution. The "right" solution is to trace the provenance of database updates. Rather than allowing arbitrary database updates, you want to allow your database to be changed only by updates that are sensible for your application. The _actual_ statement you want to prove is that "the current state of the database is a known past state of the database updated by operations that my application ought to have issued." Of course what are "operations that my application ought to have issued?" Well, it depends how deep you want to go with your threat model. A simple thing you could do is have a list of all the queries that your application issues, and check to make sure all operations come from that list. This still allows other attacks through, and you could go even more in depth if you wanted to.

Importantly, immudb doesn't appear to contend with any of this. They claim that their database is "tamper-proof," when in reality you'd need a complicated external auditing system to make it meaningfully tamper-proof for your application. (Again, a threat model ought to include a precise definition of "tamper-proof," which would help clear up these issues.)

It's also worth comparing this to https://en.wikipedia.org/wiki/Certificate_Transparency, which is an append-only database. Compared to immudb, the _exposed data model_ for certificate transparency logs is an append-only set, which means that it doesn't have any of these same problems. The problem with immudb is that the data model it exposes is more complicated, but its built-in verification tools haven't been upgraded to match.

(Also, for context, I've tried to obtain a copy of their white paper, but after an hour the email with the link to it never arrived.)


Regarding backups, note that you still need separate backups with immudb.


If I drop the trust requirement, what's the absolutely fastest blazing fast thing I can use that is network readable/writable and fault-tolerant?


Would it be possible to have something like this that works by writing to a PROM? That would make it immutable at the hardware level.


Note: no `ALTER TABLE` in their SQL.

Sounds like maintaining this over longer term with evolving data would be quite painful.


sure, schema changes are already in the roadmap


This sort of reminds me of happstack, though the design and implementation are much different.


I'm interested in how they intend to implement data pruning from the roadmap.


GDPR compliance will be tricky. How does one delete data?


Store the data encrypted, then delete the keys when requested.


This isn't really deleting it though. What happens if in the future technology changes and current cryptography is moot?


Then fire up a new database with the latest customer data every 18 months. And completely delete the old database once you confirm it no longer has value.


I thought the point of this is to have an exhaustive record for audit purposes.


Or just store the customer database in /tmp and reboot the server every 18 months. /s


Currently it's logical deletion and time-based expiration. Actual values associated with expired entries are not fetched. Physical deletion is already on the roadmap.


My preferred method is to tokenize sensitive data before storing in the immutable logs/database.


Pruning is on the roadmap.


How is it immutable if you can prune it?


Several solutions may be possible. The simplest would be to delete payloads associated with entries. While the actual data won't be there, it will still be possible to build cryptographic proofs. Then it's possible to prune by physically deleting entire transaction data, which may or may not affect proof generation. However, tampering will still be subject to detection.


Are records atomically immutable or is there a set concept such that the lack of mutation can be verified over a set of records?


Every change is made by appending a new transaction. Immutability is intrinsic to transaction processing; it cannot be disabled. Once a transaction (which may include several entries) is committed, the database state changes accordingly, and no change to an already committed transaction can be made without clients being able to notice it. Note that for this to be ensured, clients of immudb should keep track of the latest verified state; official SDKs handle this for end applications.


You clone the database and remove/update the corresponding lines. GDPR does not mean you have to fix it right away, imho.


But you have to do it in a pretty short timeframe

> Under Article 12.3 of the GDPR, you have 30 days to provide information on the action your organization will decide to take on a legitimate erasure request. This timeframe can be extended up to 60 days depending on the complexity of the request.

even if they ask for more time, first communication has to come within 30 days


So, no deadline for the actual deletion? Your 'first communication' could say "your data will be deleted at some point between now and ten years from now" and be compliant?


No, it says: 30 days, maximum 60, if you need 60 you have to communicate it within the first 30 with a motivation.

It looks pretty straightforward English to me.

GDPR is published in 24 languages, including English, I don't know why people still don't get it and what's so hard to understand.

It's not a single law, it's a collection of articles, the 17th says that data should be erased without undue delay.

We don't have common law in Europe, EU is mostly civil law (emphasis on civil) or Romano-Germanic law. The only exception is Scandinavian law, which is very similar to (and a subgroup of) civil law anyway.


Article 17 does include the term "without undue delay"[0], but such vague language seems ripe for some court precedent.

A clone and remove/update per GDPR request seems like undue delay, certainly one that could be avoided by alternative architecture choices (keep the personally identifiable information (PII) in a mutable store)

[0]: https://gdpr-info.eu/art-17-gdpr/


No, it's not undue delay. That's just how it currently works and that is a fine argument.


Words like immutable make me allergic


See a doctor then. It isn’t expected. Could be a lack of CS education, in which case, read some books. If that doesn’t fix it, see a psychiatrist - something could be wrong with your brain.


So is this a useful alternative to blockchains or just hype?


  Don't forget to star this repo if you like immudb!
I didn't realize GitHub had "Like and subscribe" culture now. : /



