Immudb 1.0 – open-source, immutable database with SQL and verified timetravel (codenotary.com)
296 points by vchain-dz on May 25, 2021 | hide | past | favorite | 144 comments



> Data stored in immudb is cryptographically coherent and verifiable, just like blockchains, but without all the complexity. Unlike blockchains, immudb can handle millions of transactions per second, and can be used both as a lightweight service or embedded in your application as a library.

> Companies use immudb to protect credit card transactions and to secure processes by storing digital certificates and checksums.

This explanation is available on their github repo [1]. It has been a common refrain on Hacker News that you don't need a blockchain and instead can just use a database, but this product may actually fill the gap where tamper resistance is desired.

[1] https://github.com/codenotary/immudb


I don't quite understand how something you run yourself on your own hardware can be tamper-proof (digitally, not physically). If you're running the software you can modify it, so no matter how many processes there are in place for resisting mutability, you'll always be able to find some way to mutate it.

Compared to blockchain which is running on X number of nodes that you'd have to have access to in order to modify something, immudb doesn't actually seem to replace the use case when you need something actually tamper-proof.


You can build a Merkle tree from any data that is append-only; Git does this, as do ZFS and Dat/Hypercore. That lets you make strong assertions about data integrity, even without blocking local writes.

Now add a mirror: git upstream, FS snapshot, immudb replica, etc…or even just an outside log of the merkle proofs themselves. Then, if your database ever fails a check against that proof, you know the data has been modified, not just appended to.
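A minimal sketch of that check in Python, using a linear hash chain rather than a full Merkle tree to keep it short (all names are illustrative):

```python
import hashlib

def chain_root(entries):
    # Fold an append-only log into one hash; changing any past entry
    # changes every subsequent root.
    h = b"\x00" * 32  # genesis value
    for e in entries:
        h = hashlib.sha256(h + e.encode()).digest()
    return h.hex()

log = ["alice pays bob 5", "bob pays carol 2"]
mirror_root = chain_root(log)  # proof stored on the mirror / upstream

log.append("carol pays dave 1")             # appends keep old proofs valid...
assert chain_root(log[:2]) == mirror_root   # ...as verifiable prefixes

tampered = ["alice pays bob 500", "bob pays carol 2"]
assert chain_root(tampered) != mirror_root  # edits to history fail the check
```

The mirror only needs to store the latest root it has witnessed, not the data itself.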

To use a familiar Git workflow example: you can do whatever local writes you want, but if you disallow force pushes no one can erase history on the upstream repo.

Put another way: if you have immutable backups you don’t need a blockchain to ensure data integrity. OTOH, if you can’t trust your own infrastructure even as far as a secure remote backup you have other problems that a blockchain won’t solve either.


Again though, the entire point of blockchain tech is that it's decentralized. In the git analogy, someone is "disallowing force pushes", but the person that is disallowing can modify the database.


You can use git in a secure decentralized way - you have people sign commits.


By the way, you can play with the immudb Merkle tree visually at https://play.codenotary.com


Blockchains like Bitcoin are actually not tamper-proof: they can be attacked via 51% attacks, where you can even rewrite history if you have enough hashpower. The protocol is explicitly designed to always follow the longest chain, so the only defense is to hash faster than the attackers. This may vary for other blockchains, but the biggest and most-mentioned one is particularly unsafe in that regard.


There are two things you skip over:

1. You can't rewrite everything. Given enough hash power to create a longer chain, you can create a block that a) removes any transactions from a block in the original chain and b) contains new valid transactions (which must be signed by you, so you must own the Bitcoins used in the tx). That lets you double-spend your own coins, but you can't change other people's transactions.

2. With each new block, changing a past one becomes harder, while you make it sound as if you could arbitrarily rewrite history. Merchants usually wait several blocks before accepting an on-chain payment. Exchanges wait 6 blocks, since changing a block buried under 5 other blocks is seen as infeasible for non-nation-state actors.

TL;DR: 1) other people's transactions can at most be removed, not changed, and 2) data on the Bitcoin blockchain is tamper-proof after x blocks.


This is ignoring that if you own 51% of the network, you are free to just rewind the chain back to whatever you think is suitable, and rebuild it from that point on.

Sure, someone, somewhere will know the truth, but who cares - it's all about information control, and you control 51% of the network. Now your opponents have to scramble with social countermeasures to try and discredit your chain. A politically savvy attacker will simply buy off the main social nodes.

EDIT: The problem with the "oh but people will know" argument is that it assumes credibility outside the network - the thing cryptocurrencies like to claim is unnecessary. In reality, if the network is 51%-attacked, your entire recovery strategy rests on enough people believing you about what's real or should be real. This isn't even theoretical: it's literally what happened to Ethereum.


> This is ignoring that if you own 51% of the network...

Are you talking about some kind of node sybil attack? Because the only way I can make sense of what you've said is if a system had client implementations maintaining zero state - relying completely on unsecured network consensus. But in that scenario hashing power wouldn't enter the equation. I don't know of any cryptocurrency that operates that way, and doubt that one ever has outside of a home lab.

Or are you talking about someone trying to hijack a cryptocurrency by creating brand confusion and convincing people to run modified code, like the failed bcash campaign against bitcoin?

A 51% attack on a blockchain has no rewrite ability past the point where the attacker establishes a longer chain. That is why it is a "chain"... where do you think the hashing-function inputs being fed into the adversarial mining hardware come from? An earlier block - the one where the attack started. You can't go back further than that without having to throw out and rehash your entire attack fork.


You are both mostly right, as I understand it.

If I had >50% hashing power of BTC(for exactly 1 transaction), I can make the current chain say anything I want, like give me all the BTC, and it would become valid and "permanent". To re-write actual history takes a lot more work, and wouldn't be possible with 51% for just 1 transaction. If I were able to maintain 51% control for a long time, then I could do anything I want for as long as I held that control. Though I imagine after that very first transaction, the world of BTC would blow up and everyone would stop hashing BTC, as there would be zero point: the current chain would now be effectively useless.

This is the real issue that I see: 1 transaction of 51% power is enough to permanently wipe out all of BTC's worth. As far as I'm aware, every cryptocurrency basically has this same problem.


> You are both mostly right, as I understand it.

> If I had >50% hashing power of BTC(for exactly 1 transaction), I can make the current chain say anything I want, like give me all the BTC, and it would become valid and "permanent".

You clearly don't understand it, and it is kind of amazing that you think such a design would survive for as long as bitcoin has (going on 12 years) - by relying on altruistic miners not taking advantage of such a silly flaw. The Satoshi white paper is 9 pages long and written in very plain language... do what everyone used to do back in the day: read the paper and take a peek at the source. It goes a long way in inoculating you from misinformation that you subsequently repeat, accidentally (?) misinforming others.


BTC has changed a bit since the original white paper, and I have read the white paper (though it's been a while). Back it up with sources to convince me I'm wrong; don't just say I am with nothing but YOU ARE WRONG, that doesn't accomplish anything.

To help, here is my understanding.

In a <51% (i.e. normal) transaction, if I win the BTC mining lottery and get to write the next transaction, I can make it say whatever I want, but then 51% of the mining network has to agree that what I said was sane (as defined by the current mining software).

If I control 51%, then I can make the transaction say whatever I want AND I can make everyone else accept it as sane, because a majority of the network agrees with me.

This is how BTC changes over time, > 51% of the network agree that they will accept X as the new reality, and it then becomes the new reality.


The old question about how one eats an elephant comes to mind... you've expressed such a fundamental misunderstanding that the only sensible correction is: YOU ARE WRONG, START OVER. For example, this:

> ..make the current chain say anything I want, like give me all the BTC..

You know that transactions are cryptographically secured by public and private keys, right? That would be like saying "I can h4x0r all the hotmails and rewrite every PGP armored message to say whatever I want!" Do you think that miners, upon building the next block, have the opportunity to ignore all the rules with regard to PKI?

> ..and it would become valid..

There is a very straightforward block validation sequence, your attack would impotently collapse against it for any number of reasons: https://en.bitcoin.it/wiki/Protocol_rules

> I have read the white paper.

No you didn't. Sorry to have to put it so bluntly, but you couldn't have read it and still be so comically wrong - like I said, it is written in very plain language.


I don't think you understand software like you think you do. Those protocol rules do exist, and yes, cryptography is involved, obviously, but those rules exist because the miners all agree on them, see BIP 2: https://github.com/bitcoin/bips/blob/master/bip-0002.mediawi...

I.e. These rules CHANGE, and since they change(and have changed in the past) if you can convince a majority, you can change the rules to be whatever you want.

Also see: https://en.bitcoin.it/wiki/Economic_majority

Which is exactly what I said. If I have 51% of mining power I can make BTC do whatever I want, but that doesn't mean the majority (or any) of the exchanges will accept it, which is basically what the above is saying.

Also see: https://en.bitcoin.it/wiki/Bitcoin_is_not_ruled_by_miners

Again, what they are saying is what I've said; they just put flowery language around it, saying: see, miners can't do EVERYTHING - which isn't technically true, but practically true. Miners can technically do whatever the hell they want, assuming they have the majority, but that doesn't mean exchanges like Coinbase will accept it and exchange the BTC for USD.

So what I said above is generally and technically true; see my other comment in this thread as well, where I said it would be a giant mess and likely ruin BTC forever if someone ever did execute a 51% attack. So there is little incentive (financial or otherwise) to do so.

The closest real-life example we have (that I'm aware of) is the Bitcoin Cash stuff, where it hard-forked and became its own cryptocurrency because they couldn't get a majority to agree, but enough agreed to fork themselves.


> I don't think you understand software like you think you do.

Sorry, can't hear you above the noise of your furious backpedaling. If somebody rewrites the protocol rules in their software to allow anything approaching what you've described (lol, "yes cryptography is involved"), then why are they worried about btc hashing power? I mean, you specifically said:

> If I had >50% hashing power of BTC(for exactly 1 transaction), I can make the current chain say anything I want, like give me all the BTC, and it would become valid and "permanent".

I'll tell you what it would look like to the rest of the network if anybody enacted your diabolical plot: suddenly somebody starts submitting invalid blocks to the network at regular intervals and only those who are monitoring for weird traffic like that even notice, then the block solve rate slightly sags until the difficulty automatically adjusts. Congratulations, you've turned your massive hashing advantage into an incredibly expensive joke.


> I.e. These rules CHANGE, and since they change(and have changed in the past) if you can convince a majority, you can change the rules to be whatever you want.

False, everyone running a node is enforcing whatever rules their node is written to enforce.

You could have 99% of the hashpower, if you generated a block saying that Coinbase is handing over all their Bitcoin to you (with an invalid signature because you don't have their private key), their node and the nodes of most/all exchanges and businesses would just go "lol, wtf is this shit, not a valid block because it has a transaction with an invalid signature, ignored".

A majority of hashpower is not the same as a majority of economic participants.
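A toy model of that point, with an HMAC standing in for the real public-key signature so the sketch stays stdlib-only (all names are illustrative):

```python
import hashlib, hmac

coinbase_key = b"known only to Coinbase"  # stand-in for their private key

def sign(key, tx):
    # HMAC stands in for an ECDSA signature here; the point is the same.
    return hmac.new(key, tx.encode(), hashlib.sha256).hexdigest()

def node_accepts(block):
    # Every full node checks every signature, no matter who mined the block.
    return all(hmac.compare_digest(sig, sign(coinbase_key, tx))
               for tx, sig in block)

honest = [("coinbase -> merchant: 1 BTC",
           sign(coinbase_key, "coinbase -> merchant: 1 BTC"))]
forged = [("coinbase -> attacker: all BTC",
           sign(b"attacker guess", "coinbase -> attacker: all BTC"))]

assert node_accepts(honest)
assert not node_accepts(forged)  # 99% of hashpower doesn't help here
```

However much hashpower the attacker has only affects how fast they produce blocks, not whether the signature check passes.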


> If I had >50% hashing power of BTC(for exactly 1 transaction), I can make the current chain say anything I want, like give me all the BTC, and it would become valid and "permanent".

No, this is a big misunderstanding of how it works. Having more than 50% doesn't give you superpowers, it just means that you're able to create VALID blocks faster than the rest of the network combined (creating invalid blocks is useless), which allows you to censor transactions (by not including them in your blocks, which will be the chain with more work because you have 50%+) and you can TRY to do double-spends.

A double-spend gets exponentially harder and more expensive to pull off the longer your target waits before considering the original transaction final, because you'll have to rebuild all the blocks generated since that transaction was included in the chain, so that your parallel chain that doesn't include that transaction becomes the chain with the most work that everyone follows.
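That exponential drop-off can be computed directly; this is a small sketch of the attacker-success formula from section 11 of the Bitcoin white paper:

```python
import math

def attacker_success(q, z):
    # Probability that an attacker with hashpower share q < 0.5 ever
    # overtakes the honest chain after the victim waits z confirmations
    # (Bitcoin white paper, section 11).
    p = 1.0 - q
    lam = z * (q / p)
    s = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        s -= poisson * (1 - (q / p) ** (z - k))
    return s

assert attacker_success(0.1, 0) == 1.0    # zero confirmations: trivially at risk
assert attacker_success(0.1, 5) < 0.001   # roughly 0.1% after 5 confirmations
assert attacker_success(0.3, 10) < attacker_success(0.3, 5)
```

This is why "wait z blocks" policies work: for a fixed minority share q, the success probability falls off exponentially in z.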


What I'm saying is that, if enough big players like Coinbase et al. started saying "oh there's a BTC glitch, you need to pull a new state file..." then a huge number of people would do it. Not all, but you don't need all - just enough.


Again, if they don't control 51%, their new statefile would be pointless, unless they did a hard-fork, but if I have the resources for a 51% attack on the current fork, chances are I have the same capability on the new fork.

The only way what you propose would work would be if they could cobble together enough resources to break my 51% control.

So BTC would go 100% bust if someone managed a 51% attack. The question is: can someone with 1 transaction of 51% get enough converted to USD/etc. before enough people notice? Otherwise the financial incentive isn't there to try. I'd guess no matter what, it would be a huge fricking mess, and if you did it in a country that didn't like you, it probably wouldn't end well for you years later, when whatever government you live under gets around to ruining your life, even if you managed to extract a few billion.

Because you know the exchanges like Coinbase, as soon as they noticed, would do their best to stop you (as it's in their best interest).


> This is ignoring that if you own 51% of the network, you are free to just rewind the chain back to whatever you think is suitable, and rebuild it from that point on.

Not exactly, for every block you rewind there is extra work you have to do, to the point where it may take you close to 2 years to rewind 1 year's worth of blocks (because while you're doing that the rest of the network is still creating new blocks).

It gets exponentially harder/more expensive the further back you want to go.
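A hedged back-of-envelope for the catch-up cost, under simplifying assumptions (deterministic mining rates, the whole network's block production normalized to one unit per unit time; real mining is stochastic, which is where the exponential improbability for minority attackers comes from):

```python
def years_to_rewind(q, years_back):
    # Deterministic sketch: the attacker mines a fraction q of all blocks,
    # honest miners mine p = 1 - q, and the attacker must redo years_back
    # worth of blocks *and* overtake the still-growing honest tip.
    # Net progress per unit time is q - p, so t = years_back / (q - p).
    p = 1.0 - q
    if q <= p:
        return float("inf")  # a minority is never expected to catch up
    return years_back / (q - p)

assert years_to_rewind(0.75, 1) == 2.0           # ~2 years per year rewound
assert years_to_rewind(0.51, 1) > 49             # bare majority: decades
assert years_to_rewind(0.40, 1) == float("inf")
```

Under these assumptions, "close to 2 years to rewind 1 year" corresponds to roughly 75% of total hashpower; a bare 51% majority would take decades.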


You can't rewrite the history of blocks that have already been distributed. You may fool SPV nodes but any node with a copy of the blockchain (even if pruned) will reject your version.


This is false. Please read the white paper. It clearly states in section 4:

> The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. If a majority of CPU power is controlled by honest nodes, the honest chain will grow the fastest and outpace any competing chains.

If the majority of the hashpower (not nodes!) is dishonest, you can rewrite history. It's the reason why the current difficulty is part of a block.

This has been done numerous times in the past for small but highly traded altcoins.


You can still have the extra verifier nodes, but those don't have to be on the critical read/write path.

Presumably you can create a config where you have your "main" beefy server where all the activity is -- which is backed up, redundant, etc. -- and a bunch of "client" servers, which just pull and verify the data all the time. The client servers notify you of any errors using some out-of-band channel, probably the same system you use for general server health monitoring.

So you get the same security guarantees as a "private blockchain", but with drastically higher performance, and only one beefy server needed. The downside is that you won't auto-stop all operations on tampering; you'll only get an alert for it.
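A sketch of such a verifier client, assuming a toy hash-chained log (the fold shown here is illustrative, not immudb's actual proof scheme):

```python
import hashlib

def state_root(entries):
    # Illustrative fold of a log into a single hash.
    h = b"\x00" * 32
    for e in entries:
        h = hashlib.sha256(h + e.encode()).digest()
    return h.hex()

class Verifier:
    # Off-the-critical-path client: remembers the last verified state and
    # raises an alert (rather than blocking writes) when history changes.
    def __init__(self):
        self.seen_len = 0
        self.seen_root = state_root([])

    def check(self, log):
        ok = (len(log) >= self.seen_len
              and state_root(log[:self.seen_len]) == self.seen_root)
        if ok:  # advance the watermark only on success
            self.seen_len, self.seen_root = len(log), state_root(log)
        return ok  # False -> fire the out-of-band alert

v = Verifier()
assert v.check(["a", "b"])           # initial sync
assert v.check(["a", "b", "c"])      # appends are fine
assert not v.check(["a", "x", "c"])  # rewritten history -> alert
```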


> immudb doesn't actually seem to replace the use case when you need something actually tamper-proof.

I think that's an unrealistic requirement. There's tamper-evident and tamper-resistant, but AFAIK nothing is tamper-proof. The best you can do is an HSM with a tamper-resistant HMAC with keys and a running checksum in unrecoverable ROM coupled to the packaging.


> nothing is tamper proof

I beg to differ.

If I place a signed message in the Bitcoin chain, can you then modify that message?

If you can prove that you can somehow modify the message, I'll give you $1,000,000 USD tomorrow.


That's "tamper proof" in the colloquial sense. As a term of art, it means something very specific. For example, see FIPS 140-2/3 [1]

It makes no sense to say that the blockchain is tamper proof because the blockchain is just information. Tamper "proofness"/resistance is first a property of the devices storing the information - once you get into custody chains, provenance documents, etc. that's when a system becomes tamper resistant. At best the blockchain as a system is "tamper evident" in the colloquial sense because the network of all the other nodes decides which bits of information form the "real" blockchain. However, without verifying the (physical) identity and data integrity of the devices that run (at least?) 50%+1 of the nodes, you have no idea whether the system has been tampered with.

[1] https://en.wikipedia.org/wiki/FIPS_140


If you want to raise the bet to a few billion dollars I'll happily take you on.

But that's just a question of scale - if you have a rando-blockchain you use for immutability internally then how trivial would it be for me to spin up five servers to outhash you and rewrite history?


Not after you post it, but by infecting your device before you make that message, and tampering with it as you place it in the Bitcoin chain.


So you agree, once it's on the blockchain, it's tamper-proof?


The entire state of the database is captured by a hash value. Having lightweight clients (or auditors) keep track of it is how tampering is detected, regardless of where the database server is running.


This is insufficient. The strongest guarantee you can get without consensus is that the state of the DB you see on the client is/was a correct state at some point, it doesn't provide freshness/rollback attack prevention, aka that the state you see is in fact the latest one.

Keeping track of the "HEAD" hash on the clients is what consensus protocols achieve. You can also achieve it with trusted counters like the one SGX provides (depends on Intel ME so not exactly recommended, also most probably switched off in cloud environments). Alternative is an implementation of something like https://dl.acm.org/doi/10.5555/3241189.3241289.

You can of course say that it's the clients' responsibility to do this, but in practice they won't and they'll implicitly trust the server state.

Having said this, the project does look promising; we may end up using it in a confidential-compute setting where clients can verify the server code running, and we'll add rollback protection on top.
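A sketch of the client-side freshness check described above; in practice the counter would come from a trusted monotonic source (consensus, or a trusted hardware counter), and all names here are illustrative:

```python
class FreshnessClient:
    # Pins the highest (counter, head) pair verified so far; a server
    # replaying an old-but-valid state fails the monotonicity check.
    def __init__(self):
        self.counter = -1
        self.head = None

    def accept_state(self, counter, head):
        if counter < self.counter:
            return False  # rollback: older than a state already seen
        if counter == self.counter and head != self.head:
            return False  # two different states at the same height
        self.counter, self.head = counter, head
        return True

c = FreshnessClient()
assert c.accept_state(1, "h1")
assert c.accept_state(2, "h2")
assert not c.accept_state(1, "h1")   # was valid once, but stale now
assert not c.accept_state(2, "h2'")  # same counter, different head
```

This only catches rollbacks relative to what this client has already seen, which is exactly why a fresh client with no pinned state still needs consensus or a trusted counter.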


> aka that the state you see is in fact the latest one.

This is an impossible guarantee. Suppose the state that is sent to you from the server needs some time to get to you; meanwhile the state on the server could have changed. You don't even need a remote server to have this issue. Your thread (where you see the latest state) is put to sleep for a while (scheduler, OS, ...). It wakes up. Is the state it observes still the latest? That's impossible to know. The only thing you can do is refuse future updates if the state they were built upon is not the current state of the database.
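The "refuse updates built on a stale state" rule is ordinary optimistic concurrency; a minimal sketch with illustrative names:

```python
class Database:
    # Optimistic concurrency: a write must name the state hash it was
    # built on, and is refused if the head moved in the meantime.
    def __init__(self):
        self.head = "genesis"

    def commit(self, based_on, new_head):
        if based_on != self.head:
            return False  # built on a stale state: re-read and retry
        self.head = new_head
        return True

db = Database()
snapshot = db.head                     # two writers read the same state
assert db.commit(snapshot, "h1")       # first writer wins
assert not db.commit(snapshot, "h1b")  # the racer is rejected
assert db.commit("h1", "h2")           # retry on top of the new head works
```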


That's what I meant. If you have many transactions building on a state hash X racing to be committed, only one of them will succeed.

With crypto protocols in general "guarantees" are always prefixed with "if the protocol completed successfully, then ...". For example authenticated DH + e2e encryption guarantees that you will send data to the intended participant only. But an attacker can still disrupt the network packets, so the true guarantee is "if the protocol completed successfully, then you have sent the data to the intended participant only".

Same thing here, you cannot of course guarantee the "latest state", if we want to go into the extreme, one could even argue that actually time doesn't work like that because of relativity/speed of light limitations:D. What you can guarantee is that if your commit protocol succeeded, then it updated the latest state at the time of consensus/monotonic counter update.


> That's what I meant.

Good. Sorry to be picky, but wording is important here and you don't wanna know how often I failed to convince people of the impossibility of exactly that guarantee.

> time doesn't work like that because of relativity/speed of light.

You're right.


I see. It's a blockchain without calling it a blockchain, so people who hate blockchain can use it without having to realize they use a blockchain.


Blockchains are just a special case of Merkle trees; there isn't anything original about them. Bitcoin simply served as a marketing engine for the term "blockchain" because some people made a ton of money with it.

https://en.wikipedia.org/wiki/Merkle_tree


No, blockchains are entirely different from Merkle trees. Blockchains include the previous block's hash in each block, whereas Merkle trees, as the name implies, are trees of hashes. In a Merkle tree, blocks do not include the previous block's hash.


In git each "block" includes the previous "block"(s)' hash. Is it a blockchain or a hash tree?

I would say that in practice what differentiates a blockchain from other applications of hash trees is a mechanism for consensus, not whether the blocks being formed into a tree conceptually represent time or not.


Git is a blockchain. Blockchains require the previous block's hash to denote sequence. The hash of any block of data can be appended to a Merkle tree, even duplicate blocks.

In a blockchain it is easy to find history because the link to it is included. Merkle trees require n-1 additional hashes.


Disagree. There are no formal definitions for these terms, but I don't think your definition here is what most people would think of as a blockchain.


Looking at the wikipedia article I can see where one might be confused.

    A blockchain is a growing list of records,
    called blocks, that are linked together using 
    cryptography. Each block contains a cryptographic
    hash of the previous block, a timestamp, and
    transaction data (generally represented as a
    Merkle tree).
The Merkle tree referenced here is with respect to the organization of the transaction data contained within a block, not the blockchain itself.

Merkle trees are used in various ways within cryptography in general and cryptocurrencies specifically, but blockchains and Merkle trees are distinct data structures with different uses. The colloquial use of "blockchain" has perhaps made the word somewhat ambiguous in some contexts but not in the context of cryptographic data structures, and Merkle trees are in fact formally defined.
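A small sketch of the two distinct structures (simplified; Bitcoin's real header layout differs):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chain_tip(blocks):
    # Blockchain: each block commits to the previous block's hash, so the
    # tip hash pins the entire ordered history.
    prev = b"\x00" * 32
    for b in blocks:
        prev = h(prev + b)
    return prev

def merkle_root(leaves):
    # Merkle tree: leaves are hashed pairwise up to a single root.
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

txs = [b"tx1", b"tx2", b"tx3", b"tx4"]
# A Bitcoin block header carries both: the previous block's hash (the chain)
# and the Merkle root of its own transactions (the tree).
assert chain_tip(txs) != chain_tip([b"tx1", b"txX", b"tx3", b"tx4"])
assert merkle_root(txs) != merkle_root([b"tx1", b"txX", b"tx3", b"tx4"])
```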


You can visualize how immudb Merkle tree grows as you insert data on https://play.codenotary.com


> there isn't anything original about them

Cool, so let's say we're using this to track financial transactions.

Server A has an Immudb at some state x and server B has an Immudb at some state y. Which one is correct, how do I decide?


Rekor is just that: a Merkle tree implementation (with extras such as timestamping).

https://github.com/sigstore/rekor


And with git being the most superior blockchain of them all.


Except it has no way to achieve consensus automatically. That's left as an exercise to the reader.


It's only the actually-useful bits of a "blockchain" without the planet-cooking proof-of-waste consensus algorithm brute-forcing sha256 over and over again.


And also without the useful "automatically achieve consensus between untrusted parties" bit.


You can do something quite simple, like posting a tweet or inserting something into a public chain like Ethereum, then follow that back to the private immudb hash.


Was there a gap in the first place? We could build tamper-proof data storage since way before the blockchain. All you need is checksums, public-key cryptography, and a way to publish your signed checksums.

I'm not saying that this isn't a good project but it's a bit strange to frame it as if it was a major technical breakthrough.

If anything what catches my eye in this announcement is the "time travel" feature as well as the wire protocol compatibility with Postgres, that's pretty cool.
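A sketch of that pre-blockchain recipe; the HMAC is a stdlib stand-in for a real public-key signature (e.g. Ed25519), which would let anyone verify with just the published public key:

```python
import hashlib, hmac

signing_key = b"publisher secret"  # stand-in for a private signing key

def publish(snapshot: bytes):
    # Checksum the data, then "sign" the checksum; publish both somewhere
    # third parties can see (a newspaper ad worked long before blockchains).
    digest = hashlib.sha256(snapshot).hexdigest()
    sig = hmac.new(signing_key, digest.encode(), hashlib.sha256).hexdigest()
    return digest, sig

def verify(snapshot: bytes, digest, sig):
    expected = hmac.new(signing_key, digest.encode(), hashlib.sha256).hexdigest()
    return (hashlib.sha256(snapshot).hexdigest() == digest
            and hmac.compare_digest(sig, expected))

d, s = publish(b"ledger as of 2021-05-25")
assert verify(b"ledger as of 2021-05-25", d, s)
assert not verify(b"ledger, quietly edited", d, s)
```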


Getting it all for free in the product is kind of a huge win though. It's always been possible with other systems, but you still have to implement it.


> All you need is checksums, public key cryptography and a way to publish your signed checksums.

I agree that’s sufficient.

One non-technical addition I see in Bitcoin is the incentive to verify the checksums.


Does immudb offer mechanisms for distributed consensus? That is one of the top features of blockchains, and they provide it while remaining P2P.


The order of changes is not subject to consensus, but clients have the tools to ensure no history rewrite has happened.


sounds like git :)


I think both blockchains and git are based on the concept of merkle trees, so that sounds about right.

https://en.m.wikipedia.org/wiki/Merkle_tree


immudb has a website where you can visualize the Merkle tree in real time as you insert data: https://play.codenotary.com/


Can immudb work in a decentralized network while remaining secure from attacks in such networks, or is it meant for centralized systems? If the latter, I don't think you can compare it to blockchains; maybe a better comparison is Git.


immudb is not meant for public decentralized networks, although it might be possible to use embedded immudb to build a public blockchain... but that's a different story. immudb server is tailored to provide a database where any tampering will be subject to detection by any single client application consuming its data.


Yeah, this interests me because I'm thinking about how to use Grafeas - its role is critical for reliable software development going forward - and storing its data in a backend like this would add one more layer of trust and verifiability to a software supply chain. There are some interesting possibilities in making e.g. public software repos' metadata clonable, verifiable, and queryable via local immutable copies.


Maybe take a look at rekor, part of the sigstore project, it's built specifically for software supply chain transparency (disclaimer I am one of the community). Being a transparency log, you get much better guarantees around inclusion proof (it uses a merkle tree):

https://github.com/sigstore/rekor


It sounds a lot like ZFS.

What I really want is a way to get a hash of a root node / snapshot.


How does this compare, feature-wise, to https://aws.amazon.com/qldb/ ?


There are many differences (speaking as an immudb contributor):

- immudb can be used as an embedded or client-server database, while QLDB is an AWS service
- immudb behaves as a key-value store but also provides SQL support, while QLDB provides a document-like data model with the PartiQL language
- immudb provides time travel features
- immudb is faster, with a built-in mode of operation designed for fast writes that works with eventual indexing

Finally but super important, immudb can be deployed anywhere and it's open source!


QLDB provides time travel features, too (if by "time travel" you mean being able to query the state of a record at an arbitrary point in the past): https://docs.aws.amazon.com/qldb/latest/developerguide/worki...


immudb already included history support for key-value entries in previous releases. But since v1.0.0, immudb provides query resolution at a given point in time, using the data current at that specific moment, and it can also combine data from different points in time in the same query. It's not clear to me whether that can be achieved with "SELECT * FROM history"; it requires at most one result per entry (the most recent one).


QLDB is a document DB, so you are limited to a single point or range per query. Also keep in mind `history` in QLDB is a function, not just a store of previous values; given a table "foo" and a key "bar", getting its immutable state from last Tuesday at 4 PM EDT would be:

    SELECT * FROM history('foo', `2021-05-18T20:00:00`, `2021-05-18T20:00:00`) as t WHERE t.metadata.id = 'bar';


The temporal features provided in immudb allow query (and subquery) resolution based on older states of the database. For instance, it can be thought of as retrieving documents in the state they had at a given point in a time range. Querying the history of changes of a given key or document is slightly different, and it's also covered by the history operation in immudb.


Ok, that sounds extremely similar to the history function in QLDB.

In the examples shown in the AWS docs, the results of a historical query are not the changes made to the document, but the fully resolved state of the document at the requested timestamp (or within the timestamp range). As other threads on this page mention, this is an unusual but not unheard-of DB feature these days.



> this product may actually fill the gap where tamper resistance is desired

I think in the future, all enterprise storage solutions will be append-only by default, to protect against cryptolocker malware, but with isolated functionality for actually deleting data - for example because of GDPR requests, or because of malware that tries to fill all writable storage with garbage. Data could still be deleted, but not from any of the regular servers that read and append data to the system; only from separate servers that are isolated and used for data storage management only.


> immudb is the first database which allows you to do queries across time.

I don't think it is.

e.g. Datomic has had this for a long time, no?


Several databases (MS SQL Server, MariaDB, Postgres with appropriate extension) support system versioned temporal tables (added in the SQL2011 standard, though I don't know if any DB entirely follows the standard) which I'm pretty sure counts as "queries across time".

Maybe they are claiming to be the first with it built in as a core part of the engine that is specifically optimised for it, but even that might not be true.


> even that might not be true

It's not. For example, see SAP HANA's "Timeline Index" https://websci.informatik.uni-freiburg.de/publications/sigmo...


Oracle has had flashback queries for a long time.

Though this does not do what immudb claims:

> immudb is the first database to provide tamper-evident data management, immutable history and client-cryptographic proof.

And:

> Clients do not need to trust the server and every new client adds trust to the deployment


We're building something similar to this at Splitgraph, at least in the sense that we have immutable data in a Postgres-compatible DB with point-in-time queries across versioned, addressable snapshots. In our case, we apply the idea of immutability to "data images" that are analogous to Docker images. You build and push them in the same way, and then you can reference any "image" (version) [0] of data by addressing it with the correct tag.

For example, here is a link to a live query on our Data Delivery Network (DDN) that runs a JOIN on two daily snapshots (20200809 and 20200810). [1] In this case, these images are the result of a daily script that builds and pushes a new image each day. The storage costs are minimal, as each new image only needs to store the changed rows, rather than a duplicative snapshot.

Each immutable image is composed of a set of small content-addressable cstore fragments uploaded to object storage, which we only load into the database when they become necessary to satisfy a query. When a query arrives at the DDN, we intercept it at the network level by scripting PgBouncer with embedded Python to orchestrate the infrastructure required to answer the query. The embedded code parses the AST of the query for table references, which it uses to "mount" a temporary schema for serving the query. The temporary schema includes an FDW that implements a "layered querying" protocol (think AUFS) to lazily download only the fragments required to satisfy the query.

(Also, we support live data. But that's for another time!)

[0] https://www.splitgraph.com/docs/concepts/images

[1] https://www.splitgraph.com/workspace/ddn?layout=hsplit&query...


Doesn't Bigtable, according to the 2006 paper, allow for this too?

> Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers. They can be assigned by Bigtable, in which case they represent realtime in microseconds...

https://research.google/pubs/pub27898.pdf


Datomic does, so does Oracle, Snowflake and BigQuery.


Teradata Vantage, too.


Yeah, that line was a real head scratcher. I think someone in the marketing department got a bit ahead of their reality.


It’s not marketing as much as it is “try to get a patent for something that’s been done for decades by doing it slightly differently”.


Yeah, it’s not. Which makes the rest suspect.



There seems to be a growth in the number of time-traveling immutable-first databases available. We have OpenCrux, Datomic, TerminusDB, Noms, Dolt, and now immudb. Three use Datalog for queries and two force SQL (not sure about Noms).

What sort of use cases are most common? GitHub repository says:

> Companies use immudb to protect credit card transactions and to secure processes by storing digital certificates and checksums.

But I am not sure how people are building that into their architecture to be honest.


I have used "immutable" schema designs when there were strong requirements for full audit needs over time. It works very well even in a normal RDBMs system. It also allows some very neat reporting e.g. compare the same report at different points in time.

The basic idea was that every operation (create, update, delete) is actually a normal SQL insert, and all reads are against views defined such that the most recent tuples are returned unless they are flagged as "deleted."

I have typically used these types of designs in mostly simple applications with tables where the row counts are in the low millions of tuples. Dealing with this design in the billions of tuples (probably sharded somehow) might have motivated us out of normal RDBMs and into one of the specialized immutable DBs mentioned.
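The pattern above can be sketched in a few lines. This is an illustrative toy using SQLite (the table and view names are made up, not from any real system): every write is an INSERT into a log table, and a view projects the "current state".

```python
# Sketch of the append-only pattern: all mutations are INSERTs into a log,
# and a view resolves the current state. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order = time order
    id      INTEGER NOT NULL,                   -- business key
    balance INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0          -- tombstone flag
);
-- "Current state" view: latest row per id, excluding tombstones.
CREATE VIEW accounts AS
SELECT id, balance FROM accounts_log l
WHERE seq = (SELECT MAX(seq) FROM accounts_log WHERE id = l.id)
  AND deleted = 0;
""")

# Create, update and delete are all plain INSERTs.
conn.execute("INSERT INTO accounts_log (id, balance) VALUES (1, 100)")
conn.execute("INSERT INTO accounts_log (id, balance) VALUES (1, 250)")  # "update"
conn.execute("INSERT INTO accounts_log (id, deleted) VALUES (1, 1)")    # "delete"
conn.execute("INSERT INTO accounts_log (id, balance) VALUES (2, 50)")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(2, 50)]
# Full history remains queryable from accounts_log, which enables
# the "same report at different points in time" trick mentioned above.
```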


Thanks. Can you comment on how this differs from a “mutable” RDBMS model but one with automatic history based on triggers for example?


I would imagine there are a few differences. Events are the primary entity, and the current state is simply a projection of that not the other way around. Those events may come from other systems and are often defined in business terms, not SQL terms. For example an event may also constitute a business update which can update one or many tables. Think of a transaction event updating a balance for two accounts.

TL;DR My thinking it allows you to capture more the intent of that event. Although I'm not sure you need an immutable database to do this from scratch - I've seen this in schema designs in the past?


Reminds me a lot of Fluree[0], an immutable, cryptographically verifiable, temporal database, but with RDF as a query language, which I think is very nice. SQL is nice because it's familiar but it's honestly not that hard to improve on.

[0]https://flur.ee/


So is this something I would want to use for a basic CRUD application, and reap the benefits of time travel and immutability?

Or are there downsides that would relegate it to specific use cases? A what would those use cases be?


It wouldn't be suitable for any application where you care about GDPR (i.e. you store personal information and have users in the EU)

The "right to be forgotten" is not compatible with immutable data. You can't simply mark data as deleted; you need to 'purge' it from your system (and possibly backups, depending on how long you keep historic backups) - and that isn't possible in a system with immutable data.


There are solutions for this. In the context of CQRS/event sourcing, I've read that it's possible to solve it by encrypting the data with different keys and then rotating/throwing away the keys as needed. Seems a bit hacky, but there are probably more elegant approaches.
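This "crypto-shredding" idea can be sketched as: keep the ciphertext in the immutable store and the keys in a separate, mutable key store. Deleting a key makes the corresponding records unrecoverable without touching the log. The snippet below is a toy (XOR with a random pad stands in for a real cipher; a production system would use AES or similar, and would never reuse a pad):

```python
# Toy crypto-shredding sketch: immutable ciphertext, mutable key store.
# XOR with a random pad is NOT real encryption -- illustration only.
import secrets

append_only_log = []   # pretend this is the immutable database
key_store = {}         # mutable, kept outside the immutable store

def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))

def store(user_id: str, record: bytes) -> int:
    key = key_store.setdefault(user_id, secrets.token_bytes(64))
    append_only_log.append((user_id, xor(record, key)))  # append-only write
    return len(append_only_log) - 1

def read(index: int) -> bytes:
    user_id, ciphertext = append_only_log[index]
    return xor(ciphertext, key_store[user_id])  # KeyError once key is shredded

i = store("alice", b"alice@example.com")
assert read(i) == b"alice@example.com"

del key_store["alice"]  # "forget" alice: the log is untouched,
                        # but her records are now unrecoverable noise
```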


What happens if you have to delete some data e.g. due to law?


You have several options here:

- store the data encrypted using a secondary protocol, lose the key

- rewrite the whole db

If either of these is not feasible then you should have thought longer about what tech is suitable for which application. Operating your company in a legal manner is a pretty strong factor when making such choices.


Is losing the key sufficient to comply with the law? "We didn't actually delete anything but I promise I don't remember how to decrypt it" would be acceptable for the court to not e.g. seize your drives?


It's the same as "we actually deleted the data and I promise we didn't keep any backup copies", except it's probably even easier to enforce, since you already have to secure the key rather than the whole database.


IANAL. Under GDPR's right to be forgotten, you need to get rid of any identifiable subject information. If you can't tell a subject from the data, then you comply. Encrypted data without a key is just noise.

You are allowed to keep aggregations and hashes of the data, as long as these don't allow identifying a subject. E.g. you can keep a list of banned emails as MD5s to verify on sign-up, etc.
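A minimal sketch of the banned-list idea: store only digests and compare on sign-up. (Note that hashing unsalted emails is brute-forceable, so whether this counts as anonymised under GDPR is a question for lawyers, not for this snippet.)

```python
# Keep only digests of banned emails; compare the digest on sign-up.
import hashlib

def digest(email: str) -> str:
    # Normalise before hashing so casing/whitespace don't matter.
    return hashlib.md5(email.strip().lower().encode()).hexdigest()

banned = {digest("spammer@example.com")}  # example address, made up

def is_banned(email: str) -> bool:
    return digest(email) in banned

print(is_banned("Spammer@Example.com"))  # True (normalisation matters)
print(is_banned("alice@example.com"))    # False
```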


In this situation though, any client who still knows the key can access the data, since there is no way to remove data from the database server, or make it unavailable at the server level.

Assuming the clients and server are operated by different entities (otherwise the immutability and verifiability are not that interesting), if someone comes to the server operator with a court order and asks that data be removed, it seems like there is nothing they can do.


You can’t do much of anything if you’ve already given away the information in question — the same is true if someone copied the data itself.

You have to not give away the key in the first place, at least not to any clients that you don’t own.

E.g. following the rule “any problem can be solved with a level of indirection”, external clients get some Auth key A, which they feed to internal client, who internally maps it to some data key B, and decrypts the data and hands it back to the external client.

When the data is removed, you delete the mapping from your internal client.


> store the data encrypted using a secondary protocol, lose the key

Thing is, you have to do this upfront. I think it's very possible to get into a situation where the data you have to delete is in plaintext. Dropping the whole DB and recreating it from scratch is a bit hefty.


I love what you’ve done. I think you may have an issue with the TimeTravel trademark however. Snowflake uses it in your exact market segment (not to mention where else it may be used in a similar context). Good stuff though, I’ll be checking it out.


I would like to have such a database based on git, where every change is a git commit. This should then work with things like GitHub, where you can connect to your database via the GitHub API. The db git repositories could be either private or even public. You could then deploy a serverless webpage to gh-pages and use a serverless gh-gitdb as storage.

serverless := you don't have to operate the infrastructure yourself


You should check out https://www.dolthub.com/ then. They are working on something very similar.



It seems like this is somewhat in that direction. It looks like it is using merkle trees to store the history.


Seems like a database that stores content hashes. Very cool but what makes it better than simply adding a table to my database (or a DB specifically for this) and running `insert into content_hashes...`?

The above approach also allows me to choose any database because I can model this data however I want.


immudb can hold the actual data. An equivalent approach using an existing database without these features would involve creating a cryptographic data structure that captures not only individual content but the entire history of changes, plus the functionality to construct and verify the cryptographic proofs needed to validate read data.


How is this any different from taking every mutation, signing it using whatever signing mechanism you'd like, and storing the hash in an extra column alongside the ones you already have?

Then, if anything changes you know it's been mutated because the computed signature has changed.


In some way, it's basically that but on steroids… Note that if the signature includes the previous one, then you are protecting the history of changes. However, this simple approach may not scale when dealing with considerable amounts of data: proving that some older entry was not tampered with may require validating all signatures from that point up to the latest one. immudb employs hash trees to optimise these proofs.
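The "signature includes the previous one" scheme is a hash chain. A minimal sketch (plain SHA-256 chaining, no real signatures) shows both why it protects history and why verifying an old entry is O(n), which is the cost Merkle trees avoid:

```python
# Minimal hash chain: each entry's digest covers the previous digest, so
# rewriting any historical row invalidates every digest after it.
import hashlib

def entry_hash(prev_hash: bytes, payload: bytes) -> bytes:
    return hashlib.sha256(prev_hash + payload).digest()

def build_chain(rows):
    h, hashes = b"\x00" * 32, []  # genesis value for the first link
    for row in rows:
        h = entry_hash(h, row)
        hashes.append(h)
    return hashes

rows = [b"row-%d" % i for i in range(5)]
hashes = build_chain(rows)

# Verifying that rows[1] is intact means recomputing the chain from there
# up to the latest published hash -- linear work, hence the hash trees.
rows[1] = b"tampered"
assert build_chain(rows)[-1] != hashes[-1]  # tampering is detected
```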


Microsoft recently announced a preview of Azure SQL Database Ledger, which uses a mechanism similar to the one you suggested.

https://docs.microsoft.com/en-au/azure/azure-sql/database/le...


Your solution wouldn't handle the case of row deletion.

It's a little harder than you might think to make a database with tamper resistance.


Oh I'm sure - but without delving into philosophy, how would you know that something was deleted and tampered with, vs. immudb (for example) being compromised such that it's possible to delete something without you knowing, vs. it never having existed to begin with?

In my mind the only way to guarantee is to maintain a copy yourself and check against the "original", but if you're going to do that, then what I described is sufficient, no?

I only mention this because the project mentions that the history is protected by clients, which I imagine is similar to what I'm describing, e.g. copying and checking against the original.


> In my mind the only way to guarantee is to maintain a copy yourself and check against the "original", but if you're going to do that, then what I described is sufficient, no?

The attacker in that case could update your copy. But you have somewhat started to fix the issue.

To cover the case where a bad admin has access to the DB and any copies, you need to send a hash every so often to an outside source. In this case they use clients (I'm not sure exactly how they do this).

In fact you need a list of hashes, one for every 100 rows for example. Re-generating the hashes and checking against an external source should detect tampering.

In the case of Bitcoin (which is extremely tamper resistant) every node operator is a validator. The hashes are stored in a merkle tree.
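The "hash anchored at an outside source" idea can be sketched with a tiny Merkle root (this is a simplified construction, not immudb's actual tree or Bitcoin's): publish the root off-site, then later recompute it over the local rows and compare.

```python
# Sketch of external anchoring: recompute the Merkle root from local rows
# and compare it with the last root published to an outside source.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

rows = [b"tx-%d" % i for i in range(100)]
published = merkle_root(rows)  # sent to clients / an external log

rows[42] = b"evil-admin-edit"
assert merkle_root(rows) != published  # the external check catches the edit
```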


According to its own description, this database does not support deletion at all.

"You can [...] never change or delete records."


Aha, can one take nodes offline, or if I have PBs of data does it all have to stay online, always?


If this is deployed in a situation where record volumes are large, example: recording credit card transactions, there is going to have to be a process to "retire" old records (and perhaps, move them to external archives). The alternative is endlessly growing storage, and the resulting performance degradation.

At a first glance, I don't see anything like that in there.


The team will host a release party on Monday, May 31st at 6pm CET (18:00) - 10:00 AM PDT.

If you have questions about immudb, you are welcome to join us!

https://www.codenotary.com/blog/immudb-release-1-0-release-p...


Can someone ELI5 what is an "immutable database"? If you can add to the table, that means mutation, right? I am missing something...


> immudb is the first database to provide tamper-evident data management, immutable history and client-cryptographic proof. Every change is preserved and can't be changed without clients noticing.

Sounds like they are recording all changes (like SQL2011's system versioned tables, as implemented more-or-less by several common DB engines) but with some sort of hash-chain ledger so that history can be verified and therefore any tampering detected.

> If you can add to the table, that means mutation, right?

It isn't keeping the current view of the data immutable, but is keeping an immutable history of the data. It is immutable in the sense that nothing written to it is ever lost, and you can use the "time-travel" query functions (like SELECT stuff FROM atable FOR SYSTEM_TIME AS OF '2021-03-05') to retrieve it even if it looks to have been completely mangled or deleted if you use a non-time-travelling query.
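Such a time-travel query can be roughly emulated on any history table, assuming each write records a monotonically increasing version (a timestamp works the same way). A toy sketch with SQLite (table and values are made up):

```python
# Rough emulation of an "AS OF" query over a history table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (version INTEGER, id INTEGER, value TEXT)")
writes = [(1, 1, "draft"), (2, 1, "published"), (3, 1, "retracted")]
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", writes)

def as_of(version, id_):
    # Latest row for id_ whose version is <= the requested point in time.
    row = conn.execute(
        """SELECT value FROM history
           WHERE id = ? AND version <= ?
           ORDER BY version DESC LIMIT 1""", (id_, version)).fetchone()
    return row[0] if row else None

print(as_of(2, 1))  # 'published' -- the document as it looked at version 2
print(as_of(9, 1))  # 'retracted' -- the current state
```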


It basically means "append only". You can add new data to the database, but you can't change or delete existing data.


It's immutable in the same sense a purely functional data structure is immutable. You represent mutation by making a new version of the data structure. Of course you don't literally do that in the database, because it would be inefficient, but there are several algorithmic tricks that can expose an interface that works as if you did.
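The "new version per mutation" idea can be sketched in a few lines (naive full copies here; real engines share structure between versions, e.g. via path copying in a tree):

```python
# Toy versioned map: every set() yields a new version; old versions stay readable.
class VersionedMap:
    def __init__(self):
        self.versions = [{}]           # version 0 is the empty map

    def set(self, key, value):
        new = dict(self.versions[-1])  # naive copy-on-write for illustration
        new[key] = value
        self.versions.append(new)
        return len(self.versions) - 1  # the new version number

    def get(self, key, version=-1):    # default: latest version
        return self.versions[version].get(key)

m = VersionedMap()
v1 = m.set("balance", 100)
v2 = m.set("balance", 250)
print(m.get("balance", v1))  # 100 -- "time travel" to the older version
print(m.get("balance", v2))  # 250
```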


that makes sense on a language level, when you hold a reference to some data and you can assume nothing can be changed about it. how does that hold on DB level?


In the same way. A database is basically just a giant data structure, a table is not unlike a B-Tree (in some engines it literally is a B-tree). Data warehouses already do something like this informally, as they are structured in a star schema around a single "append-only" fact table.


You would be able to query and INSERT but not DELETE and UPDATE.

This is useful for example in banking applications that keep an audit trail for example.

A sysadmin would not be able to update or delete items in the audit table and so can't cover up a crime.

If the database is tampered with at the file level, they have a way to detect that. (Probably some kind of merkle tree.)


allright, makes perfect sense. thank you!


SQL system versioned tables but with git hash tree versioning for every mutable command.


The QLDB performance comparison looks quite dodgy, but I can't find their QLDB benchmark code to see what they are doing wrong.


> This new functionality allows travel back in time through the data change history, and even compares these values in the same query!

So we can actually treat our databases like immutable infrastructure and roll back changes now, without the hulking kludge that is snapshots/restores and database migrations? That's game-changing.


this is hugely interesting, i have to look into this, but... for dev/test environments, can i have an "unverified" version, where clients reget/reset the state?


Definitely not the first database to allow time travel, TM or not.


it's the combination of cryptographic client verification, SQL (which includes verification for every return value, present and past), and being able to travel in time


I think it's the first to allow it with TM.


Any major customers using this and if so how?


GDPR requires you to erase user data if users withdraw their consent, or if their data is no longer required for the purpose for which you originally collected or processed it.

Therefore, you must carefully check that no personal data is stored in immutable databases.


stay tuned - this is on the immudb roadmap not too far in the future


the important thing is to maintain full history and verification of your actions as well, so you have proof of the deletion.


So should you also delete userdata from existing backups? :)


> For any question contact us on Discord.

Hard no.


Alternatively, the team is hosting a virtual release-party on May 31st, 2021 at 6pm (18:00) CET.

https://www.codenotary.com/blog/immudb-release-1-0-release-p...


not exactly immutable is it? their docs say you can do UPSERT for example. the key is that once you update something, the clients can check using crypto that something was changed. you can't do this in regular databases.


Immutable in the sense that the old value is preserved, even if you update it, and you can't change the history (tamper-evident).


There, I did it for you in PostgreSQL: ALTER TABLE table_name SET (autovacuum_enabled = false);

Snark aside, it's still not 100% clear what the upside of using a completely different database is, just for that use case.


Huh? Dead tuples are not queryable in Postgres.



