Hacker News
Time-travel queries in CockroachDB (cockroachlabs.com)
135 points by ivank on June 23, 2016 | hide | past | favorite | 52 comments



Very cool. For a long time I've wanted databases to support this so I could run query 1:

`SELECT (*, $querytime) FROM table1 WHERE ...;`

Then based on some code in my application run a second query:

`SELECT * FROM table2 WHERE ... AT $querytime;`

So much frontend code makes assumptions that databases aren't modified between consecutive queries (like graphql-js). With an API like this it would be trivial to make those queries correct. (And we wouldn't need to abort, and we can cross database instance boundaries in interesting ways.) The timestamp is also super useful for doing isomorphic rendering of live-bound data - the server can send the timestamp of the data queries it used to render the page. When the client JS loads it can reconnect to the server and pass the rendered timestamp back to check for deltas between the rendered version and the current database view. This in turn lets you safely cache server renders, as well as a bunch of other fun things.
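The syntax CockroachDB actually shipped comes pretty close to this pattern. A rough sketch (table names are hypothetical; `cluster_logical_timestamp()` is the function CockroachDB documents for reading the cluster's HLC timestamp):

```sql
-- Query 1: read data and note the timestamp it was read at.
SELECT *, cluster_logical_timestamp() FROM table1 WHERE ...;

-- Query 2: read another table as of that captured timestamp, so both
-- queries observe a single consistent snapshot.
SELECT * FROM table2 AS OF SYSTEM TIME '2016-06-23 10:00:00' WHERE ...;
```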

Hats off to the CockroachDB team! Fingers crossed some of these features start making their way into other databases I love. (Looking at you, Rethinkdb!)


That's essentially the working model of Datomic. In all other Database systems, you open "a database connection" that directly reads from live state and writes to live state (giving or taking MVCC.) With Datomic, you instead retrieve a handle to a snapshot of the database state at a given moment (which can be the present moment, or any previous moment) for reading.

For writing, you compute a change-set against a particular snapshot (making explicit the fact that you're always updating based on assumptions from possibly-stale data) and then submit that changeset to a write-linearizing "transactor" server-process that outputs new snapshots. The transactor is programmable, so it can do whatever interesting things you like to coalesce updates based on their submission order and the snapshot they were calculated against (CRDTs, last-write-wins using the snapshot-ordering, etc.)


> Very cool. For a long time I've wanted databases to support this so I could run query

This has been in Oracle for a few years now https://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns...
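For reference, Oracle calls this Flashback Query; a sketch of the syntax from those docs (table and column names hypothetical):

```sql
-- Read a table as it was one hour ago using Oracle Flashback Query.
SELECT * FROM employees
  AS OF TIMESTAMP SYSTIMESTAMP - INTERVAL '1' HOUR
 WHERE department_id = 10;
```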


Yup, but it's one of those things where you need the most expensive license, IIRC.


I'm a little bemused by "I'm not aware of any database that has implemented it".

You would need to be hiding under a rock, like a cockroach I suppose, not to know that Oracle and SQL Server 2016 have this :-)


I think we've reached a point where a lot of OSS folks have never actually touched a commercial database in their lives. OSS alternatives are so good that fewer and fewer people feel the need to go out and pay for something better.

In a way it's similar to some folks going crazy because you can do this or that in the browser when desktop OSes have been able to do it for ages... generational shifts.


Certainly modern open source DBs are very capable... It is just a little frustrating to see the wheel being reinvented rather than genuine progress being made. And the big commercial DBs aren't standing still either.


I think CockroachDB is making genuine progress in the distributed, consistent space. We now have to add many of the existing features on top of that.


I searched for the syntax listed in the standard and didn't find it in production use anywhere else. I've used MSSQL in production before (last time in 2015) but never Oracle. I'm happy to now know that other SQL DBs have this feature.


Couldn't you do it with transaction isolation levels? for example in Postgres "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;"

Inside the transaction you will see only changes made within the transaction.
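A minimal sketch of that approach (table names hypothetical):

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- The first query establishes the snapshot for the whole transaction.
SELECT count(*) FROM orders;
-- Later queries in this transaction see the same snapshot, even if
-- other sessions commit changes in between.
SELECT * FROM order_items WHERE order_id = 42;
COMMIT;
```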


Yes, but you would need to keep a transaction open for each user session.

I think the stateless nature of time-travelling queries would map much better to the stateless nature of HTTP requests.


Impressive that it works correctly across schema changes.

I assume the indexes are laid out like (col, timestamp)? Can you handle an access pattern where keys are continuously inserted with the same value, and then deleted shortly after? Will queries for the value work efficiently, or have to scan all rows inserted with that value for the last 24 hours? Is there a way to tweak GC to run more aggressively on a particular table (knowing that you wouldn't be able to query back in time)?

We have an access pattern like that in our database. Rows are inserted into the table with x=null, then later x is set to some unique value, but we need to query where x is null in order to set the actual value. At one point we had a long-running transaction that prevented the database from cleaning up old records, so the query for x IS NULL grew slower and slower, to the point where it almost browned us out.


[engineer at cockroach]

Basically, yes: all keys in the KV layer, including the index entries, include a timestamp suffix.

Repeated edits to the same value will accumulate MVCC revisions until they are GCed, and those will be iterated over during access, so, as you point out, a use-case like yours would likely benefit from a shorter GC cutoff, and GC _is_ configurable in cockroach.


Yes, that is fascinating:

  However, we are able to support schema changes
  between the present and the “as of” time.
  For example, if a column is deleted in the present
  but exists at some time in the past,
  a time-travel query requesting data
  from before the column was deleted
  will successfully return it.
I've noticed that hardly anyone writes about DDL for temporal tables. Snodgrass's book doesn't cover it. What happens when you add a `NOT NULL` column or change a column to `NOT NULL`? That kind of thing. There is plenty of research on temporal tables, but I haven't found any attempt at defining rules for these changes. (If someone knows of a paper or book chapter, I'd love to hear about it!)
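For concreteness, the quoted behavior amounts to something like this (hypothetical table; `AS OF SYSTEM TIME` syntax per the post):

```sql
-- 'notes' exists now, then gets dropped in the present:
ALTER TABLE orders DROP COLUMN notes;

-- A time-travel query from before the drop still returns the column.
SELECT id, notes FROM orders AS OF SYSTEM TIME '2016-06-22 12:00:00';
```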


This is an old idea. It was in the original version of Postgres from Berkeley in the 1980s. They ended up taking it out in 1998 because systems kept running out of disk space:

https://www.postgresql.org/docs/6.3/static/c0503.htm

It's actually very easy to implement if you use the "append" version of MVCC (which Postgres does). All you need to do is disable garbage collection (i.e., the vacuum).


There's an extension to add partial support of this to Postgres: https://github.com/arkhipov/temporal_tables

I have used it in production for years and it works great, but does indeed use a lot of disk space if you enable it for all tables and have many updates.

It's not a full implementation (it lacks the new syntax like `AS OF SYSTEM TIME`) but you can work around that with `UNION` queries and the containment (`@>`) operator.
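For example, reconstructing an "as of" view with the extension looks roughly like this (table names hypothetical; `sys_period` is the tstzrange column the extension maintains):

```sql
-- Rows current at the target time, from the live table...
SELECT * FROM employees
 WHERE sys_period @> TIMESTAMPTZ '2016-04-01 00:00:00+00'
UNION ALL
-- ...plus rows that were current then but have since been updated or deleted.
SELECT * FROM employees_history
 WHERE sys_period @> TIMESTAMPTZ '2016-04-01 00:00:00+00';
```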


>> "They ended up taking it out in 1998 because systems kept running out of disk space."

Reminds me of Microsoft SQL Server removing support for natural language data queries, only to add it back years later.

It would be interesting to see a list of features from major database management systems that have been removed over the years.


> It would be interesting to see a list of features from major database management systems that have been removed over the years.

The only other thing I can think of right now is in-memory optimized indexes (e.g., T-Trees) from the 1980s. These were later removed and replaced with B+trees (or skip lists if you're MemSQL) because CPU caches (SRAM) got much faster than memory (DRAM).


it was also in illustra, stonebraker's attempt to commercialize postgres in the 90s.

every table had meta columns defining the bounds of validity for each version of a given row. an update copied the row, applied the changes, and updated the validity timestamps on the old and new copies.

aside from the disk space problem you mention, the other issue was that every update required a write to every index on the table, even if the indexed column hadn't been changed. makes simple updates much heavier than one might naively think...


Blog post OP here.

I originally wrote that I wasn't aware of any other DB that had this. I should have specified any other SQL DB, since I was aware of Datomic.

While working on this feature I searched on Google for the syntax specified ("AS OF SYSTEM TIME"), but didn't find it referenced except in connection with a third-party Postgres extension. It is unfortunate that Oracle and MSSQL didn't come up in my searching, since they support this feature.

I've edited the blog post to be more accurate in listing some of the previous work.


i'm still surprised that not a single link to https://en.wikipedia.org/wiki/Temporal_database is present anywhere either here or in your post. on that page there's a link to a very in-depth treatment of the problem: http://www.cs.arizona.edu/people/rts/tdbbook.pdf.


That's the first link under the link to the SQL:2011 section about this, so it's not too difficult to get there.


It looks like SQL Server 2016 has it as well. They call them Temporal Tables.

https://msdn.microsoft.com/en-us/library/dn935015.aspx
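The T-SQL syntax, for comparison (table and column names hypothetical):

```sql
-- Query a system-versioned temporal table as of a point in time.
SELECT * FROM dbo.Employees
  FOR SYSTEM_TIME AS OF '2016-04-01T00:00:00'
 WHERE DepartmentID = 10;
```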


A feature that made it to Standard Edition too, which is nice, unlike CDC and related features, which were Enterprise-only.

Similar support can be found in quite a few places. It is in the SQL:2011 standard: https://en.wikipedia.org/wiki/SQL:2011


Ah, I didn't know that. It looks like you have to enable those explicitly. CockroachDB built time travel over its existing MVCC layer, which means it works out of the box for all tables.


Amazing.

I've always hated keeping timestamps and messing up the database design just to get some sort of time awareness. I always thought this was supposed to be handled by the database itself; I'm just a user.

I know there's Datomic[1], but its stack is way different from what people are used to (around here), and I've yet to find an application that would justify using it (yeah, I'm not proud of that).

If anyone from Cockroach happens to read this, I have some questions:

* Are you guys the first database vendor to provide this as first class feature?

* How is the time travel performance-wise? What if I have a product that relies on this heavily?

* How would the database handle deleted columns reappearing in the schema after some time? Would it consider them the same as before? What if the type is different?

Kudos for the feature, I barely know CockroachDB but now I have a good reason to try it.

[1]: http://www.datomic.com/


Blog post OP here.

No, we are not the first SQL vendor to provide this. And many other non-SQL databases also have this feature.

Performance is generally better than ordinary reads, because there is less risk of transaction retries.

Columns can come and go, and change types, and that is all handled correctly. If a column reappears as a different type, it will work correctly. This is because we also version the table schemas each time they change, so we can always fetch the correct schema when doing a time-travel query.


This is really cool but slightly frustrating, because both Datomic and GUN (where I work) have talked about this for years. Several others mention that Snowflake, SQL Server, and Oracle are designed for this as well.

So while cool, it seems like CockroachDB's marketing/content team is what is winning here. I don't know how they do it, but it is impressive.


Nah, it was just me (the engineer) not being aware of Oracle and MSSQL. I should have written that I'm not aware of any other SQL DB. I was aware of Datomic, but it's not SQL. Just an error.


Snowflake has had the same functionality with similar syntax for some time now, and for a longer default time period.

see: https://docs.snowflake.net/manuals/user-guide/data-time-trav...

disclaimer - I work there.
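From those docs, the syntax looks roughly like this (table name hypothetical):

```sql
-- Absolute point in time:
SELECT * FROM my_table AT (TIMESTAMP => '2016-06-22 12:00:00'::timestamp);

-- Relative offset, in seconds:
SELECT * FROM my_table AT (OFFSET => -60*5);  -- five minutes ago
```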


That sounds like comparing apples and oranges, though. For instance, according to the documentation, every transaction in Snowflake locks the entire table, so it doesn't seem like it would be usable for the kind of OLTP workloads that Cockroach is designed for.

(Although I admit that I haven't tested this assumption, since Snowflake appears to be proprietary.)


How far back can it go? Is there any overhead besides disk space?


There are two different kinds of temporal queries. One returns the state of the database as it was recorded at a given time (system time). The other returns the state of the world as of a given time (application, or valid, time).

Example: I got paid yesterday and the operation will be recorded only tomorrow.

The first kind of query, AS OF today, will never show the payment. The second one will (assuming that behavior is appropriate for the application; better examples are possible).

CockroachDB, SQL Server and Oracle implemented the first one. That's why they are writing about backups.

Edit: see this PostgreSQL extension for the second kind of queries http://pgxn.org/dist/temporal_tables/
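A sketch of the difference, mixing CockroachDB-style and Postgres-style syntax for illustration (hypothetical table; `valid_period` is an application-maintained range column):

```sql
-- System time: what did the database *record* as of yesterday?
-- The payment won't appear until it is actually written.
SELECT * FROM payments AS OF SYSTEM TIME '2016-06-22';

-- Application (valid) time: which payments were *effective* yesterday,
-- regardless of when they were recorded?
SELECT * FROM payments WHERE valid_period @> DATE '2016-06-22';
```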


Unfortunately the Postgres temporal_tables extension is about the first kind not the second kind: "What did the database look like on April 1st?" not "What did this customer look like on April 1st?". From your link:

> Currently, Temporal Tables Extension supports the system-period temporal tables only.

Also here is the same extension on github: https://github.com/arkhipov/temporal_tables


Off topic a bit, but is anyone using CockroachDB here and can share their experience?


It's not production-ready yet; check their mailing lists.


The name itself is going to be a huge problem when you're proposing to use this at your company.


Actually, it's been easier for me in a large org. Maybe we should ask Mongo if it was an issue?


This says it is implemented in the GC - how does the history survive a restart? Oracle does it because this history is kept in the translogs on disk.


This is the same, IIUC. They are talking about the on-disk GC.


Cool feature. How far back can I go? Looks like it depends on the setting of the garbage collection interval, with a default of 24 hours. How high can I reasonably set that? Is there any overhead besides disk space? It would be nice to have at least a week, so if I delete something on Friday evening, I can still recover it the following week.


[engineer at cockroach]

How high you can set it depends on your access patterns -- there is some overhead to iterating though MVCC revisions during reads, in addition to the on-disk space you mention.

If your workload involves frequent writes to the same rows, GCing some of those revisions sooner would have a greater impact, whereas if you have a write-light workload, or if your writes are spread over rows such that a given row doesn't see frequent repeated updates, then you could probably use a much higher GC threshold with minimal overhead.
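The GC threshold is set through zone configurations; in newer CockroachDB versions, the per-table SQL syntax looks roughly like this (table name hypothetical):

```sql
-- Raise the GC window for one table to 7 days (expressed in seconds).
ALTER TABLE orders CONFIGURE ZONE USING gc.ttlseconds = 604800;
```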


Awesome to see this!

The only other DB I know of which has this is Oracle, where it's called "Flashback". Are there any others?


I don't know exactly how to do it in Datomic, but since it's append-only, it must be possible to do something like "state of DB at this point in time".


It is (without any time restriction re: GC etc.). The exact term would be 'accumulate-only' (technically, 'append-only' is more of a storage implementation detail). Interestingly, you can also make 'WHAT IF' queries, i.e. obtain a view as if some writes had been made (but without transacting them).


Datomic


Reminds me of the "Freeze Frame Filesystem" that the CS dept @ Cornell is working on:

http://www.cs.cornell.edu/projects/quicksilver/public_pdfs/f...

They want to power the backend of hundreds of millions of energy sensors for a smart power grid.


With features like this you remove half of the raison d'être for the blockchain.

Aside from applications that really need to solve the Byzantine generals problem, I see no reason to use the blockchain.

In fact, I wonder if there are fintech startups using language like "blockchain" in their pitch docs, but actually planning to use versioning databases like this feature of CockroachDB, or Datomic.


This feature would actually be REALLY helpful for my http://imadefood.com project, where I'm keeping revision history for recipe changes. I could do a time-travel select instead of a revision-number select, which I think maps better to how people think about it.


Could the same be accomplished using Postgres with Kafka and Event Sourcing? http://martinfowler.com/eaaDev/EventSourcing.html


Hats off to Cockroach Labs


great to see some traction for temporal databases around here. my experience tells me that pretty much every half-serious database would like to be at least system-time temporal or application-time temporal, and preferably both, at some point. the fact that this hasn't been more popular is kind of baffling to me. i guess it's one of those very hard and totally not cool problems.



