Hacker News
Crux as General-Purpose Database (jorin.me)
83 points by tosh on Dec 16, 2019 | 7 comments



Bitemporality is an underrated and rarely discussed database concept. It is similar to the idea of "reproducible builds" but in a data model context. It can be indispensable for some applications.

Efficient implementations of bitemporality require first-class design support in the underlying storage engine; you really don't want to bolt it on top of an existing storage engine that wasn't purpose-built for it if performance matters. Write throughput in particular tends to be terrible without a fair bit of clever engineering.

A number of database engines effectively implement limited support for bitemporality, exposed as narrow features that take advantage of it, but they don't expose it as a general-purpose facility because of the engineering implications of supporting the general case.
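To make the idea concrete, here is a toy two-timeline store in Python. It is a deliberately naive sketch (nothing to do with any real engine's internals): every write records both a valid time (when the fact holds in the domain) and a transaction time (when the database learned it), and queries can ask "what did we believe at tx T about the state at valid time V?"

```python
from bisect import insort

class BitemporalStore:
    """Toy bitemporal store: each write records a valid time (when the
    fact is true in the domain) and a transaction time (when the
    database learned of it)."""

    def __init__(self):
        # eid -> sorted list of (valid_time, tx_time, value)
        self.facts = {}
        self.tx_counter = 0

    def put(self, eid, value, valid_time):
        self.tx_counter += 1
        insort(self.facts.setdefault(eid, []),
               (valid_time, self.tx_counter, value))
        return self.tx_counter

    def as_of(self, eid, valid_time, tx_time):
        """Value of eid as of `valid_time`, as known at `tx_time`."""
        best = None
        for vt, tt, value in self.facts.get(eid, []):
            if vt <= valid_time and tt <= tx_time:
                best = value  # list is sorted, so the last match wins
        return best

store = BitemporalStore()
t1 = store.put("order-1", {"status": "placed"},    valid_time=10)
t2 = store.put("order-1", {"status": "shipped"},   valid_time=20)
# a late correction arrives: the order was actually cancelled at time 15
t3 = store.put("order-1", {"status": "cancelled"}, valid_time=15)

store.as_of("order-1", valid_time=16, tx_time=t2)  # -> {'status': 'placed'}
store.as_of("order-1", valid_time=16, tx_time=t3)  # -> {'status': 'cancelled'}
```

The linear scan in as_of is exactly the kind of thing that kills performance at scale, which is why first-class index support matters.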


This is a good summary!

I think the biggest reason why bitemporality hasn't seen widespread adoption is that it's a hard problem, both for DBMS implementers to solve and for users to adopt, when constrained to a world of SQL and tables.

Crux not only has first-class support for bitemporality in the core engine but saves the user from having to worry about how and when bitemporality impacts the schema. This is because everything in the database gets a bitemporal history by default, without the user needing to make upfront design decisions.

Point-in-time Datalog queries traverse the entity graph using a very simple schema-on-read behaviour, and this can serve as a foundation for more complex relational modelling and constraint enforcement.
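A rough Python illustration of what "schema-on-read point-in-time traversal" means (this is a made-up toy, not Crux's Datalog engine, and the attribute names are invented): facts are plain (entity, attribute, value, tx-time) tuples, an entity is whatever facts exist for it at a given time, and a "join" is just following a reference to another entity.

```python
# facts: (entity, attribute, value, tx-time); later assertions shadow earlier ones
FACTS = [
    ("alice", ":name",    "Alice", 1),
    ("bob",   ":name",    "Bob",   1),
    ("bob",   ":manager", "alice", 2),
    ("bob",   ":manager", "carol", 4),   # reassigned later
    ("carol", ":name",    "Carol", 3),
]

def entity(eid, as_of_tx):
    """Schema-on-read: rebuild an entity map from whatever facts
    existed at as_of_tx -- no upfront table definition required."""
    doc = {}
    for e, a, v, tx in FACTS:
        if e == eid and tx <= as_of_tx:
            doc[a] = v          # later facts overwrite earlier ones
    return doc

def manager_name(eid, as_of_tx):
    """A two-hop graph traversal: entity -> :manager ref -> :name."""
    mgr = entity(eid, as_of_tx).get(":manager")
    return entity(mgr, as_of_tx).get(":name") if mgr else None
```

Asking manager_name("bob", 2) and manager_name("bob", 4) gives different answers because the query is evaluated against the database as of a point in time, not against its latest state.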


Hello! I am the product manager for Crux. I will try to answer questions when I get back online in a few hours.

Something not mentioned in the post is that there is a Java API in addition to HTTP and Clojure.

The Beta programme commences very early in 2020 - please contact us in the meantime if you are interested in hearing more: crux@juxt.pro


How does the document-db nature come up in practice? For example are transactions single-document in Crux? I'm trying to understand the difference in semantics and data model vs Datomic.


Documents are best thought of as the unit of ingestion and history - they do not have strong implications for queries aside from having a 1:1 mapping with entities.

Each transaction may contain multiple operations, and each operation typically relates to only one document. A document represents a single version of an entity and is decomposed during ingestion into an arbitrary set of attribute-value datoms that get updated atomically and fully replace all the previous attribute-value associations of the given entity.
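In toy Python terms (a loose sketch, not Crux's actual ingestion code - the id key is modeled after Crux's document id attribute), "decomposed and fully replaced" looks roughly like this:

```python
def decompose(doc):
    """Split a document into its id and a set of attribute-value pairs."""
    eid = doc["crux.db/id"]
    return eid, {a: v for a, v in doc.items() if a != "crux.db/id"}

class EntityIndex:
    """Each put atomically replaces ALL previous attribute-value
    associations of the entity: a document is one whole version of
    an entity, not a patch against the previous version."""

    def __init__(self):
        self.avs = {}

    def put(self, doc):
        eid, avs = decompose(doc)
        self.avs[eid] = avs   # full replace, not a merge

    def get(self, eid):
        return self.avs.get(eid, {})
```

So putting a new version of a document that omits an attribute retracts that attribute - there is no partial update at the document level.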

For more on the semantic differences with datoms see the FAQ: https://www.opencrux.com/docs#faq-comparisons


What's the indexing/performance like? When I select an ID from a collection is it going to reprocess the entire history to filter on a matching ID?


> What's the indexing/performance like?

A key goal has been to avoid ingestion bottlenecks that are otherwise typical when indexing late-arriving temporal data. This is partly achieved by having a very simple EAV index structure over fast local KV stores like RocksDB and LMDB (as opposed to a complex distributed storage architecture with vastly more moving pieces).
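The "simple EAV index over an ordered KV store" idea can be sketched in a few lines of Python (purely illustrative - the real indexes over RocksDB/LMDB encode keys as sorted byte strings, and the index names here are the conventional ones, not necessarily Crux's): each sort order is its own sorted keyspace, and every query is a prefix range scan.

```python
import bisect

class EAVIndex:
    """Toy EAV index over an ordered keyspace (the role RocksDB or
    LMDB plays): tuples kept sorted, queried by prefix scan."""

    def __init__(self):
        self.eav = []   # (e, a, v) -- "what are entity e's attributes?"
        self.ave = []   # (a, v, e) -- "which entities have attribute a = v?"

    def put(self, e, a, v):
        bisect.insort(self.eav, (e, a, v))
        bisect.insort(self.ave, (a, v, e))

    def scan(self, index, prefix):
        """Range scan: all keys starting with `prefix`."""
        i = bisect.bisect_left(index, prefix)
        out = []
        while i < len(index) and index[i][:len(prefix)] == prefix:
            out.append(index[i])
            i += 1
        return out

idx = EAVIndex()
idx.put("p1", "name", "Ada")
idx.put("p2", "name", "Bob")
idx.put("p1", "city", "London")

idx.scan(idx.ave, ("name", "Ada"))  # -> [('name', 'Ada', 'p1')]
idx.scan(idx.eav, ("p1",))          # all of p1's attribute-value pairs
```

Because every lookup is a contiguous range scan over a sorted keyspace, a local KV store with fast sequential reads is all the storage machinery the index needs.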

As for query performance, the indexes have been designed so that graph traversals are efficient regardless of how much transaction time or valid time history is stored for the entire database of entities. One trick to this is the use of "Morton space filling curves": https://github.com/juxt/crux/blob/master/crux-core/src/crux/...
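For the curious, the basic Morton (Z-order) trick is just bit interleaving - this is the textbook encoding, not Crux's implementation (which lives in the linked namespace and is considerably more involved):

```python
def morton_encode(x, y, bits=32):
    """Morton (Z-order) encode two coordinates by interleaving their
    bits. Mapping (valid-time, transaction-time) pairs onto one curve
    keeps points that are close in 2-D time close in the 1-D key
    space, so a bitemporal range query becomes a small number of
    key-range scans over an ordered index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bit positions from x
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bit positions from y
    return z

morton_encode(3, 1)  # -> 7 (x bits 11 and y bits 01 interleave to 0111)
```

A 2-D rectangle in (valid-time, transaction-time) space maps to a bounded set of contiguous Z-order ranges, which is what makes temporal queries cheap regardless of how much history has accumulated.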

This is a great talk about the design and internals from ClojuTRE 2019 if you're curious: https://www.youtube.com/watch?v=YjAVsvYGbuU



