I see 4 main reasons why someone may want to be aware of Crux:
- if you have a bitemporal problem
- if you have a graph problem, i.e. something you might initially look to Neo4j to help with
- if you want to use Datalog because it can make writing an application simpler
- if you are thinking of building something similar (immutable event log + indexes) and want to save time
Crux is very different from Cassandra (strong consistency, fat nodes, arbitrary joins etc.), but you could definitely use Cassandra in your Crux architecture.
The closest "yet another" comparison would be Datomic. Crux and Datomic both strive to reimagine what a "general purpose" DBMS should look like, with the primary goal being developer productivity/sanity, whereas Cassandra's goal is simply to be a highly-scalable wide-column store.
Thank you, that explains a lot. However, I'm struggling to think, off the top of my head, of a problem where I'd need a DB that is both bitemporal and graph-based, keeping in mind that both properties are relatively "expensive" compared to what a more common relational design could offer, so I'd really better have a good reason to choose something exotic like that! Do you perhaps have an example?
Part of the thesis of Datomic (and also Crux, as I understand it) is that cycles and storage have gotten so cheap that some of the design assumptions made by traditional databases no longer necessarily make sense for large segments of the application software market. One of the major ones being challenged is that storage is so expensive that your updates to the DB need to be destructive. That's still true for some domains, but for many, many domains, e.g. ones where historical reporting is important and write volume isn't terribly high, it'd be much better to have an immutable database (facilitating historical queries), even if it means paying a little more in hosting costs, than to build the necessary features on top of a destructive database.
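To make the "non-destructive updates" idea concrete, here is a minimal Python sketch of an append-only store where every write adds a new version and older versions stay queryable. The class and method names (`AppendOnlyStore`, `put`, `get`, `as_of_tx`) are made up for illustration and are not the API of Datomic or Crux.

```python
from bisect import bisect_right


class AppendOnlyStore:
    """Toy immutable store: a write appends a version, nothing is overwritten.

    Hypothetical illustration only, not how Datomic or Crux are implemented.
    """

    def __init__(self):
        self._log = []    # append-only list of (tx, key, value)
        self._index = {}  # key -> (sorted tx numbers, matching values)

    def put(self, key, value):
        tx = len(self._log) + 1  # monotonically increasing transaction number
        self._log.append((tx, key, value))
        txs, values = self._index.setdefault(key, ([], []))
        txs.append(tx)
        values.append(value)
        return tx

    def get(self, key, as_of_tx=None):
        txs, values = self._index.get(key, ([], []))
        if as_of_tx is None:
            return values[-1] if values else None
        # Find the last version written at or before as_of_tx.
        i = bisect_right(txs, as_of_tx)
        return values[i - 1] if i else None


store = AppendOnlyStore()
t1 = store.put("account-1", {"balance": 100})
t2 = store.put("account-1", {"balance": 40})

print(store.get("account-1"))               # {'balance': 40}
print(store.get("account-1", as_of_tx=t1))  # {'balance': 100}
```

The cost is that storage grows with every write, which is exactly the trade-off described above: more disk in exchange for historical queries that come for free.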
A thoroughly-validated canonical example problem hasn't emerged just yet, but there are many intersecting domains...
Temporal graph analysis of "evolving graphs" is an active research field with some strong motivating use-cases, for instance: profiling networks of fraudulent transactions across N bank accounts with data pulled from M source systems. This paper discusses the analysis of research citations over time, as another example: https://pdfs.semanticscholar.org/110b/0db484a1303eda30aa7e34...
That said, Crux's indexes aren't optimal for making all kinds of analytical time-range queries efficient just yet. Instead Crux is currently focussed on point-in-time queries, but the temporal R&D is still happening as it feels very ripe.
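For readers unfamiliar with the term, a point-in-time bitemporal query asks "what was true at valid time V, according to what the database knew at transaction time T?". The Python sketch below shows the idea with a linear scan over a small log. The `Version` type and `entity_as_of` function are hypothetical names, and the resolution rule (latest valid-from wins, ties broken by latest transaction time) is a simplifying assumption rather than a description of Crux's actual indexes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    entity: str
    valid_from: date  # when the fact became true in the domain
    tx_time: date     # when the database recorded it
    doc: Any


def entity_as_of(log, entity, valid_time, tx_time) -> Optional[Any]:
    """Return the doc for `entity` at `valid_time`, as known at `tx_time`."""
    visible = [
        v for v in log
        if v.entity == entity
        and v.valid_from <= valid_time
        and v.tx_time <= tx_time
    ]
    if not visible:
        return None
    # Assumed rule: latest valid_from wins; a later tx_time breaks ties.
    return max(visible, key=lambda v: (v.valid_from, v.tx_time)).doc


log = [
    Version("alice", date(2020, 1, 1), date(2020, 1, 1), {"city": "London"}),
    # Move on 1 March, recorded a few days late.
    Version("alice", date(2020, 3, 1), date(2020, 3, 5), {"city": "Paris"}),
    # Retroactive correction recorded in April: she was in Berlin from 1 Feb.
    Version("alice", date(2020, 2, 1), date(2020, 4, 1), {"city": "Berlin"}),
]

# Where was Alice on 15 Feb, according to what we knew on 10 March?
print(entity_as_of(log, "alice", date(2020, 2, 15), date(2020, 3, 10)))
# {'city': 'London'}

# Same valid time, but asked after the April correction.
print(entity_as_of(log, "alice", date(2020, 2, 15), date(2020, 4, 2)))
# {'city': 'Berlin'}
```

The two answers differ only because the transaction time differs, which is the property that makes audits and late-arriving data tractable.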
Looking for examples more generally, I think wherever you have a meaningful use-case for a graph database you probably, eventually, will want to capture and model history. If you then find yourself with two or more such databases that you want to integrate, then you will greatly benefit from a bitemporal graph DBMS.
As a fun example, I like to envisage integrating our two federated evolving knowledge graphs. Imagine a tool for "networked thought" like Roam Research that could allow us both to visualise the evolving connections between our independently recorded thoughts, before, during and after this conversation. Graphs of knowledge encoded in time.
1) availability of valid time for domain as-of queries
2) performance of ad hoc as-of queries
3) ingestion throughput (RocksDB is _fast_)
4) eviction/excision throughput
5) a lazy query engine doesn't demand so much memory (because there is no need to hold entire intermediate result sets at the same time), and automatic join re-ordering makes the Datalog inherently more "declarative"
6) use of protocols for modularity allows you to create a massive range of possible topologies to support the non-functional requirements of your host environment
7) benefit from the RocksDB roadmap (or other embedded KV storage - see LMDB / rocksdb-cloud)
8) absence of a prescriptive data model
On the flip side:
1) absence of a prescriptive data model (though transaction functions can give you equivalent power)
2) API maturity
3) lazy caching of data at peers (vs Crux's fat nodes, though again, see rocksdb-cloud for one possible resolution)
4) query features: multiple data sources, lazy entity API, other niceties
There are definitely things still missing from both lists :)
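To illustrate point 5 in the first list (a lazy query engine not needing to hold whole intermediate result sets), here is a rough Python sketch using generators for a two-hop graph join. The `FOLLOWS` data and the `follows` / `two_hops` functions are invented for the example, and this says nothing about how Crux's engine is actually built or how it re-orders joins.

```python
from itertools import islice

# Hypothetical adjacency data: who follows whom.
FOLLOWS = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan", "eve"]}

pulled = []  # records every edge actually read, to make the laziness visible


def follows(person):
    """Lazily yield the people that `person` follows."""
    for other in FOLLOWS.get(person, []):
        pulled.append((person, other))
        yield other


def two_hops(start):
    """Lazily yield (start, mid, end) rows of a two-hop join."""
    for mid in follows(start):
        for end in follows(mid):
            yield (start, mid, end)


# Ask for only the first result row.
first = list(islice(two_hops("ann"), 1))
print(first)   # [('ann', 'bob', 'dan')]
print(pulled)  # [('ann', 'bob'), ('bob', 'dan')]
```

Only two edges are read to produce the first row, and the "cat" branch is never touched. An eager engine would first materialise every one-hop result before starting the second hop.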
Hope that helps!