I don't believe Metabase supports the Calcite SQL dialect (https://github.com/metabase/metabase/issues/6230), which is what Crux is using for the SQL layer. So I believe the answer is no - but I'm not an expert here, so don't take this answer as definitive.
As it happens there was a comment on a related issue about Dremio support [0] (which also uses Calcite) where someone shared that they got a Dremio driver working. I was able to fork their driver and get Metabase working against Crux in time for a live demo of the crux-sql module back in May [1].
I just switched the forked driver repo to public if anyone wants to test it out [2]. The Metabase driver docs are pretty straightforward to get things running. There's definitely work to be done though and I didn't get very deep into it really - but I hope to pick it up again soon!
EDIT: it's also worth mentioning that the linked issue #6230 discusses a lot of problems deriving from Druid's lack of support for prepared statements, but Crux and Dremio don't have that limitation. Although reading the most recent comment, it looks like Druid may have overcome that hurdle now, so with a bit of luck there might now be more traction to get mainline Metabase support for a generic Calcite driver!
I'm casually trying to learn Datalog, and there are lots of SQL queries that I still don't know how to translate.
E.g. group by, window aggregates, order by, limit.
I'd be interested in seeing the equivalent Datalog that Crux generates for some of those statements, to learn what efficient Datalog looks like for those problems.
You always have the option to wire together multiple Datalog queries using Clojure/JVM-lang-of-choice (the N+1 query problem doesn't affect Crux when embedded), but we are currently analysing the possibilities for a higher-level query API, potentially in a similar vein to Mongo's pipelines [0], where a sequence of "stages" is defined. In the ideal case there would be _closure_ in the intermediate steps of a pipeline such that the output to any given stage (including Datalog queries) is a fully-fledged graph in its own right.
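To make that concrete, here's a minimal sketch of wiring two queries together in Clojure (assuming a started node bound to `node`; the :user/* and :order/* attributes are hypothetical), where the second stage is parameterized by the first stage's results via :args:

    (require '[crux.api :as crux])

    (let [db (crux/db node)
          ;; stage 1: find the ids of users in a given city
          user-ids (map first
                        (crux/q db '{:find [u]
                                     :where [[u :user/city "London"]]}))]
      ;; stage 2: pull the order totals for just those users,
      ;; binding the `u` logic variable to each id via :args
      (crux/q db {:find '[o total]
                  :where '[[o :order/user u]
                           [o :order/total total]]
                  :args (vec (for [id user-ids] {'u id}))}))

Both stages run in-process against the same immutable db value, which is why the N+1 concern doesn't apply when embedded.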
As for your specific examples, Crux already implements an order-by+limit+offset combination that automatically spills to disk if needed, but for efficient pagination you would probably want to maintain additional pre-sorted value ranges. For basic aggregation we have an alpha API decorator that composes Clojure transducers to great effect [1].
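For instance, a sketch of that order-by/limit/offset combination (with a hypothetical :user/name attribute):

    (crux/q (crux/db node)
            '{:find [name]
              :where [[e :user/name name]]
              :order-by [[name :asc]]
              :limit 10
              :offset 20})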
Just finished watching that talk, I recommend it too! Information dense and fast with lots of useful examples on how to use Datalog, especially comparing it to what most programmers use today such as relational, key/value and document stores.
Oh dear, I feel like there is some gap between the set of technologies I'm aware of and all this stuff. First off, can somebody (preferably not involved with Crux) tell me whether I actually should be aware of Crux? Is it "yet another Cassandra"? Is it preferable in any way to Cassandra (or whatever it's "yet another" of)?
I see 4 main reasons why someone may want to be aware of Crux:
- if you have a bitemporal problem
- if you have a graph problem, i.e. something you might initially look to Neo4j to help with
- if you want to use Datalog because it can make writing an application simpler
- if you are thinking of building something similar (immutable event log + indexes) and want to save time
Crux is very different from Cassandra (strong consistency, fat nodes, arbitrary joins etc.), but you could definitely use Cassandra in your Crux architecture.
The closest "yet another" comparison would be Datomic. Crux and Datomic both strive to reimagine what a "general purpose" DBMS should look like, with the primary goal being developer productivity/sanity, whereas Cassandra's goal is simply to be a highly-scalable document store.
Thank you, that explains a lot. However, I'm struggling to imagine off the top of my head a problem where I'd need a DB that is both bitemporal and graph-based, while keeping in mind that both capabilities are relatively "expensive" compared to what a more common relational design could offer, so I'd really better have a good reason to choose something exotic like that! Do you perhaps have some example?
Part of the thesis of Datomic (and also Crux, as I understand it) is that cycles and storage have gotten so cheap that some of the design assumptions made by traditional databases no longer necessarily make sense for large segments of the application software market. One of the major ones being challenged is that storage is so expensive that your updates to the DB need to be destructive. That's still true for some domains, but for many, many domains, e.g. ones where historical reporting is important and write volume isn't terribly high, it'd be much better to have an immutable database (facilitating historical queries), even if it means paying a little more in hosting costs, than to build the necessary features on top of a destructive database.
A thoroughly-validated canonical example problem hasn't emerged just yet, but there are many intersecting domains...
Temporal graph analysis of "evolving graphs" is an active research field with some strong motivating use-cases, for instance: profiling networks of fraudulent transactions across N bank accounts with data pulled from M source systems. This paper discusses the analysis of research citations over time, as another example: https://pdfs.semanticscholar.org/110b/0db484a1303eda30aa7e34...
That said, Crux's indexes aren't optimal for making all kinds of analytical time-range queries efficient just yet. Instead Crux is currently focussed on point-in-time queries, but the temporal R&D is still happening as it feels very ripe.
Looking for examples more generally, I think wherever you have a meaningful use-case for a graph database you probably, eventually, will want to capture and model history. If you then find yourself with two or more such databases that you want to integrate, then you will greatly benefit from a bitemporal graph DBMS.
As a fun example, I like to envisage integrating our two federated evolving knowledge graphs. Imagine a tool for "networked thought" like Roam Research that could allow us both to visualise the evolving connections between our independently recorded thoughts, before, during and after this conversation. Graphs of knowledge encoded in time.
1) availability of valid time for domain as-of queries (see the sketch below, after both lists)
2) performance of ad hoc as-of queries
3) ingestion throughput (RocksDB is _fast_)
4) eviction/excision throughput
5) a lazy query engine doesn't demand so much memory (because there is no need to hold entire intermediate result sets at the same time), and automatic join re-ordering makes the Datalog inherently more "declarative"
6) use of protocols for modularity allows you to create a massive range of possible topologies to support the non-functional requirements of your host environment
7) benefit from the RocksDB roadmap (or other embedded KV storage - see LMDB / rocksdb-cloud)
8) absence of a prescriptive data model
On the flip side:
1) absence of a prescriptive data model (though transaction functions can give you equivalent power)
2) API maturity
3) lazy caching of data at peers (vs Crux's fat nodes, though again, see rocksdb-cloud for one possible resolution)
4) query features: multiple data sources, lazy entity API, other niceties
There are definitely things still missing from both lists :)
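To sketch points 1 and 2 from the first list (assuming a started node bound to `node`; the :user/status attribute is hypothetical), an as-of query is just a matter of asking for the right database value:

    (require '[crux.api :as crux])

    ;; put a document with an explicit valid time
    ;; (in real code you'd crux/await-tx before querying)
    (crux/submit-tx node
                    [[:crux.tx/put
                      {:crux.db/id :user/jane, :user/status :active}
                      #inst "2019-08-21"]])

    ;; query the database as of that valid time
    (crux/q (crux/db node #inst "2019-08-21")
            '{:find [e status]
              :where [[e :user/status status]]})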
Does anyone with experience have detailed info about the limitations, trade-offs, and problems of using Datalog instead of SQL?
Datalog seems like a natural choice as a declarative query language but has not yet become mainstream. I wonder why?
Using Crux's Datalog query language from Clojure is great. Normally if your query building library is giving you a data-like interface, then you're relying on some abstraction that will inevitably leak in places (and let's not talk about ORMs), but since Crux really does natively use EDN (Clojure's data syntax, roughly analogous to JSON) for its query language, you've eliminated an entire layer of complexity in your application, and you also have the entire programming language at your disposal for compositionally building queries.
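For example, a minimal sketch of building a query as plain data with an ordinary function (the attribute and helper names are hypothetical):

    (require '[crux.api :as crux])

    (defn find-entities-by
      "Build a Crux query map matching entities where attr = value."
      [attr value]
      ;; the query is just an EDN map, so we can assemble it
      ;; with regular Clojure before handing it to crux/q
      {:find '[e]
       :where [['e attr value]]})

    (crux/q (crux/db node)
            (find-entities-by :user/email "jane@example.com"))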
That's not to say that using the SQL API is a bad idea, but if you're using Crux from Clojure, you'd be missing out on some stuff.
Short answer is that Datalog is originally a logic/proof language, while SQL is an arithmetic / statistical / tabular aggregation language.
One can make either solve the problems of the other, but they're really different mental models, which impacts how the data needs to be represented to enable their use.
For Datalog one needs relationships (a graph), not what SQL calls relations, which are just tables of data. Datalog requires a richer set of opinions, at least conceptually, about the data.
Languages like Datalog will not become mainstream until graph modeling is mainstream.
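To make the contrast concrete, here's a sketch (with hypothetical attributes) of how a typical SQL join reads as Crux Datalog, where the join is implicit in the shared logic variables:

    ;; SQL:
    ;;   SELECT o.total FROM orders o
    ;;   JOIN users u ON o.user_id = u.id
    ;;   WHERE u.city = 'London'
    ;;
    ;; Crux Datalog: the relationship is traversed by reusing
    ;; the `u` variable across triple patterns
    '{:find [total]
      :where [[u :user/city "London"]
              [o :order/user u]
              [o :order/total total]]}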
> The REST API also provides an experimental endpoint for SPARQL 1.1 Protocol queries under /sparql/, rewriting the query into the Crux Datalog dialect. Only a small subset of SPARQL is supported and no other RDF features are available.
The context here is that we benchmark ourselves against the likes of Neo4j and RDF4J using SPARQL test suites, in particular WatDiv [0] and LUBM [1] which are specifically designed for stress testing various subgraph matching queries [2]. Therefore we had to translate the data sets and queries from RDF/SPARQL to edn/Datalog etc.
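As an illustrative (hypothetical) example of what that translation looks like, a trivial SPARQL pattern maps onto Crux Datalog quite directly:

    ;; SPARQL:
    ;;   SELECT ?name WHERE { ?person foaf:name ?name . }
    ;;
    ;; Crux Datalog, after mapping the RDF predicate to a keyword:
    '{:find [name]
      :where [[person :foaf/name name]]}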
There is an open issue regarding more general RDF support, with details on the kinds of things Crux would need to add: https://github.com/juxt/crux/issues/317
Crux: "the decisive or most important point at issue," from Oxford's "Lexico" service. This is a common(ish) English word, it surprises me that it would be offensive in contemporary usage, though I get that the Latin origin relates to Christianity (via the crucifixion story).
And the reason the word "crux" means a "decisive point" is because of the Romans' use of the cross as a means of torturing and killing people. The fact that you're fine with it is OK--it's a free country--but it tells me a lot about what kind of a person you are.
> Century Dictionary ascribes it to "the cross as an instrument of torture; hence anything that puzzles or vexes in a high degree ...." Extended sense of "central point" is attested by 1888.
So I guess you'd be fine with calling your product "noose" or "lynching". Good luck with that.
Yeah, I think Iron Maiden should really change their name. How dare they?
And crossword puzzles shall be called intersection puzzles from now on!
But why stop there!
What kind of a sinful name is Crossfit! Burn them at the cross!
And let's organize a protest in front of the Center for Research in Open Source Software (CROSS) at UC Santa Cruz(!!!!!) too!