Crux SQL (juxt.pro)
181 points by yogthos on Aug 4, 2020 | 34 comments



Does this mean I can use tools like https://github.com/metabase/metabase with crux? Awesome if so!


I don't believe Metabase supports the Calcite SQL dialect (https://github.com/metabase/metabase/issues/6230), which is what Crux is using for the SQL layer. So I believe the answer is no - but I'm not an expert here, so don't take this answer as definitive.
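
As I understand it, the crux-sql module exposes the Calcite layer over an Avatica JDBC endpoint. Assuming that's right, connecting from Clojure would look roughly like this - the port, table name and schema are placeholders (check the crux-sql docs), and you'd need the Avatica JDBC driver on the classpath:

    (require '[next.jdbc :as jdbc])

    ;; hypothetical endpoint - the Avatica remote-driver URL format is real,
    ;; but the host/port and serialization settings here are assumptions
    (def ds
      (jdbc/get-datasource
        {:jdbcUrl "jdbc:avatica:remote:url=http://localhost:8765;serialization=protobuf"}))

    ;; PERSON is a made-up table mapped onto Crux documents via the SQL schema config
    (jdbc/execute! ds ["SELECT NAME FROM PERSON WHERE NAME = ?" "Ivan"])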


As it happens there was a comment on a related issue about Dremio support [0] (which also uses Calcite) where someone shared that they got a Dremio driver working. I was able to fork their driver and get Metabase working against Crux in time for a live demo of the crux-sql module back in May [1].

I just switched the forked driver repo to public if anyone wants to test it out [2]. The Metabase driver docs make it pretty straightforward to get things running. There's definitely work to be done though, and I didn't get very deep into it, but I hope to pick it up again soon!

[0] https://github.com/metabase/metabase/issues/5562

[1] https://youtu.be/StXLmWvb5Xs?t=996

[2] https://github.com/crux-labs/crux-metabase-driver

(I work on Crux :)

EDIT: it's also worth mentioning that the linked issue #6230 discusses a lot of problems deriving from Druid's lack of support for prepared statements, but Crux and Dremio don't have that limitation. Reading the most recent comment, though, it looks like Druid may have overcome that hurdle, so with a bit of luck there might now be more traction to get mainline Metabase support for a generic Calcite driver!


You can write your own driver for Metabase without _too_ much hassle: https://github.com/metabase/metabase/wiki/Writing-A-Driver


It's worth noting that Datomic has an analytics adapter which supports SQL, and Metabase is of course a first-class citizen:

https://docs.datomic.com/cloud/analytics/analytics-concepts....


I learned Datalog through Crux and I really enjoyed it. After not very long, it was more intuitive and easier to reason about than the equivalent SQL.


I'm trying to learn Datalog casually; there are lots of queries that I still don't know how to translate.

E.g. group by, window aggregates, order by, limit.

I'd be interested in seeing the equivalent Datalog that Crux generates for some of those statements, to learn what efficient Datalog looks like for those problems.


You always have the option to wire together multiple Datalog queries using Clojure/JVM-lang-of-choice (the N+1 query problem doesn't affect Crux when embedded), but we are currently analysing the possibilities for a higher-level query API, potentially in a similar vein to Mongo's pipelines [0], where a sequence of "stages" is defined. In the ideal case there would be _closure_ in the intermediate steps of a pipeline such that the output of any given stage (including Datalog queries) is a fully-fledged graph in its own right.

As for your specific examples, Crux already implements an order-by+limit+offset combination that automatically spills to disk if needed, but for efficient pagination you would probably want to maintain additional pre-sorted value ranges. For basic aggregation we have an alpha API decorator that composes Clojure transducers to great effect [1].

[0] https://docs.mongodb.com/manual/reference/operator/aggregati...

[1] https://github.com/crux-labs/crux-decorators/blob/master/tes...
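
To make the order-by+limit+offset part concrete, a query along these lines should work today (untested sketch - the :person/* attributes and the node are made up, but :order-by, :limit and :offset are the real query options):

    (require '[crux.api :as crux])

    ;; (crux/db node) gives a point-in-time snapshot of the database
    (crux/q (crux/db node)
            '{:find [?name ?age]
              :where [[?e :person/name ?name]
                      [?e :person/age ?age]]
              :order-by [[?age :desc]]
              :limit 10
              :offset 20})

The alpha decorator in [1] then covers the basic aggregation side on top of results like these.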


I highly recommend http://www.learndatalogtoday.org/ as a learning resource.

I also enjoyed this talk: https://youtu.be/oo-7mN9WXTw though it’s not so much about syntax as the logic of it.


Just finished watching that talk, and I recommend it too! It's information-dense and fast-paced, with lots of useful examples of how to use Datalog, especially in comparison to what most programmers use today, such as relational, key/value and document stores.


Oh dear, I feel like there is some gap between the set of technologies I'm aware of and all this stuff. First off, can somebody (preferably not involved with Crux) tell me whether I actually should be aware of Crux? Is it "yet another Cassandra"? Is it preferable in any way to Cassandra (or whatever "yet another of" it is)?


(I am involved with Crux - sorry!)

I see 4 main reasons why someone may want to be aware of Crux:

- if you have a bitemporal problem

- if you have a graph problem, i.e. something you might initially look to Neo4j to help with

- if you want to use Datalog because it can make writing an application simpler

- if you are thinking of building something similar (immutable event log + indexes) and want to save time

Crux is very different from Cassandra (strong consistency, fat nodes, arbitrary joins etc.), but you could definitely use Cassandra in your Crux architecture.

The closest "yet another" comparison would be Datomic. Crux and Datomic both strive to reimagine what a "general purpose" DBMS should look like, with the primary goal being developer productivity/sanity, whereas Cassandra's goal is simply to be a highly-scalable document store.

Hope that helps!


Thank you, it explains a lot. However, I'm struggling to imagine off the top of my head a problem where I'd need a DB that is both bitemporal and graph-based, while keeping in mind that both capabilities are relatively "expensive" compared to what a more common relational design could offer, so I'd really better have a good reason to choose something exotic like that! Do you perhaps have some example?


Part of the thesis of Datomic (and also Crux, as I understand it) is that cycles and storage have gotten so cheap that some of the design assumptions made by traditional databases no longer necessarily make sense for large segments of the application software market. One of the major ones being challenged is that storage is so expensive that your updates to the DB need to be destructive. That's still true for some domains, but for many, many domains, e.g. ones where historical reporting is important and write volume isn't terribly high, it'd be much better to have an immutable database (facilitating historical queries), even if it means paying a little more in hosting costs, than to build the necessary features on top of a destructive database.
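
To make the "historical queries" part concrete, in Crux that looks roughly like this (a sketch based on my reading of the docs; the ids and attributes are made up):

    (require '[crux.api :as crux])

    ;; put a document with an explicit valid-time - nothing is overwritten,
    ;; the new version simply takes effect from that point in valid time
    (crux/submit-tx node
                    [[:crux.tx/put
                      {:crux.db/id :employee/jane
                       :employee/salary 60000}
                      #inst "2019-01-01"]])

    ;; query against a snapshot "as of" a past valid-time
    (crux/q (crux/db node #inst "2019-06-01")
            '{:find [?salary]
              :where [[:employee/jane :employee/salary ?salary]]})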


A thoroughly-validated canonical example problem hasn't emerged just yet, but there are many intersecting domains...

Temporal graph analysis of "evolving graphs" is an active research field with some strong motivating use-cases, for instance: profiling networks of fraudulent transactions across N bank accounts with data pulled from M source systems. This paper discusses the analysis of research citations over time, as another example: https://pdfs.semanticscholar.org/110b/0db484a1303eda30aa7e34...

That said, Crux's indexes aren't optimal for making all kinds of analytical time-range queries efficient just yet. Instead Crux is currently focussed on point-in-time queries, but the temporal R&D is still happening as it feels very ripe.

Looking for examples more generally, I think wherever you have a meaningful use-case for a graph database you probably, eventually, will want to capture and model history. If you then find yourself with two or more such databases that you want to integrate, then you will greatly benefit from a bitemporal graph DBMS.

As a fun example, I like to envisage integrating our two federated evolving knowledge graphs. Imagine a tool for "networked thought" like Roam Research that could allow us both to visualise the evolving connections between our independently recorded thoughts, before, during and after this conversation. Graphs of knowledge encoded in time.


Open source vs closed source aside, why would one use Crux instead of Datomic?


1) availability of valid time for domain as-of queries

2) performance of ad hoc as-of queries

3) ingestion throughput (RocksDB is _fast_)

4) eviction/excision throughput

5) a lazy query engine doesn't demand so much memory (because there is no need to hold entire intermediate result sets at the same time), and automatic join re-ordering makes the Datalog inherently more "declarative"

6) use of protocols for modularity allows you to create a massive range of possible topologies to support the non-functional requirements of your host environment

7) benefit from the RocksDB roadmap (or other embedded KV storage - see LMDB / rocksdb-cloud)

8) absence of a prescriptive data model

On the flip side:

1) absence of a prescriptive data model (though transaction functions can give you equivalent power)

2) API maturity

3) lazy caching of data at peers (vs Crux's fat nodes, though again, see rocksdb-cloud for one possible resolution)

4) query features: multiple data sources, lazy entity API, other niceties

There are definitely things still missing from both lists :)


Does anyone with experience have detailed info about the limitations, trade-offs and problems of using Datalog instead of SQL? Datalog seems like a natural choice as a declarative query language but has not yet become mainstream. I wonder why?


Using Crux's Datalog query language from Clojure is great. Normally if your query building library is giving you a data-like interface, then you're relying on some abstraction that will inevitably leak in places (and let's not talk about ORMs), but since Crux really does natively use EDN (Clojure's data syntax, roughly analogous to JSON) for its query language, you've eliminated an entire layer of complexity in your application, and you also have the entire programming language at your disposal for compositionally building queries.

That's not to say that using the SQL API is a bad idea, but if you're using Crux from Clojure, you'd be missing out on some stuff.
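
A small made-up example of what that buys you - because the query is just an EDN map, "query building" is ordinary data manipulation rather than string concatenation or a builder DSL (the attribute names are invented):

    (require '[crux.api :as crux])

    (defn people-query
      "Returns a Crux Datalog query map with extra :where clauses merged in."
      [extra-clauses]
      (update '{:find [?e ?name]
                :where [[?e :person/name ?name]]}
              :where into extra-clauses))

    ;; compose at the data level, then run it against a db snapshot
    (crux/q (crux/db node)
            (people-query '[[?e :person/age ?age]
                            [(> ?age 30)]]))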


The short answer is that Datalog is originally a logic/proof language, while SQL is an arithmetic/statistical/tabular aggregation language.

You can make either one solve problems of the other, but they're really different mental models, which impacts the representation of the data needed to enable their use.

For Datalog one needs relationships (a graph), not what SQL calls relations, which are just data. Datalog requires a richer set of opinions, at least conceptually, about the data.

Languages like Datalog will not become mainstream until graph modeling is mainstream.
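
A concrete illustration of the difference: transitive relationships ("who is in Ivan's management chain?") are a single recursive rule in Datalog, where SQL reaches for a recursive CTE. A Crux-flavoured sketch, with made-up attributes:

    (require '[crux.api :as crux])

    (crux/q (crux/db node)
            '{:find [?boss]
              :where [[?employee :person/name "Ivan"]
                      (reports-to ?employee ?boss)]
              :rules [[(reports-to ?e ?m)
                       [?e :person/manager ?m]]
                      [(reports-to ?e ?m)
                       [?e :person/manager ?x]
                       (reports-to ?x ?m)]]})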


This looks really interesting - especially the temporal aspects. Does anyone have any insight into using it with RDF/SPARQL data/queries?


From their docs[0]:

> The REST API also provides an experimental endpoint for SPARQL 1.1 Protocol queries under /sparql/, rewriting the query into the Crux Datalog dialect. Only a small subset of SPARQL is supported and no other RDF features are available.

[0]: https://opencrux.com/docs
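
Going by the SPARQL 1.1 Protocol (which allows sending the query as a ?query= parameter on GET), a minimal probe of that endpoint from Clojure might look like this - untested, the host/port are assumptions, and it's unclear how much of the protocol the experimental endpoint actually honours:

    (require '[clj-http.client :as http])

    ;; the query string itself is plain SPARQL
    (http/get "http://localhost:3000/sparql/"
              {:query-params {"query" "SELECT ?name WHERE { ?e <http://example.org/name> ?name }"}
               :headers {"Accept" "application/sparql-results+json"}})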


The context here is that we benchmark ourselves against the likes of Neo4j and RDF4J using SPARQL test suites, in particular WatDiv [0] and LUBM [1], which are specifically designed for stress testing various subgraph matching queries [2]. Therefore we had to translate the data sets and queries from RDF/SPARQL to edn/Datalog etc.

There is an open issue regarding more general RDF support, with details on the kinds of things Crux would need to add: https://github.com/juxt/crux/issues/317

[0] https://dsg.uwaterloo.ca/watdiv/

[1] http://swat.cse.lehigh.edu/projects/lubm/

[2] https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem


What's the hosting story for crux if one wanted to try it out on a side project?


:) well I already answered your same question on r/Clojure but here's a link for others: https://www.reddit.com/r/Clojure/comments/i3gzxy/crux_sql/g0...


Thanks!


[flagged]


Are you missing an /s or is this a serious comment?

https://www.merriam-webster.com/dictionary/crux


Crux: "the decisive or most important point at issue," from Oxford's "Lexico" service. This is a common(ish) English word, it surprises me that it would be offensive in contemporary usage, though I get that the Latin origin relates to Christianity (via the crucifixion story).


And the reason the word "crux" means a "decisive point" is because of the Romans' use of the cross as a means of torturing and killing people. The fact that you're fine with it is OK (it's a free country), but it tells me a lot about what kind of a person you are.

https://www.etymonline.com/search?q=crux

> Century Dictionary ascribes it to "the cross as an instrument of torture; hence anything that puzzles or vexes in a high degree ...." Extended sense of "central point" is attested by 1888.

So I guess you'd be fine with calling your product "noose" or "lynching". Good luck with that.


Yeah, I think Iron Maiden should really change their name. How dare they? And crossword puzzles shall be called intersection puzzles from now on! But why stop there!

What kind of a sinful name is Crossfit! Burn them at the cross!

And let's organize a protest in front of the Center for Research in Open Source Software (CROSS) at UC Santa Cruz(!!!!!) too!


Languages and words' meaning change over time.

Just because Crux's root was a torture device doesn't mean it carries any connotation of a torture device in modern English.


We all renamed our "master" repositories, didn't we?


Any relation to the CRUX Linux distro? Or is this just an unfortunate overlap of names?


Doesn't look like there's any relation.

Off-topic, but I quite liked CRUX Linux when I used it. It introduced me to BSD-style init scripts.



