We used Datomic in production at Time Inc. around 2016. The idea of an immutable database where you can track changes over time, or query the state of the universe at any given point, sounded amazing for marketing or compliance use cases. Unfortunately, from a dev standpoint it did not feel like a mature system, and the performance was not where we needed it to be.
Probably the most advanced triple-store database these days is RDFox ( https://www.youtube.com/watch?v=-DnmuHtywFs ). While Datomic uses Datalog for querying, RDFox uses Datalog for database reasoning and SPARQL, a W3C standard, for querying. As you add data to the database you can infer new facts. If you want immutability, simply add data in append-only mode with a timestamp. But this idea that you can add the business rules/logic to the database, and have it incrementally apply that logic as you add data, is a recent advance by Oxford AI research.
For the uninitiated: Nikita has created an in-memory database with a mostly compatible API (Datalog) for ClojureScript or JavaScript. My impression is that his variant actually has more widespread use than the original Datomic, based on the number of open-source projects that use each.
I used DataScript for a while to get familiar with graph database querying. I was fascinated by how easy it is to construct queries that mine obscure relations between distantly related entities. I hope I get to use similar tech again.
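For a flavour of what I mean, here's a minimal DataScript sketch (the :person/* attributes, conn, and me-id are made up) that walks two hops of friendship and joins back on a shared hobby:

    (require '[datascript.core :as d])

    ;; hypothetical schema: :person/friend is a ref, :person/hobby a keyword
    ;; @conn is the DataScript db value, me-id a known entity id
    (d/q '[:find ?name
           :in $ ?me
           :where
           [?me  :person/friend ?f]
           [?f   :person/friend ?fof]
           [?me  :person/hobby  ?h]
           [?fof :person/hobby  ?h]
           [?fof :person/name   ?name]]
         @conn me-id)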
> This simplicity enables Datomic to do more than any relational DB or KV storage can ever afford
> Datomic does not manage persistence itself, instead, it outsources storage problems to databases implemented by other people. Data can be kept, at your expense, in DynamoDB, Riak, Infinispan, Couchbase or SQL database.
From what I understand, Datomic’s model is far more flexible than that of many other databases, and it has built-in time-travel capability due to its accretion of immutable data.
Its architecture does, in fact, allow you to choose the storage provider; storage is considered an external concern.
Those are very compelling reasons to use it, but part of the trade-off is write scalability and, potentially, raw performance.
So maybe it’s “do more” within certain limitations (and it’s up to you to decide whether those limitations are a deal breaker).
If the argument is that Datomic can't be simpler than a relational DB because it can use one for persistence, then you'd have to argue that a relational DB can't be simpler than directly using a hard drive for your storage solution.
One thing I've never understood is why all the indexes have transaction last. One of the selling points of Datomic is that it supports as-of queries, but using the EAVT or AEVT indexes requires it to scan all historic values of that attribute, right?
In most situations this is probably fine, but if you have data that changes frequently it seems like this could slow queries down compared to an EATV or AETV index.
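For readers who haven't used it, the as-of query itself is trivial to write; this is the real Datomic API with a made-up attribute:

    ;; db is a database value obtained from (d/db conn);
    ;; as-of gives you the database as it looked at some past point in time
    (def db-2019 (d/as-of db #inst "2019-06-01"))

    (d/q '[:find ?email
           :where [?e :person/email ?email]]
         db-2019)

The question is what the index has to scan underneath to answer that.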
It's also likely that the people who made Datomic are both smarter about this stuff than me and put more thought into it than I have, so I'd love to know what the reasoning behind the choice of index is.
(PS @dang it would be nice to have (2014) in the title)
I'm not sure EATV/AETV could be used fully instead of EAVT/AEVT as you would then lose the ability to have efficient range seeks across values. I do agree though that scanning all historical values in EAVT/AEVT is unsatisfactory for many use-cases as it makes the performance of ad-hoc as-of queries unpredictable.
By contrast, Crux [0] uses two dedicated temporal indexes: EVtTtC and EZC (Z-curve index) to make bitemporal as-of queries as fast as possible. These are distinct from the various triple indexes, which don't concern themselves with time at all. (Vt = valid time, Tt = transaction time, and C = the document hash for the version of an entity at a given coordinate.)
In the article it mentions that while indexes are conceptually monolithic, in practice they're partitioned into 3 spaces: historical, current, and in memory.
New data gets written to the log for durability and updates the in memory portion for queries. Periodically indexes are rebuilt, creating new segments for current, and shifting historical data out of current. This limits how much of the log must be replayed on recovery, and allows garbage collection of data that falls out of the retention window.
It's not that dissimilar to the solutions used by traditional MVCC databases.
The log index supports as-of if you know the actual transaction ID, but if you want to look up by entity/attribute efficiently it's not much help because you don't know when the data point you're interested in was last modified.
I think in this case you’d find all datoms via the normal EAVT index and then sort the results by transaction id, dropping everything after your desired transaction.
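Something roughly like this, I'd guess (a sketch against the on-prem peer API, not a claim about Datomic internals; eid, :person/email, and target-tx are made up):

    ;; walk the full history for one entity/attribute via EAVT,
    ;; then keep only datoms at or before the target transaction
    (->> (d/datoms (d/history db) :eavt eid :person/email)
         (filter #(<= (:tx %) target-tx))
         (sort-by :tx))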
To determine whether a datom is being retracted or added, there is a fifth element in the tuple [0].
There are many similarities to modelling temporal data in SQL [1]. But datoms are simpler and more open, as you can freely build relations between them (composable), similar to a graph DB.
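For reference, a full tuple looks something like this (ids made up); the trailing boolean is the add/retract flag:

    ;; [e              a              v                  tx             added?]
    [17592186045418 :person/email "old@example.com" 13194139534312 false]  ; retraction
    [17592186045418 :person/email "new@example.com" 13194139534312 true]   ; assertion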
Too late. I read the whole thing before coming back here to see your suggestion; now everything here, and everywhere else, looks yellowish, no matter what I set in the style editor.
I went to a Clojure meetup one time and they all went on about how using Datomic in production is a nightmare and it's generally an over-engineered product that isn't worth the trouble in the end. Do most people who have dealt with Datomic in production feel this way?
Yes, and that's exactly why Nubank acquired Cognitect. They are too deep into the tech to migrate to something else, cheaper to just buy the authors.
So a company with deep technical debt, serious scaling issues and bugs everywhere (Datomic/Nubank) and a burnt-out company (Datomic/Cognitect) get together. Makes sense.
Burnt out because their "Datomic Cloud" product didn't work out; it was just a horribly complex AWS CloudFormation template that forces you to click through tens of AWS web pages. It was more complex to manage and to develop for than on-premise, but you still had all the same issues and bugs.
Nubank got into Datomic not because of Clojure, but the other way around: they got into Clojure because of Datomic. If you watch their videos, the reason they picked Datomic was that they thought it had "time travel", which is quite different from having a "history" of transactions, used mostly for auditing and troubleshooting, not for real time-travel queries.
In the end, I guess things did work out for Cognitect, and Hickey is now laughing all the way to the bank.
I have been following Datomic for a year because of a system I inherited.
This seems like an excessively uncharitable read of the situation. I've never used Nubank's software, but I have used (on-prem) Datomic and I certainly wouldn't say it has bugs everywhere. In fact, in my (admittedly low-volume and simple) usage of the system I haven't come across any bugs I can remember. Calling Cognitect a "burnout" company is inaccurate and rude.
I agree with you that the Datomic cloud stuff comes across as being frighteningly complex. I think they probably just need to work on the documentation, like making it more obvious what the differences and tradeoffs are between the deployment scenarios.
Did you inherit a Datomic system that was previously developed by a small team or a small company? Because inheriting a system that's hard to understand and change transcends languages and databases. It is the tie that binds us all as software developers.
Fair enough, hitting that bug would have pissed me off too.
On your last point, I agree that it still has a way to go. It's good for some (many?) production use cases now, as Nubank's success demonstrates, and hopefully with Nubank's resources it'll start to live up more to its promise.
Anecdotally, I know of one company that is also in the same boat: they generally regret their usage of Datomic and were trying to move away from it last I talked with them. However, there are also people on HN like dustingetz who have had a great time with Datomic and use it as a core component of their product.
I just wish Cognitect would allow people to run public benchmarks of Datomic to make it easier to evaluate its tradeoffs.
What the company ran into? Unfortunately not :/. It was a quick chat in an informal setting with their VP of engineering (I think?) that really was just a "huh, interesting" moment for me (although I've coded in Clojure for a full-time job before, I have essentially no personal experience with Datomic).
As for the positive side, I think dustingetz monitors Clojure and Datomic threads pretty closely so maybe they can chime in here.
> The Licensee hereby agrees, without the prior written consent of Cognitect, which may be withheld or conditioned at Cognitect’s sole discretion, it will not... publicly display or communicate the results of internal performance testing or other benchmarking or performance evaluation of the Software
That's just vile. Is there any /good/ defense of this kind of agreement, other than a 'think of the children' argument that people might make a mistake in their performance evaluations?
That article only lists MS and Oracle though. Apart from IBM, I don't think CockroachDB Enterprise has such a prohibition, nor does Google Spanner (I think?), nor does Amazon Aurora (again I think?). And of course all the open source competitors don't have this clause.
Basically my impression is that DeWitt clauses are common enough to be well-known, but still in the distinct minority. That's just an impression though.
Never had any real trouble with it. Maybe it's just that I've used it for a long time, but I enjoy the simplicity of using it.
My biggest complaint is performance for certain use cases. Say you're trying to pull a lot of attributes on hundreds of thousands of datoms: it's going to be rather slow (even though it's supposed to be in-memory already). But again, for these kinds of use cases I'd probably go with a completely different kind of database either way.
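To be concrete, I mean calls roughly like this (real pull API; the wildcard pattern and eids are made up), where pulling a wide pattern across a huge collection of entity ids gets slow:

    ;; pull every attribute for a large collection of entity ids
    (d/pull-many db '[*] eids)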
The story around deletions/excisions isn't that great either. Honestly, the whole log/history aspect of Datomic sounds nice, but I never really used it other than for reverting stupid mistakes.
The #1 thing I love is the freedom of querying you get with Datomic. You insert your data in a way that makes sense for your data, and querying is pretty much a completely separate concern. For the most part you don't need to structure your schema around the querying capabilities of your database, which I love. Say, back in the day I liked Mongo because you could just insert whatever you wanted [0], but eventually you'd hit problems where you couldn't easily query your data (maybe it has changed over the years, no idea).
And the syntax is just a pleasure to work with. I'd love a version of Datomic that kept the same interface but dropped some of the more esoteric features in favor of performance.
Also, I noticed some of the people reporting issues used the cloud version. I've never used that, so I can't speak to it. On-prem is free and has all the features. As long as you don't redistribute it there's no problem.
[0] Yes, in Datomic you do have to have a schema. But it's pretty much a simple global list of possible attributes. If you need to add something later or make a change it's pretty straightforward.
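For example, adding a new attribute later is just another transaction (standard schema attribute keys; :order/note is a made-up name):

    @(d/transact conn
       [{:db/ident       :order/note
         :db/valueType   :db.type/string
         :db/cardinality :db.cardinality/one
         :db/doc         "Free-form note attached to an order"}])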
FYI, Datalevin has faster queries than DataScript, because Datalevin has given up the "database as a value" doctrine that both Datomic and DataScript share, so it can cache aggressively to achieve better performance.
Datomic's learning curve is relatively steep, like that of many higher-level and more abstract things in the Clojure ecosystem in general, and you definitely need to know how to cook it.
After figuring out all the whys and hows, it works like a charm.
However, I do indeed find the Datomic Cloud version unnecessarily complex for most applications. It is probably still a good corporate sales product for Cognitect. The Datomic On-Premise version is much friendlier for small-to-medium (and somewhat larger) use cases. The Cloud version is also an AWS thing, so it locks you in there, which is also not good.
I have heard multiple times that it's rather slow, but I haven't seen any benchmarks. It would make sense: as a dynamically typed, garbage-collected language, Clojure is not the greatest fit to implement a database in.
The question is, are the things you gain worth it?
I would be surprised if Datomic's core code was written in Clojure rather than Java (and these days Java's performance can get you pretty far in implementing a database, see e.g. Cassandra).
Most highly performance-sensitive code in the Clojure ecosystem is a Clojure wrapper around a Java core.
But yes as I said elsewhere, it would be great if Cognitect allowed people to post benchmark results.
I've used Cassandra; it's not that impressive. It's much slower than the C++ rewrite (ScyllaDB), has latency issues due to GC, and can't hold a candle to ClickHouse. And they've been optimizing it for a long time now.
Cassandra and ClickHouse are designed to do different things. To flip things around, have you compared the latency of a single-row update or delete in Cassandra vs ClickHouse?
If you care about the latency of a single-row update or delete, ClickHouse is definitely the wrong tool for the job. First, it doesn't really have deletes (AFAIK). Second, you need to batch updates aggressively to get good throughput.
But you're right, C* and CH are designed to do different things. I just found the difference in general performance across everything (startup, schema changes, throughput, query performance, optimization opportunities) to be quite pronounced. One feels like a race car, the other not so much.
Idiomatic Clojure is slower than JS, but you can make Clojure somewhat close to Java by writing Java with parentheses (lots of interop from Clojure). One of the Datomic devs bragged about how it was only 200 KLOC of Clojure, but if you extract the Datomic tar, the lib dir probably has more than 2 million LOC of open-source Java libs.
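What "Java with parentheses" usually means in practice is type hints, primitive arrays, and avoiding reflection, something like this made-up hot loop:

    (set! *warn-on-reflection* true)

    ;; type-hinted, primitive summing of a long array; with the hints in
    ;; place this compiles to a plain primitive loop instead of reflective,
    ;; boxed arithmetic
    (defn sum-longs ^long [^longs xs]
      (let [n (alength xs)]
        (loop [i 0, acc 0]
          (if (< i n)
            (recur (unchecked-inc i) (+ acc (aget xs i)))
            acc))))

    (sum-longs (long-array [1 2 3]))  ;; => 6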