Datomic is perfect for probably 90% of small-ish backoffice systems that never have to be web scale (i.e. most of what I do at work).
Writing in a single thread removes a whole host of problems in understanding (and implementing) how data changes over time. (And a busy MVCC sql db spends 75% of its time doing coordination, not actual writes, so a single thread applying a queue of transactions in sequence can be faster than your gut feeling might tell you.)
Transactions as first-class entities of the system means you can easily add metadata to every change in the system that explains who made the change and why, so you'll never again have to wonder "hmm, why does that column have that value, and how did it happen". Once you get used to this, doing UPDATE in SQL feels pretty weird, as the default mode of operation of your _business data_ is to delete data, with no trace of who and why!
Having the value of the entire database at a point in time available to your business logic as a (lazy) immutable value you can run queries on opens up completely new ways of writing code, and lets your database follow "functional core, imperative shell". Someone needs to have the working set of your database in memory, why shouldn't it be your app server and business logic?
Looking forward to seeing what this does for the adoption of Datomic!
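To make the single-writer and "transactions carry metadata" points concrete, here is a minimal Python sketch. It is not Datomic's API or implementation; `TinyDb`, `Tx`, `transact`, `as_of` and `why` are names made up for illustration. One thread applies a queue of transactions in order, each transaction records who made the change and why, and any past database value can be rebuilt from the log.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """One committed transaction: the facts it asserted plus metadata."""
    t: int        # transaction number, starting at 1
    facts: tuple  # (entity, attribute, value) triples
    meta: dict    # who made the change and why


class TinyDb:
    """Hypothetical toy store, for illustration only."""

    def __init__(self):
        self.log = []                 # append-only; only the writer appends
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The single writer: applies queued transactions strictly one
        # at a time, so the order of changes is never ambiguous.
        while True:
            facts, meta, reply = self._inbox.get()
            tx = Tx(len(self.log) + 1, tuple(facts), meta)
            self.log.append(tx)
            reply.put(tx.t)

    def transact(self, facts, **meta):
        """Queue a transaction and block until the writer has applied it."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((facts, meta, reply))
        return reply.get()

    def as_of(self, t):
        """The database value as of transaction t, as a plain dict."""
        state = {}
        for tx in self.log[:t]:
            for e, a, v in tx.facts:
                state[(e, a)] = v
        return state

    def why(self, e, a):
        """Metadata of the latest transaction that set attribute a on e."""
        for tx in reversed(self.log):
            if any((fe, fa) == (e, a) for fe, fa, _ in tx.facts):
                return tx.meta
        return None


db = TinyDb()
t1 = db.transact([("order-1", "status", "open")],
                 who="alice", why="customer placed order")
t2 = db.transact([("order-1", "status", "cancelled")],
                 who="bob", why="refund request")
before, after = db.as_of(t1), db.as_of(t2)
```

Here `before` still says the order is open even after the second transaction, and `db.why("order-1", "status")` returns the metadata bob attached to the cancellation.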
> Someone needs to have the working set of your database in memory, why shouldn't it be your app server and business logic?
This one confused me. The obvious reason why you don't want the whole working set of your database in the app server's memory is because you have lots of app servers, whereas you only have one database[1]. This suggests that you put the working set of the database in the database, so that you still only need the one copy, not in the app servers where you'd need N copies of it.
The rest of your post makes sense to me but the thing about keeping the database's working set in your app server's memory does not. That's something we specifically work to avoid.
[1] Still talking about "non-webscale" office usage here, that's the world I live in as well. One big central database server, lots of apps and app servers strewn about.
Consider this use case - in addition to your web app, you have a reporting service that makes heavy duty reports; if you run one at a bad time, bad things might happen like users not being able to log in or do any other important work, because the database is busy with the reports.
So in a traditional DB you might have a DBA set up a reporting database so the operational one is not affected. Using Datomic, the reporting service gets a Datomic peer that has a copy of the DB in memory, without any extra DBA work and without affecting any web services. This also works nicely with batch jobs or in any situation where you don't want different services to affect each other's performance.
It's true that a lot more memory gets used, but memory is relatively cheap; the biggest cost when hosting in the cloud is usually the vCPUs. And in a Clojure/Datomic web application you usually don't need to put cache services like Redis in front of your DB.
The assumption here is that the usual bottleneck for most information systems and business applications is reading and querying data.
I appreciated this insight into other people's use cases, thank you for that! This architecture brings RethinkDB to mind, which also had some ability to run your client as a cluster node that you alone get to query. (Although there it was more about receiving the live feed than about caching a local working set.)
> Client (notice not Proxy) caches uncommitted writes to support read-uncommitted-writes in the same transaction. This type of read-repair is only feasible for a simple k/v data model. Anything slightly more complicated, e.g. a graph data model, would introduce a significant amount of complexity. Caching is done on the client, so read queries can bypass the entire transaction system. Reads can be served either locally from client cache or from storage nodes.
RethinkDB user here. I've been running it in production for the last 8 years or so. It works. It doesn't lose data. It doesn't corrupt data (like most distributed databases do, read the Jepsen reports).
I am worried about it being unmaintained. I do have some issues that are more smells than anything else — like things becoming slower after an uptime of around three weeks (I now reboot my systems every 14 days). I could also do with improved performance.
I'm disappointed that the Winds of Internet Fashion haven't been kind to RethinkDB. It was always a much better proposition than, say, MongoDB, but got less publicity and sort of became marginalized. I guess correct and working are not that important.
I'm slowly rebuilding my application to use FoundationDB. This lets me implement changefeeds, is a correct distributed database with fantastic guarantees (you get strict serializability in a fully distributed db!), and lets me avoid the unneeded serialization to/from JSON as a transport format.
We've never had any issue with it on a typical three-node install in Kubernetes. It requires essentially no ongoing management. That said, it can't be ignored that the company went under and now it's in community maintenance mode. If you don't have a specific good use for Rethink's changefeed functionality, which sets it apart from the alternatives, I'm not sure I could recommend it for new development. We've chosen not to rip it out, but we're not developing new things on it.
I remember back when it came out it was a big deal that it could easily scale master-master nodes and the object format was a big thing because of Mongo back then.
That was before k8s was a thing, and most of the failover for other databases wasn't a thing just yet. I'm too scared to use it because they have a community but they're obviously nowhere near as active as the Postgres and other communities.
I suppose I think of this the other way around. When the query engine is inside your app, the query engine doesn't need to do loop-like things. So you can have a much simpler querying language and mix it with plain code, kind of similar to how you don't need a templating language in React (and in Clojure when you use hiccup to represent HTML)
Additionally, this laziness means your business logic can dynamically choose which data to pull in based on results of queries, and you end up running fewer queries as you'll never need to pull in extra data in one huge query in case business logic needs it.
It's definitely a trade-off! If you have 10s or 100s of app servers that have the exact same working set in memory, it's probably not worth it.
But if you have a handful of app servers, it's much more reasonable. The relatively low scale back-office systems I tend to work with typically have 2, max 3. Also, spinning up an extra instance that does some data crunching does not affect the performance of the app servers, as they don't have to coordinate.
There are also the performance and practicality benefits you get from not having to do round-trips in order to query. You can now actually do 100 queries in a loop, instead of having to formulate everything as a single query.
And if you have many different apps that operate on the same DB, it becomes a benefit as well. Each app server will only have the _actual_ working set it queries on in memory, not the sum of the working sets across all of the app servers.
If this becomes a problem, you can always architect your way around it as well, by having two beefy API app servers that your 10s or 100s of other app servers talk to.
SQLite provides a similar benefit with tremendous results using its in-process database engine, although the benefit there is more muted by default because of the very small default cache size. We do have one app where we do this. There's no database server, the app server uses SQLite to talk directly to S3 and the app server itself caches its working set in memory. I can definitely see the benefit in some situations, but for us it was a pretty unusual situation that we might not ever encounter again.
All that said... can't Datomic also do traditional query execution on the server? I thought it had support for scale-out compute for that. AIUI, you have the option to run as either a full peer or just an RPC client on a case-by-case basis? I thought you wouldn't need to resort to writing your own API intermediary, you could just connect to Datomic via RPC, right?
AIUI, the full peer is Datomic; the RPC server is just a full peer that exposes the API over http and is mainly intended to be used with clients that don't run on the JVM (and so can't run Datomic itself in-process).
I think the point is that treating your database as an arms-length, RPC component that's independent from your "application" isn't necessarily the only pattern.
Strong agree. There are vast, massive cost savings and performance advantages to be had if the model is that a shard of the dataset is in memory and the data persistence problem is the part that's made external. The only reason we are where we are today is that doing that well is hard.
Is this not the case already? Database drivers (or just your application code) are allowed to cache results if they like. The problem is cache invalidation.
Understood. For a single-node or read-only system it sounds fine, but then there are a variety of ways to solve that (e.g. a preloaded in-memory sqlite).
Having the working set present on app servers means they don't put load on a precious centralized resource which becomes a bottleneck for reads. The peer model allows app servers to service reads directly, avoiding the cost of contention and an additional network hop, allowing for massive read scale.
This is true, but the tradeoff is that now your central DB is a bottleneck that is difficult to scale.
Having the applications keep a cached version of the db means that when one of them runs a complex or resource intensive query, it's not affecting everyone else.
> Datomic's is perfect for probably 90% of small-ish backoffice systems that never has to be web scale (i.e. most of what I do at work).
I don’t think I agree with this as stated. It is too squishy and subjective to say “perfect”.
More broadly, the above is not and should not be a cognitive “anchor point” for reasonable use cases for Datomic. Making that kind of claim requires a lot more analysis and persuasion.
Datomic always seemed like a really cool thing to use. However, I'm not familiar with Clojure or any other JVM based language, nor do I have the time to learn it. And I can't find any supported way to use it with other languages (I'm not even talking about popular frameworks), or am I missing something?
It doesn't feel like the people behind Datomic actually want to have users outside of the Clojure world, which will be rather limiting to adoption.
Something I've been curious about: how well (or badly) would it scale to do something similar on a normal relational DB (say, Postgres)?
You could have one or more append-only tables that store events/transactions/whatever you want to call them, and then materialized-views (or whatever) which gather that history into a "current state" of "entities", as needed
If eventual-consistency is acceptable, it seems like you could aggressively cache and/or distribute reads. Maybe you could even do clever stuff like recomputing state only from the last event you had, instead of from scratch every time
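A minimal Python sketch of the idea, with a plain list standing in for the append-only table and a hypothetical `Projection` class standing in for the materialized view. It only shows the "recompute from the last event you had" part, not anything Postgres-specific.

```python
from dataclasses import dataclass, field


@dataclass
class Projection:
    """Current state of entities, derived from an append-only event list."""
    state: dict = field(default_factory=dict)
    applied: int = 0   # how many events have been folded into `state` so far

    def catch_up(self, events):
        # Apply only the events appended since the previous call,
        # instead of replaying the whole history from scratch.
        for entity_id, attr, value in events[self.applied:]:
            self.state.setdefault(entity_id, {})[attr] = value
        self.applied = len(events)
        return self.state


# Stand-in for the append-only table: (entity, attribute, value) rows.
events = [
    ("user-1", "email", "a@example.com"),
    ("user-1", "name", "Ann"),
]

view = Projection()
view.catch_up(events)            # folds the first two events

events.append(("user-1", "email", "ann@example.com"))
view.catch_up(events)            # folds only the one new event
```

Between calls to `catch_up` the view is stale, which is exactly the eventual-consistency trade-off mentioned above.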
Datomic already sort of does this :) You configure a storage backend (Datomic does not write to disk directly) which can be dynamodb, riak, or any JDBC database including postgres. You won't get readable data in PG though, as Datomic stores opaque compressed chunks in a key/value structure. The chunks are addressable via the small handful of built-in indexes that Datomic provides for querying, and the indexes are covering, i.e. data is duplicated for each index.
Because, for example, your application is not tied to the JVM? You are uncomfortable using closed source software for such a critical piece of infra? As far as I can tell they don't even have a searchable bug report database! I'd hate to be the one debugging an issue involving datomic.
That's a pretty common pattern in event-sourcing architectures. It is a completely viable way to do things as long as "eventual-consistency is acceptable" is actually true.
Good question! I don't have any personal experience in that regard. I would probably have paid up for enterprise support (or bought the entire company ;))
I don't know how they do it, but the obvious answer is probably sharding. Their cloud costs must be no joke. Peers require tons of memory and I can only guess they must have thousands of transactors to support that workload and who knows how many peers. Add to this that they probably need something like Kafka for integrating/pipelining all this data.
As do most distributed databases. Even when you don't store your entire database (or working set) in memory, you'll likely still have to add quite a bit of memory to be used as I/O cache.
One thing which is quite hard to do in Datomic is simple pagination on a large sorted dataset, as one can easily do with LIMIT/OFFSET in MySQL for example. There are solutions for some of the cases, but the general case is not solved, as far as I remember (it's been a while since I used it extensively).
It depends! If you want to lazily walk data, you can read directly from the index (keep in mind, the index = the data = lives in your app), or use the index-pull API which is a bit more convenient.
However, if you want to paginate data that you need to sort first, and the data isn't sorted the way you want in the index, you have to read all of the data first, and then sort it. But this is also what a database server would need to do :)
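The two cases can be sketched in Python with a sorted list standing in for a covering index. This is an illustration of the trade-off only; `page_from_index` is a made-up helper and has nothing to do with the actual index-pull API.

```python
import bisect


def page_from_index(index, after=None, limit=2):
    """Return up to `limit` entries that sort strictly after `after`.

    `index` must already be sorted, like a covering index is.
    """
    start = 0 if after is None else bisect.bisect_right(index, after)
    return index[start:start + limit]


# Entries sorted by (created-day, entity-id): the order the "index" offers.
index = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

# Case 1: the wanted order matches the index, so each page is a cheap
# seek plus a short walk, and nothing else is read.
page1 = page_from_index(index)
page2 = page_from_index(index, after=page1[-1])

# Case 2: the wanted order differs (here: by entity-id, descending), so
# every entry has to be read and sorted before the first page is known.
by_id_desc = sorted(index, key=lambda entry: entry[1], reverse=True)
first_page_other_order = by_id_desc[:2]
```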
Yep, I am well aware of these specifics and workarounds, but in the general case there is no general solution to the question asked here, for example [0].
And for big datasets with complex sorting it will take some effort to implement a seemingly simple feature.
Guess it is just one of the tradeoffs: while some features Datomic has out of the box are hard to replicate in RDBMSes, things like pagination which are often taken for granted are a bit of work to do in Datomic. So it is something to keep in mind when considering the switch.
Datomic's covering indexes are heavily based on their built-in ordering, and don't really have much flexibility in different ways to sort and walk data.
Personally, I'm a fan of hybrid database approaches. In the world of serverless, I really enjoy the combo of DynamoDB and Elasticsearch, for example, where Dynamo handles everything that's performance critical, and Elasticsearch handles everything where dynamic queries (and ordering, and aggregation, and ....) is required. I've never done this with Datomic, but I'd imagine mirroring the "current" value of entities without historical data is relatively easy to set up.
The main differences between event sourcing and Datomic are the indexes and the "schema", which provide full SQL-like relational query powers out of the box, as well as point-in-time snapshots for every "event" (transactions of facts).
So, "events" in Datomic are structured and Datomic uses them to give you query powers, they're not opaque blobs of data.
> doing UPDATE in SQL feels pretty weird, as the default mode of operation of your _business data_ is to delete data, with no trace of who and why!
It's a good idea to version your schema changes using something like liquibase into git, that gets rid of at least some of those pains. Liquibase works on a wide variety of databases, even graphs like Neo4j
I got the same feeling in Erlang many times, once write operations start getting parallel you worry about atomic operations, and making an Erlang process centralize writes through its message queue always feels natural and easy to reason about.
I guess NuBank (Cognitect's owners) have concluded that the paid licensing business wasn't worth the hassle compared to having the developer time involved spent on other things.
Releasing only binaries, while I understand people being grumpy about it, seems like an interesting way of keeping their options open going forwards. Since it was always closed source, it now being 'closed source but free' is still a net win.
The Datomic/Cognitect/NuBank relationship is an interesting symbiotic dynamic and while I'm sure we can all think of ways it might go horribly wrong in future I rather hope it doesn't.
Ah! Yes, but not quite! It’s not freeware. The binaries are technically open sourced, you can do with them as you please within the confines of the Apache license.
It seems like it, which is a bit of a change in direction for them. I've poked at the data from the clojure side in the past (inspecting objects), just to learn how it works, but the license was strongly worded against reverse engineering.
I've also taken a look at generated clojure bytecode. It looked like the codegen is pretty straightforward with minimal optimization. It looked like it wouldn't be too hard to reverse with maybe a little bit of backtracking (essentially a parsing problem, I believe). You'd then need a separate step to redo the macros.
It sounded like it might be a fun little project (just to see if it can be done and try my hand at decompiling), but I would have wanted to decompile datomic to make it interesting and the license precluded that.
The Apache license says I can do things with the source of it.
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and **in Source or Object form**, provided that (...)
Well, _technically_ you are now free to modify the binary and redistribute the result as per the Apache 2.0 license. That’s different than giving something as freeware, which would not allow/cover modification/redistribution.
Which actually does make a difference for JARs because the bytecode is portable and modifiable, not trivially so but definitely way more easily than native code.
Based on experience with Prolog, I always thought using Datalog in a database like Datomic would mean being able to model your data model using stored queries as a very expressive way of creating "classes". And that by modeling your data model using nested such queries, you alleviate the need for an ORM, and all the boilerplate and duplication of defining classes both in SQL and as objects in OO code ... since you already modelled your data model in the database.
You can definitely use Datomic in the way you describe, in at least a few different ways. I haven't often seen queries stored in the database itself. It's more common to have the queries as data in the application. Since queries are ordinary Clojure data structures, it's even more common to see functions that build queries from inputs or functions that transform queries (e.g., adding a tenant-id filter clause).
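The "queries are data, so functions can transform them" idea can be sketched in Python with dicts and tuples. The query shape, attribute names and the toy `run` evaluator below are all invented for illustration; real Datomic queries are Clojure data structures evaluated by Datomic's Datalog engine.

```python
def with_tenant(query, tenant_id):
    """Return a new query with an extra clause restricting ?e to one tenant."""
    return {
        **query,
        "where": query["where"] + [("?e", "tenant/id", tenant_id)],
    }


def run(query, facts):
    """Toy evaluator: every clause is ("?e", attribute, value).

    Returns the sorted entity ids that satisfy all clauses.
    """
    matches = None
    for _, attr, value in query["where"]:
        found = {e for e, a, v in facts if a == attr and v == value}
        matches = found if matches is None else matches & found
    return sorted(matches or [])


facts = [
    (1, "order/status", "open"), (1, "tenant/id", "acme"),
    (2, "order/status", "open"), (2, "tenant/id", "globex"),
]

open_orders = {"find": "?e", "where": [("?e", "order/status", "open")]}
acme_open_orders = with_tenant(open_orders, "acme")

all_open = run(open_orders, facts)
acme_open = run(acme_open_orders, facts)
```

Because `with_tenant` returns a new dict, `open_orders` itself is left untouched and can be reused for other tenants.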
Datomic also supports rules, including recursive rules. I wrote a library to do OWL-style inference about classes of entities using rules. You can see an example here (https://github.com/cognitect-labs/onto/blob/master/src/onto/...). This is a rule that infers all the classes that apply to an entity from an attribute that assigns it one class.
I would also say that building an "entity type definition" system as entities in Datomic is almost the first thing every dev tries after the tutorial. It works... but you almost never _need_ it later.
Clojure in general is all about passing around little maps of data and in particular not using OO to model. So Datomic naturally continues that by returning maps of nested structures to represent your query results and does side-step the ORM completely.
Rich Hickey has a great talk about how Objects are data structures with unique interfaces that are unnecessary complexity. He showed an example of a web server with a web request object with request headers etc etc. Doing simple things like collecting information out of that nested object structure is bespoke and harder than it should be for no real gain. If everything is a map or list or set, it becomes completely trivial to extract and manipulate data from a structure. It's a subtle difference at first but when everything is like that it makes your system far simpler. Unfortunately I can't remember exactly where in the talk this part is.
Yes, this hit home so hard. All objects are bespoke mini languages that add little to no value and simultaneously make it hard to remember and hard to use. Just use maps!
This is the benefit. Languages like Clojure have excellent support for manipulating maps and I'd wager that something like 40-60% of Clojure code is similar-looking map manipulation stuff for this very reason.
If you need that validation, you just validate the map? You can use established methods like Malli or Clojure Spec for this. If you need to use a record with a fixed schema, just use a record instead of map. In Clojure, you can use most of the map functions for records too.
This comment is talking specifically in the context of Clojure.
In the Clojure culture (so to speak) maps may also have a schema, as used by various schema checking tools, which are richer than runtime type checks. (Not the same as database schema)
Nit: I would not say that a JVM object is a schema, because there’s more to it. Rich is well known for saying that the idea of an object complects two things: (1) a record-like (struct) data structure with the (2) code to manipulate it.
Sometimes it’s even more complected because in some languages classes can make assumptions about state across all objects and threading.
Constraints always have value. Without constraints programmers are confronted with the naked complexity of mapping between the set of all possible states of N bits which is (2^N)!.
I don’t agree with this. iirc Rack ultimately uses an array to represent HTTP responses. It has three members: the status code, the headers, and the response body.
If you’re shipping a new change, is it easier to mistake response[0] or response.headers?
This is a trivial example, but the general class (ha) of trade-off is amplified with more complex objects.
I love clojure and lisp but the blindness behind a statement like “no real gain” has always kneecapped adoption.
In Clojure, response.headers is still data :) You just use the built-in ways of reading named keys, such as (:headers response) or (get response :headers).
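For readers who don't know Clojure, roughly the same contrast in Python: a positional triplet versus a plain map with named keys. The field names and the `content_type` helper are just illustrative.

```python
# Positional triplet, in the style of the Rack response described above.
triplet = (200, {"Content-Type": "text/plain"}, ["hello"])

# The same data as a plain map with named keys.
response = {
    "status": 200,
    "headers": {"Content-Type": "text/plain"},
    "body": ["hello"],
}


def content_type(resp):
    # Named access reads like response.headers, but it is still plain
    # data: no bespoke class or accessor method is involved.
    return resp["headers"].get("Content-Type")


# Generic map tools keep working, e.g. selecting a subset of keys.
summary = {key: response[key] for key in ("status", "headers")}
```

So the named-key readability of `response.headers` and the "it's all just data" property are not mutually exclusive.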
I think there is a need for objects. An active connection, talking to the GPU, etc are not data -- identity is essential for their operation.
OOP is probably the best way to model such, well, objects, allowing them a private, encapsulated state, and making it only modifiable, or even viewable, through a well-defined public interface that enforces all its invariants.
I think OOP (object oriented programming) is abused and is not the optimal paradigm for most software services. It succeeds best at providing inter-operability structure for API design. "Objects", as mentioned here, are an abstraction. Stateful data can be organized and manipulated elegantly without use of the OOP paradigm. Many small systems that employ OOP hamper their maintenance and extendability by unnecessary dependence upon encapsulation and data ownership. Mutability is a villain here as well -- when data structures are immutable, there is little fear of panoptic architectures designed without ownership constraints. Here software is no longer corralled into walled gardens of "objects"; large complex types and their brittle method associations are avoided, greatly simplifying software architectures as a result.
Clojure uses objects for connections and things, but POJOs are harmful IMHO because they make manipulating and collecting data a bespoke task every time. Every time you change the data, you have to change a class to represent the JSON and the ORM class and .... this is all just data
Well, that’s what Java records are for. As for having to change the type description, that’s more of a static typing discussion, though it can be generated — having a single source of truth and generating others is imo the best approach.
In his eyes? It definitely was a thing, records are just nominal product types, these are probably the single most used building block of programming languages.
I really like Clojure, but I really don’t know what some of its fans think (also true of other lisps), like there is a healthy pollination of ideas between languages, lisps are not God’s language.
Standard ML had records since the '70s. Both Clojure and Java would benefit from taking more from what came before, though Java at least had the excuse of being designed for low-powered set-top boxes.
According to the link below, ML records are mostly handled by hash-maps in Clojure, except that there’s no canonical key/val order or strict typing by default.
ML record:
{first = "John", last = "Doe", age = 150, balance = 0.12}
fun full_name{first:string,last:string,age:int,balance:real}:string =
first ^ " " ^ last
(It’s unclear from the example whether or not all of the destructured values are required in the function signature. I hope they are not, but I left them in since I don’t know. The caret positioning raises further questions.)
Destructuring a map in a Clojure function:
(defn full-name [{:keys [first last]}]
  (str first " " last))
I don’t know if I’m missing something that ML offers with its records aside from more strict typing, which you can also have in Clojure when it’s useful. In both cases, it looks like it’s applied at the function declaration and not the record declaration.
Connection pools exist precisely because the code outside of the connection management piece shouldn’t have to care much whether or not there is an active connection. It’s all boilerplate, except for handling the “unable to connect” case.
When you call a connection or connection pool object, you’re querying its current state. This is absolutely data.
It emphatically does. It has pull-syntax just for the nested entities obviating the need for ORM. I am surprised I had to scroll so far down to see this. This is its biggest selling point IMO.
Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to back up. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another; there was no other migration path (and yes, we were talking to their enterprise support).
We migrated to Postgres and are now using a 10th of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.
As someone who is using Datomic Pro in production for many years now I must agree with you. One time I began a project with Datomic Cloud and it was a disaster similar to what you described. I learned a lot about AWS, but after about half a year we switched to Datomic Pro.
There were some cool ideas in Datomic Cloud, like IONs and its integrated deployment CLI. But the dev workflow with Datomic Pro in the REPL, potentially connected to your live or staging database is much more interactive and fun than waiting for CodeDeploy.
I guess there is a reason Datomic Pro is the featured product on datomic.com again. It appears that Cognitect took a big bet with Datomic Cloud and it didn't take off. Soon after the NuBank acquisition happened. That being said, Datomic Cloud was not a bad idea, it just turned out that Datomic Pro/onPrem is much easier to use. Also of all their APIs, the "Peer API" of Pro is just the best IME, especially with `d/entity` vs. "pull" etc.
I don't doubt your story of course, and I love Postgres, but comparing apples to oranges no?
Datomic's killer feature is time travel.
Did you simply not use that feature once you moved off Datomic (and if so why'd you pick Datomic in the first place)? Or are you using Postgres with some extension to add it in?
We implemented it in Postgres with 'created_at' and 'deleted_at' columns on everything and filtering to make sure that the object 'exists' at the time the query is concerned with. Changes in relationships between objects are modeled as join tables with a boolean indicating whether the relationship is made or broken and at what time.
Our data model is not large and we had a very complete test suite already, so it was easy to produce another implementation backed by postgres, RAM, etc.
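For anyone curious what that looks like, here is a minimal sketch of the 'created_at' / 'deleted_at' pattern. It is not our actual schema: the `account` table, its columns other than the two timestamps, and the helper names are made up, integers stand in for timestamps, and in-memory SQLite stands in for Postgres so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id         INTEGER,
        owner      TEXT,
        created_at INTEGER NOT NULL,  -- when this row version became valid
        deleted_at INTEGER            -- NULL while this version is current
    )
""")


def as_of(t):
    """Rows that 'exist' at time t: created by t and not yet deleted at t."""
    return conn.execute(
        "SELECT id, owner FROM account"
        " WHERE created_at <= ? AND (deleted_at IS NULL OR deleted_at > ?)"
        " ORDER BY id",
        (t, t),
    ).fetchall()


def change_owner(account_id, new_owner, t):
    # Instead of overwriting in place: close the current row version
    # and insert a new one, both in the same transaction.
    with conn:
        conn.execute(
            "UPDATE account SET deleted_at = ?"
            " WHERE id = ? AND deleted_at IS NULL",
            (t, account_id),
        )
        conn.execute(
            "INSERT INTO account (id, owner, created_at) VALUES (?, ?, ?)",
            (account_id, new_owner, t),
        )


conn.execute(
    "INSERT INTO account (id, owner, created_at) VALUES (1, 'alice', 10)"
)
change_owner(1, "bob", 20)
```

With this data, `as_of(15)` sees alice as the owner and `as_of(20)` sees bob, while a time before 10 returns no rows at all.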
Yeah, it seems you could substitute thoughtful schema design that avoids updates/deletes for time-travel as a feature.
I wonder if anyone has made a collection of reference examples implemented this way (and in general I think that a substantial compendium of good examples of DB schemas and the thinking behind them could be worthwhile).
I'm moderately confident you could mechanically transform a time-oblivious schema into a history-preserving one, and then write a view on top of it which gave a slice at a particular time. Moderately.
Yes, although AFAIK those hidden MVCC columns (xmin, xmax?) aren't very usable from an application standpoint -- the obsoleted rows only hang around until the next VACUUM, right?
I realize you're not claiming those columns are useful from an application perspective. Just curious to know if I'm wrong and they are useful.
Because as I understand it, the selling point of Datomic is their audit trail functionality and that is admittedly a bit onerous to implement in an RDBMS. Even though I feel like every project needs/requires that eventually.
I meant MVCC is the proof that you can automate the transform of a schema into a versioned schema. How and if the DBMS exposes that is another concern.
The garbage collection / VACUUM part of an MVCC system is the harder part, saving all versions and querying a point in time is the easy one.
I've built a couple of systems that would have been Datomic's bread and butter.
Each time the company was more comfortable with mainstream dbs, so we ended up going with something like you're talking about, built on top of a db. A couple of the projects were because a mainstream db wouldn't scale.
The systems definitely worked, but it was also a lot of implementation complexity for an otherwise simple business prop: “store this data as facts”
We use https://django-simple-history.readthedocs.io/en/latest/ (with some custom tooling for diff generation) for audit logs and resettability, and while you can't move an entire set of tables back in time simultaneously, it's usually sufficient for understanding data history.
Datomic's 'time travel' is an audit feature, not something for your application/business logic to depend on. Performance reasons make it impractical, unless you only have like 10 users and very little data.
That's certainly not how it sells and markets itself.
The first feature on benefits (and the only reason I've ever heard Datomic brought up and/or considered it myself for production workflows) is using that stuff in application workflows: https://docs.datomic.com/pro/time/filters.html#history
Could be you're saying it in fact doesn't work well performance-wise, that'd (surprise me but) certainly explain why it's not more popular -- but I think it's clear it wants you to use this as an application feature.
Datomic is great but as another commenter said, is good for "small-ish backoffice systems that never has to be web scale". You can probably rely on querying history for internal applications. I think their primary market was for companies to use it internally but they never made this clear.
> "small-ish backoffice systems that never has to be web scale".
Doesn't production use of Datomic by Nubank and Netflix (to mention just two examples) belie this assertion?
Are they _forcing_ you to use CloudFormation? Or is it just the officially supported mechanism?
> Mother Postgres can do no wrong.
I'll say that Postgres is usually the answer for the vast majority of use-cases. Even when you think you need something else to do something different, it's probably still a good enough solution. I've seen teams pitching other systems just because they wanted to push a bunch of JSON. Guess what, PG can handle that fine and even run SQL queries against that. PG can access other database systems with its foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers).
The main difficulty is that horizontally scaling it is not trivial (although not impossible, and that can be improved with third party companies).
Yes. Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.
Don't misunderstand me. There are plenty of times when something else is the right choice. I'm just saying, when I have a say in the matter, folks need to clear that bar -- "tell me why tool xyz is going to be so much better than postgres for this use case that it justifies the overhead of adding another piece of software infrastructure."
Like, you want to add a document database? Obviously Mongo, Elasticsearch, etc are "best of breed." But Postgres is pretty capable and this team is already good at it. Are we ever going to have so many documents that e.g. Elasticsearch's mostly-effortless horizontal scaling even comes into play? If you don't ever see yourself scaling past 1,000 documents then adding a new piece of infra is a total joke. I see that kind of thing all the time. I can't tell if developers truly do not understand scale, or if they simply do not give a f--- and simply want to play with shiny new toys and enrich their resumes.
I mean, I've literally had devops guys telling me we need a Redis cluster even though we were only storing a few kilobytes of data, that was read dozens of times daily with zero plans to scale. That could have been a f'in Postgres table. Devops guy defended that choice hard even when pressed by mgmt to reduce AWS spend. WTF?
> Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.
This is being repeated so often. And yet — the above is true, IF (and that's a big if for some of us) you are OK with having your database on a single machine.
If you want a distributed database with strict serializability, where some nodes can go down and you still get correct answers, Postgres is not it.
Totally agree. That's really my thinking as well. Default to Postgres unless you have a reason not to choose it, and a need for distributed serializability is one of those cases where Postgres is an easy "nope, not suitable."
But I've also been burned by people reflexively reaching for $SHINY_NEW_TOY by default, when really there is no need. Architects and senior-level devs are the worst offenders. They throw a bunch of needlessly buzzword-compliant infra at a problem and then move on. They have the time and freedom to learn $SHINY_NEW_TOY well enough to MVP a product, but then the project is passed on to people who don't have that luxury.
I feel like there's a progression that often happens:
1. Early engineers: stick to Postgres or another RDBMS because it's all they know
2. Mid-stage engineers with "senior" in their title for the first time: reach for $SHINY_NEW_TOY
3. Late-stage engineers: stick to Postgres because it's something the whole team already knows and they recognize the true long-term cost of throwing multiple new bits of software infra into the mix
> Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup.
You should give TerminusDB a go (https://terminusdb.com/), it's really OSS, the cloud version is cheap, fast, there are not tons of baroque settings, and it's easy to backup using clone.
TerminusDB is a graph database with a git-like model with push/pull/clone semantics as well as a datalog.
As far as I could tell, there was no straightforward way to point a new instance of the compute resources back at the old storage resources since they're all provisioned in one CF template.
They say it's under the Apache 2 licence, so it is open source.
EDIT: I was wrong. They actually released binaries under the Apache licence, not the source code. Which is, to put it mildly, deceptive. I don't even have an idea what that actually means.
They say the binaries are being made under Apache 2.
They don't say anything about the source code being published. That's why (to me) this is so interesting. I've never seen binaries released without source code before.
What is even the point of releasing binaries under Apache 2? When I patch the binaries, do I need to release a hexdiff too to fulfill my Apache obligations? Very weird.
Making your product available for free isn't a publicity stunt, it's a huge step for a business. And, in practice, it's not that much different for the average user if only the binaries are Apache licensed. When was the last time you needed to open up the Postgres source code and modify something?
If it wasn't a publicity stunt, it certainly had the effects of one: I've never heard of Datomic before and here they are at the top of hackernews!
> And, in practice, it's not that much different for the average user if only the binaries are Apache licensed. When was the last time you needed to open up the Postgres source code and modify something?
Sure, if you're playing a game it probably doesn't make a difference. If I'm building my IT infrastructure on a product, it makes a huge difference whether I get an open-source-licensed "binary" or access to the source:
- the package they distribute contains no fewer than 960 different jars. Most of those are the standard apache-project-everything-and-the-kitchen-sink-style dependencies. Say I'd like to update log4j because it contains a catastrophic vulnerability that Datomic decides not to fix (not that that sort of thing ever happens)
- or say Datomic decides to abandon the product altogether or goes out of business
- or say I'm not happy with their quality of service contract around their DB they support and would like to work with a different company
Rich Hickey started Datomic (along with Stuart Halloway & Justin Gehtland). He also created the Clojure programming language and has been on Hacker News numerous times with many popular talks. In fact they all have made famous contributions.
Many businesses use Microsoft SQL Server or Oracle and don't need access to the source. I'm not saying open source isn't nice, but it is absolutely not a requirement for IT infrastructure.
I'd imagine people rely on many cloud services that are in fact, not open source.
Again, with your hypotheticals—when was the last time you needed to do any of that with Postgres or another FOSS DBMS?
For the vast majority of use cases, a FOSS DBMS and a free-as-in-beer DBMS are indistinguishable. If you're in a category where they're not, then don't use Datomic, but this is still far more than a publicity stunt.
We must be working in a different world. In all my career I've not once worked with a serious business that did not have a support contract for their database system open source or not.
Most of those had escrow agreements for central closed source components with vendors in case the vendor went out of business. (obviously only for things perceived as critical and from companies with some perceived risk of failure).
And god knows how many times I have seen companies get bitten because they bought into a product that turned out not to deliver what was promised after the contracts were signed.
Free beer binaries are not mutually exclusive of Enterprise support agreements featuring all those things you mentioned above _for people that need that_.
Completely agree. I'm fine with a free beer license. The context of the post is that the binary is licensed using an Open Source license which leads to confusion.
Never. On the other hand, I have considerable confidence that I could do so, and that if something goes wrong with upstream development, someone is likely to do so.
If I use a free-binary-but-no-source product, I’m much more likely to get stuck.
(Of course, as a regretful MySQL user, I am pretty stuck, but largely because MySQL is, in many respects, a terrible product. It does, quite reliably, get security updates and the kind of general maintenance that keeps it working no worse than it ever did.)
Today I looked up pgvector's NixOS availability. For the past 15 years I have relied on postgis source being available and improved by the community for my day to day business.
My point is that the option to modify the source results in software being available and community maintained in a way that binary-only software isn't. Even if I change the source myself just twice a decade.
Someone (forget who but he worked there) was giving a presentation of Datomic in some downtown (NYC) bank circa 2014 iirc. Per the presenter -- iirc someone asked a specific technical question -- even people working for the company don't get to see the full source. Only a small team has access to the full source, and he said he wasn't one of them.
But Maven Central has strict rules around what can be published there. I just double checked and it's a requirement to publish the source as well as the binaries:
it seems you're right, but it also says the following, so i'm confused on whether it's a hard requirement?
"If, for some reason (for example, license issue or it's a Scala project), you can not provide -sources.jar or -javadoc.jar , please make fake -sources.jar or -javadoc.jar with simple README inside to pass the checking. We do not want to disable the rules because some people tend to skip it if they have an option and we want to keep the quality of the user experience as high as possible."
Datomic is an event-sourced db, and it makes it hard to introduce retroactive corrections to the data when your program's semantics already rely on Datomic's time-travelling abilities: at some point you'll need to distinguish between event time and recording time, as explained in this excellent blog post:
Datomic is an operational database management system - designed for transactional, domain-specific data. It is not designed to be a data warehouse, nor a high-churn high-throughput system (such as a time-series database or log store).
It is a good fit for systems that store valuable information of record, require developer and operational flexibility, need history and audit capabilities, and require read scalability.
Just the temporal properties alone make it very useful for anything where it matters like billing, finance, inventory. Else you are in views/schema/indexing hell to do it on top of SQL.
There is some SQL temporal support but it's not great and varies a lot. Also since it's not native to the storage it has a lot of complexity issues under the rug making it not great.
Many financial systems use Event Sourcing (OOP + ORM). I had to suffer this at a previous employer.
The temporal support seems handy, but time is still going to be really tricky for financial systems. Datomic only covers what the physical state of the database was at a particular time, but there's also the effective legal time (maybe a payment was dated a day before the system actually processed it) as well as requirements to remove data after a period of time (including point in time stuff).
Indeed, it depends a lot on the domain. Datomic only has "technical" database time, and doesn't have any built-in way of modelling domain time. You can set the transaction timestamp manually when you write, but you can't set it to be earlier than the latest transaction that was committed. So, if you want your domain modelling to piggyback on Datomic time travelling, you can only do things like delaying writes for, say, an hour, and hope you have all the data by the time you commit to db.
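For anyone who hasn't run into the two-clocks problem, here is a rough Python sketch of the distinction being discussed. It is a toy model, not Datomic's or XTDB's API; `Fact`, `BitemporalLog`, `record` and `lookup` are hypothetical names. Recording time is assigned by the store and only moves forward, while the effective ("valid") date is supplied by the business and may lie in the past.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: object
    valid_from: date  # domain ("effective") time, chosen by the business
    recorded_tx: int  # recording time, assigned by the store, only increases


class BitemporalLog:
    def __init__(self):
        self.facts = []
        self.tx = 0

    def record(self, entity, attribute, value, valid_from):
        """Append a fact; valid_from may be earlier than previously recorded facts."""
        self.tx += 1
        self.facts.append(Fact(entity, attribute, value, valid_from, self.tx))
        return self.tx

    def lookup(self, entity, attribute, valid_at, known_at_tx=None):
        """Value effective on `valid_at`, using only facts recorded by `known_at_tx`."""
        known_at_tx = self.tx if known_at_tx is None else known_at_tx
        candidates = [
            f for f in self.facts
            if f.entity == entity and f.attribute == attribute
            and f.recorded_tx <= known_at_tx and f.valid_from <= valid_at
        ]
        if not candidates:
            return None
        # Latest effective date wins; for the same date, a later recording
        # acts as a correction of the earlier one.
        return max(candidates, key=lambda f: (f.valid_from, f.recorded_tx)).value
```

With something like this, "what did we believe on the day we ran the report" and "what was actually in effect on that day" become two separate query parameters instead of one timestamp.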
This is not an answer, it's the beginning of a question. Yes sure, we know `create table` and we know it's a good idea to record the execution timestamp. What exactly do you put in the place of the three dots?
Fyi, Datomic lets you look at the entire database at any point in time, as an immutable value. Also, you can annotate transactions with metadata, and query for "which tx wrote this specific value for this row/column" and look at custom metadata you added to the tx to reason about your system. Doing all of that in SQL is not trivial, to say the least.
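A rough Python sketch of the idea, for readers who think in code. This is a toy model and not Datomic's API: `FactLog`, `transact`, `as_of` and `who_wrote` are invented names, and retractions are left out entirely. Facts carry the id of the transaction that wrote them, and each transaction carries arbitrary metadata.

```python
from types import MappingProxyType


class FactLog:
    """Toy log of (entity, attribute, value, tx) facts plus per-tx metadata."""

    def __init__(self):
        self.facts = []    # [(entity, attribute, value, tx), ...] in write order
        self.tx_meta = {}  # tx -> metadata dict (who, why, ...)

    def transact(self, facts, **meta):
        """Write a batch of (entity, attribute, value) facts as one transaction."""
        tx = len(self.tx_meta) + 1
        self.tx_meta[tx] = dict(meta)
        self.facts.extend((e, a, v, tx) for e, a, v in facts)
        return tx

    def as_of(self, tx):
        """Read-only view of the whole database as it was at `tx`."""
        snapshot = {}
        for e, a, v, t in self.facts:
            if t <= tx:
                snapshot[(e, a)] = v  # later facts overwrite earlier ones
        return MappingProxyType(snapshot)

    def who_wrote(self, entity, attribute, value):
        """Metadata of every transaction that asserted this exact value."""
        return [self.tx_meta[t] for e, a, v, t in self.facts
                if (e, a, v) == (entity, attribute, value)]
```

Usage would look like `log.transact([("user-1", "email", "new@example.com")], author="support-42", reason="customer request")`, after which `log.who_wrote("user-1", "email", "new@example.com")` returns that metadata.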
You have to add a lot of scaffolding to postgres to make it semi-immutable. datomic just is, wanna know previous user email, just go back and see. Out of the box, without thinking about it.
Immutability is certainly tempting for certain kinds of data. Does it handle use-cases where data needs to be deleted though? i.e., privacy compliance.
Temporality in general becomes super handy if you have something like reports that need to be consistent across time. Or if you want to ask questions about the past. Or questions about the future without affecting the present.
Because it's a different model of integrating your database and your app.
It allows you to write queries in a pull style, it can be trigger based, datalog or raw index access. It's immutable by default and allows historical queries. It allows metadata on the transactions themselves.
A lot of the time the user builds much of that himself or relies on frameworks to do it.
A tangent, but it would be interesting to see survey data on how many devs reach for SQL first by default these days. A lot of people use various other kinds of DB models which are perceived to have smoother learning curves.
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
So... no?
(I say that, but "Datomic binaries" presumably refers to compiled JVM class files; and JVM bytecode is notoriously easy to decompile back to legible source code, with almost all identifiers intact. Would Apache-licensing a binary, imply that you have the right to decompile it, publish an Apache-licensed source-code repo of said decompilation, and then run your own FOSS project off of that?)
Oh, that's today and tomorrow. I had been waiting for that, I could have sworn earlier this year they said there was going to be a live stream available. Maybe it didn't pan out.
If that's the case then all the people running around praising the ""simplicity"" of golang got the wrong end of the stick completely from Rich's classic presentations.
The thing is, Hickey was entirely right to reject this idea of “Simplicity” being the same thing as “Easy”. But he then decided to conflate in “comprehensible” which it turns out is very much a matter of aesthetics.
Turns out if you really focus on composability above other concerns you get Haskell.
> But he then decided to conflate in “comprehensible” which it turns out is very much a matter of aesthetics.
Did he? I seem to remember a quip in one of the presentations about German not being comprehensible to him being his own problem, because he never learned German.
The problem is his disdain for "easy". It's one thing to create composable building blocks, but frequently, 99% of users will use them in the exact same way, so you might as well produce the easy thing on top of the simple thing, too.
The Clojure tools.build process has this exact problem. It's too low-level, so everyone has to write their own build scripts on top of it to do the same thing as everyone else. Now there's a situation of 1000s of bespoke build scripts and 3-4 different front-ends, all effectively doing the same thing.
"In my line of work, sister, sometimes a second chance is a chance to make the same mistake twice" - State and Main
"Easy" vs "comprehensible" is exactly the trade-off Haskell makes: it might be difficult to learn, but it's easy to reason about once you learn it. (Of course, both sides of that equation come with their own caveats...)
Clojure is just really niche. Even an ecosystem that stays the same size over time is dwarfed by the continually growing, continually polished competitor ecosystems that will pull people from the small niches.
The best things about Clojure are things you don't really appreciate until you've already done the work to learn them.
For example, I never would have known how amazing it was to evaluate code inside the editor until I did the work of learning Emacs + evil-mode + nrepl/cider + whatever so that I could spin up my http server in-process and then modify code without restarting everything. Even today I'm doing `nodemon index.ts` like a goofball.
I stopped using Clojure simply when I met someone who wanted to build some big projects with me and, despite appreciating that Clojure was probably amazing, they simply couldn't be bothered to learn it. Fair enough. It was when Javascript was just getting `yield` coroutines (before async/await) which finally made Javascript bearable for me enough to switch to it for server work.
Clojure just has increasingly compelling languages and ecosystems to compete with, yet it has a huge ramp-up, being a Lisp, that makes it hard for people to choose it.
Just consider how, to even write Clojure comfortably, you really need something like Paredit. Else, what exactly are you going to do if you want to nest or unnest a (form) deeper in the ast structure? Manually rebalance parens? Cut and paste it? Only Paredit lets you easily move a (form) around the ast. And it's amazing but yet another tool you have to learn just to truly evaluate Clojure.
People who use Parinfer rather than Paredit when learning Clojure really like it. The concept is available in the Clojure mode in all the usual suspects of IDEs (IntelliJ Cursive, emacs Cider, etc.)
The size of the community is not as important as the value of what is available. It's just not widely known, or maybe there's still much potential yet unrealized.
This is an oldie but a goodie on how interactive an experience you can have when using Clojure on the front end and backend of a web app:
This project is the most recent and most promising iteration of someone making a very visual interactive tool for introspecting on data structures in the REPL environment
As far as the metrics state, Clojure never faded away. It's on an upward trajectory still, albeit slowly. Perhaps it faded away in terms of "HN hype of the day".
That said, I've been (and currently am) a Clojure engineer for the past 5 years and loving it. Quite a lot of jobs out there, more and more each time I look, healthy ecosystem and friendly community. It doesn't hurt that it's the most paid programming language as well.
The jobs seem to be pretty available in Europe, but I've been struggling to find Clojure work over here on the west coast of North America. I've had to bite the bullet and am looking for work in other languages now, though I'd really rather stay in the ecosystem.
Are you active on the Clojurians Slack channel? On #jobs and #remote-jobs channels every now and then a US job pops up there. You can also post on those channels that you are open for work, might help. I'd also check out some EU job offers (unless the salary is way lower compared to your location) because there are companies that do async and wouldn't care for timezones as much.
I've used Clojure personally for >10 years and done a bit with it commercially along the way. (I also have a personal affection for Lisp-like languages that goes back to at least the mid-90's.)
> How did that work out for you?
For the personal projects, it's been incredibly useful. The language fits the way I think, and being built on the JVM, it has both a performant runtime and lots of access to a wide ecosystem of libraries.
The Clojure-specific library ecosystem has been accused of being stagnant. I tend to take a more charitable view. The libraries I've used have tended to be smaller in scope and more well defined in terms of feature set. This makes it easier for them to converge on a feature-complete state, and many of them have done just that. If you don't mind assembling multiple smaller libraries into the useful whole you need, this can provide a stable platform on which to build and develop expertise and higher level libraries.
For larger scale commercial work, it's a harder sell. As you've pointed out, Clojure is not hugely popular, so it's fundamentally a minority language. This can make VC's touchy about funding. This is true to the extent I'm aware of at least one organization that started moving away from Clojure for that reason.
There's also the shape of the learning curve. It can be hard to get started with Clojure because of the issues around the syntax and associated tooling. The more piecemeal aspect of the library ecosystem can then make it harder to hit the early successes a larger framework-oriented approach can give you out of the box. You can get there, but it at least takes more initial effort. The same is true for all the abstractive power of Clojure (and other Lisps). Abstractions are nice, but they take time to develop and the payoff is on a considerable lag. The useful rule about waiting to abstract until after you see 2 or 3 instances of a pattern means you need to at least have spent enough time to see those 2 or 3 instances (and maybe a few more) before you really start to see the payoff in your own code.
The net of all this is that it's a language that may make it more difficult to get funding, will be initially somewhat confusing to most developers, and the payoff may well be deferred to the point you don't see it before you give up (either out of frustration or due to external factors). All in all, a considerable set of headwinds.
So what does that mean? It's probably better for projects on a longer time horizon that have a team willing and able to put in extended effort to effectively use the language. (And if the team is not self-funded, good to have a funder with some ability to accept the risk of a non-conventional solution). Not saying these projects don't exist, just that they're not common enough to build a 'popular/mass-market' ecosystem on.
The whole clojure ecosystem and the wonderful tools around it never really took off due to unclear documentation, poor onboarding and too few evangelists. Datomic and other products are really cool but are now being given away as scrapware due to this lack of effort to make the whole ecosystem more palatable, colorful and easy to get into for new audiences.
Even as a Clojure hobbyist I feel like all of these points are off? Between 4clojure, clojuredocs and the slack channel and the surprising number of books available, the onboarding and docs are great. And when I think of my favorite lang evangelists, hickey and nolen are absolutely #1 and #2, and have influenced me heavily, despite my day job not involving Clojure at all.
Every single day I wish the architects at my current job had chosen Datomic instead of Postgresql.
It would have saved us so so much time and trouble.
The time traveling ability alone would have been so useful so many times.
Also the ability to annotate transactions is awesome.
This doesn’t quite reflect the history. Datomic had various free/trial options. They evolved a little bit. Someone who watched the pricing and licenses very closely probably could do a better timeline than I could.
Right but it was always very clear “you can have it in dev for free, but prod is $$$$$”. It was not something like “use it free in prod as much as you want, or pay us for support”
I had a few projects it would have been cool on, but I just did postgres instead and won in the long run.
Not complaining about the actual announcement itself here: seems pretty sweet, all things considered. But the "Is it Open Source?" section should lead with "No." It's not a complicated question, and it's not a complicated answer. I think it's weird to talk about having "all the same rights" without explaining why that matters particularly (it does matter, it's just not explained much!) but it is somewhat tangential to the question being posed, which has a very clear and straightforward answer.
I hope more companies consider this unusual arrangement at least as an alternative to other approaches. Permissively licensed binaries can come in handy, though it certainly comes with its risks. For example, Microsoft released the binaries for its WebView2 SDK under the BSD license; this is nice of course, but the side-effect is that we can (and did) reverse engineer the loader binary back to source code. I suspect that's unlikely to happen for any substantially large commercial product, and I am not a lawyer so I can't be sure this isn't still legally dubious, but it's still worth considering: the protections of a EULA are completely gone here, if you just distribute binaries under a vanilla permissive open source license.
Probably. They were hoping that the less-informed (like me) would see "The Datomic binaries are being released under the Apache 2.0 license" and think "Datomic is relicensed under Apache 2.0".
Came here to search for the mention of XTDB (formerly known as Crux). I have never tried Datomic and only trivially played with XTDB, does anyone know if there's a comprehensive comparison of the two?
update: So, I asked chatGPT and this is what I got:
----
ChatGPT> Do you know how Datomic and XTDB differ?
Yes, Datomic and XTDB are both database systems, but they differ in a few ways:
1. Architecture: Datomic follows a client-server architecture, where the client sends requests to the server for data manipulation. On the other hand, XTDB is designed as a distributed database system, where data is automatically partitioned across multiple nodes for high availability and scalability.
2. Query Language: Datomic uses the Datalog query language, which is similar to Prolog, for querying the database. XTDB, on the other hand, provides a modern and expressive query language based on Clojure.
3. Data Model: Datomic stores data in the form of entities, attributes, and values. XTDB uses a document-based data model, where each document is a self-contained unit of data.
4. Distributed Transactions: XTDB supports distributed transactions while Datomic does not. This means that XTDB can guarantee the consistency of data across multiple nodes in a distributed system, while Datomic cannot.
5. Open Source: XTDB is open-source, while Datomic is a commercial product, which requires a license to use.
Overall, both Datomic and XTDB have their own unique features and advantages, and the choice between the two depends on the specific requirements and use case of the application.
----
I dunno, it sounds like XTDB is a clear winner here, no?
I'm tickled you asked ChatGPT but I think it misses the forest for the trees somewhat.
1. Architecture - XTDB is designed to work with Kafka, but it also works with typical JDBC databases as a datastore. Datomic can use SQL databases, DynamoDB or Cassandra if you want distribution. I personally would think long and hard before I introduced a distributed database to my organisation "because it scales".
Part of the value proposition of Datomic is it easily scales read workloads horizontally and by isolating read and writes into separate processes it improves write performance significantly. The metric usually thrown around is 75% of CPU cycles in a traditional RDBMS are concurrency coordination, which is avoided by the Datomic model. That number is quite old now so I don't know if it's still accurate as of 2023.
2. Query language - both use Datalog and support the Datomic `pull` syntax. XTDB also supports SQL.
3. Datomic's EAVT quadruplets are a compelling feature because they are so generic and can be used/re-used in many contexts. A document database would have to fit your use case pretty directly.
4. Datomic has a single transactor process. Do you need distributed transactions? Does Datomic need distributed transactions? You'd have to find someone from say, Nubank, and ask them for war stories. :-)
5. Datomic is now free-as-in-beer.
In my unqualified opinion XTDB is appropriate to choose in the following situations:
- You need to model "valid time" as part of your domain.
- You want a document database and are happy with everything that entails.
- You need access to the source code of your database.
- You have existing analysts who know SQL but don't know or can't learn Datalog.
I didn't realize it was possible to release binaries under a different license from the source code that generated them. In this case, is the "source code" just the physical machine code or bytecode in the binary?
You can attach any license text to anything, but most open-source licenses make little sense when applied to binaries. Like in this case, the Apache 2 license doesn't make a distinction between source and binary, referring to both as "the Work":
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that (...)
Applying this only to their binaries directly contradicts what the license says.
Disclaimer: I'm not a lawyer, nor do I play one on TV.
A copyright license is a copyright license: in theory, all a copyright license does is give you additional rights to use something. Using a license like Apache 2 for binaries is somewhat unconventional, but it's totally possible. It (obviously) does not give you access to the source code, and I think this could never work with the GPL and other copyleft licenses because they use wording that implies you need to distribute the source code, which you don't have.
The copyright owner, of course, has ownership, so their obligations don't really change by virtue of giving someone a copyright license. As far as I know, they could give someone a license to use something that is completely invalid and could never actually be used, and they can definitely do things like stop distributing under one license and switch to another. They own the source code, and they own the binaries (I believe the binaries would be considered a sort of derivative work in copyright terms, but again, not a lawyer.) So when they distribute a binary under a given license, it's unrelated to any particular distribution of source code. The only time this gets complex is when the ownership of an asset is split among many disparate parties, at which point everyone is pretty much beholden to the copyright licenses; like open source projects without CLAs. But if they own the source code entirely, they could, for example, distribute some source code under GPL, but then distribute modified binaries under a commercial license with a EULA, and not redistribute the modified source code, since it's their code to license to others, not a license they are subjected to themselves.
If you think about it, that's happening every time that you get a closed-source binary distributed to you. They're giving you a license to the binary, but not to the source.
It's certainly weird for the binary license to be Apache, rather than some proprietary EULA, though.
I am not a lawyer, but doesn't the Apache license specifically grant the right to redistribute the source code?
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form
It runs on a JVM, right? Presumably it could be decompiled and cleaned up (might be a massive task - but possible), and the reconstructed source would fall under the Apache license.
I thought the reason was the explicit patent language and lack of license virality.
The Apache license 2 is pretty clear that binary only distribution is allowed, but I think it’s also clear that the assumption is that source is available in some form. Otherwise, why would you care about derivative works?
As is, it would be possible to decompile the JVM code into something resembling source code and then distribute that with or without modification. Which just seems odd to me.
There are projects in Clojure to decompile Clojure-derived Java bytecode to Clojure. It works pretty well overall since Clojure's compiler isn't too fancy.
I'm not sure, but I think from a marketing perspective, you do not want to write "No" anywhere. I might be wrong, and I'd love if someone with knowledge in this area could answer. It's something I recall from the back of my mind...
Marketing to developers is also decidedly different than for non devs. We've been baited and switched, and screwed over a million times (thanks Google). It takes a lot of trust building to get me to trust any organization.
I especially don't trust free shit if I don't have access and control of the future of the code. Even then it's not a sure thing.
I don't know what you are calling "good". That is a super loaded word. I would say it is effective though. The baseline evidence for that is how much money gets spent in that industry, I guess.
While I see your point, please note people actually spending money don't just look at how much money is in the industry. They have internal metrics and can track performance of individual marketing campaigns. I called it a baseline argument because I'm not a marketing expert so this is the only real KPI __I__ understand.
My argument is that you are not alone: people in the industry don't have much else to go on either.
Marketing performance is an incredibly confoundable variable.
For example, I have heard one commercial at least 5 times in the last week: advertising open positions at McDonald's. Part of that very commercial stated that, "1 in 8 people have worked at a McDonald's." I am literally one of those people! How could anyone possibly measure the effectiveness of that commercial? Is it even meant to be effective at all?
The overwhelming majority of advertising I see is from "household name brands". The notion I have heard is that the goal is not to introduce themselves to new customers, or even drive more traffic to their brand: it's to keep their status as a "household name". Do they do this because it is effective, or because they are simply big enough to afford it?
Given the recent bait and switch moves by many companies from free-for-everyone open source to free-for-users-only (eg apache to agpl), it seems like doing this with "binaries only" practically admits to a bait and switch plan.
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
That doesn't answer the question at all. I assume the answer is no, because otherwise they would just say yes, and have a link to the source code somewhere. But that is such a weird, and possibly duplicitous way to answer.
I really like Clojure and the ideas behind Datomic but free without source is a trap, every time. They have to make money somehow, but they already sold to a bank. If that bank wants devs willing to work on their systems after the current generation moves on, I think they'd be better off going open source and to continue paying good devs to work on it. Everyone already knows lock-in is bad for businesses. Devs will seek non-proprietary solutions first, if they can't find it, there are already plenty of proven proprietary solutions they'll settle on way before Datomic. Open the source, sell the support.
Since the conversation seems to be focusing on the Apache 2.0 license, what would you do? Clearly there isn't a lot of precedent for "closed-source, free-to-use" licenses.
In this case Datomic maintains development control over their product and "source of truth" is still themselves, and the implicit assumption is that you enthusiastically use their product for free with no strings attached because you respect them as the source of truth.
My personal experience was using Datomic backed by DynamoDB, at the second Clojure company I worked at. In particular I remember feeling like it was hard to anticipate and understand its performance characteristics, and how indices can be leveraged effectively. Maybe if we had chosen Postgres as a backing store that would have been better? I dunno.
Using it was pretty nice at the scale of a small startup with a motivated team, but scaling it up organizationally-speaking was a challenge due to Datalog's relative idiosyncrasy and poor tooling around the database itself. This was compounded by the parallel challenge of keeping a Clojure codebase from going spaghetti-shaped, which happens in that language when teams scale without a lot of "convention and discipline"--it may be easier to manage otherwise. All of that said, this was years ago so maybe things have changed.
At this point I'd choose either PostgreSQL or SQLite for any project I'm getting started with, as they are both rock-solid, full-featured projects with great tooling and widespread adoption. If things need to scale a basic PostgreSQL setup can usually handle a lot until you need to move to e.g. RDS or whatever, and I'm probably biased but I think SQL is not really that much worse than Datalog for common use-cases. Datalog is nice though, don't get me wrong.
EDIT: one point I forgot to make: the killer feature of being an immutable data store that lets you go back in time is in fact super cool, and it's probably exactly what some organizations need, but it is also costly, and I suspect the number of organizations who really need that functionality is pretty small. The place I was at certainly didn't, which is probably part of the reason for the friction I experienced.
Newer releases have improved significantly in this area. It's now possible to understand perf implications with the addition of io-stats[1] and query-stats[2].
Although it is true that "time traveling" queries are relatively rare for production needs, the basic architecture supports things that many applications really need:
- It is possible to make queries against the database PLUS additional data not yet added, that is, "what if" queries
- Having a stable database-as-value is really useful for paginating results; you don't have to worry about new values being inserted into your results during execution, the way you do with traditional databases, no matter how long (minutes, hours, even days) you take to traverse the data
- Reified transactions make it possible to store extra data with each transaction, trivially, such as who made the update and why
- Immutability is amazing for caching at all layers
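To make those points a bit more concrete, here is a toy sketch in Python of the general idea: an append-only log of facts, immutable views over it, "what if" views, and who/why attached to each transaction. This is not Datomic's API or implementation; `Datom`, `Db`, `Connection`, `with_`, and the `audit/who` and `audit/why` attributes are all names made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Tuple

Fact = Tuple[str, str, Any]  # (entity, attribute, value)


@dataclass(frozen=True)
class Datom:
    entity: str
    attribute: str
    value: Any
    tx: int  # id of the transaction that asserted this fact


class Db:
    """Immutable view of the log up to and including transaction basis_t."""

    def __init__(self, datoms: Tuple[Datom, ...], basis_t: int):
        self._datoms = datoms
        self.basis_t = basis_t

    def get(self, entity: str, attribute: str) -> Any:
        value = None
        for d in self._datoms:  # log order: the latest visible assertion wins
            if d.tx <= self.basis_t and d.entity == entity and d.attribute == attribute:
                value = d.value
        return value

    def with_(self, facts: Iterable[Fact]) -> "Db":
        """Return a speculative view with extra facts; nothing is written to the log."""
        t = self.basis_t + 1
        visible = tuple(d for d in self._datoms if d.tx <= self.basis_t)
        extra = tuple(Datom(e, a, v, t) for e, a, v in facts)
        return Db(visible + extra, t)


class Connection:
    """Single writer that appends facts; reads go through Db values."""

    def __init__(self) -> None:
        self._log: List[Datom] = []
        self._t = 0

    def transact(self, facts: Iterable[Fact], who: str, why: str) -> int:
        self._t += 1
        for e, a, v in facts:
            self._log.append(Datom(e, a, v, self._t))
        # The transaction is itself an entity, so metadata is just more facts.
        tx_entity = f"tx-{self._t}"
        self._log.append(Datom(tx_entity, "audit/who", who, self._t))
        self._log.append(Datom(tx_entity, "audit/why", why, self._t))
        return self._t

    def db(self, as_of: Optional[int] = None) -> Db:
        # Copy the log into a tuple so later writes cannot change this view.
        return Db(tuple(self._log), self._t if as_of is None else as_of)


conn = Connection()
t1 = conn.transact([("order-1", "status", "open")], who="alice", why="created")
t2 = conn.transact([("order-1", "status", "shipped")], who="bob", why="courier pickup")

now = conn.db()                 # stable value, safe to paginate over
before = conn.db(as_of=t1)      # the database as it was after t1
what_if = now.with_([("order-1", "status", "refunded")])  # speculative view

conn.transact([("order-1", "status", "delivered")], who="bob", why="signed for")
```

After the last `transact`, `now` still answers `"shipped"`, which is the property that makes pagination and caching easy. A real system would use indexes rather than a linear scan, and would model retractions as well.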
https://sayartii.com/ is using Datomic stored on postgres that I have set up on Linode. That was all done back in 2020 and I haven't needed to touch it since. The site now gets ~180M monthly reqs and I store an enormous amount of analytic data on Datomic (was supposed to be temporary) so users can see impressions/clicks per day for each advertisement. I'm surprised it's still working.
Development experience is extremely nice using clojure. I've used it for two other projects and it has been very reliable. My latest project didn't really need any of its features compared to a traditional rdbms but I opted for it anyways so I don't have to write sql.
> “Datomic added to DynamoDB was the only option that didn’t force us to sacrifice features or add the expense of developer time. Without it, we would have had to push back a lot more, as the features would have been too difficult.”
some of the other testimonials mention keeping revision history, which is neat, but why Datomic vs. others? it's pretty easy to keep revision history with other databases too.
It's not simply revision history, it's a complete record of everything with time, without re-architecting your data or app. IIRC datomic structures your data so that all transactions and state have a time dimension so you can go forward or back in time trivially (no special query, no temporal sql, etc.)
There's https://docs.datomic.com/pro/reference/excision.html - but like in other data models you also might choose to not store sensitive information like PII in cleartext in the main DB at all. At least in earlier versions excision wasn't supported in the Datomic Cloud version.
I wonder if this is related to lack of guarantees about actual data erasure on delete in the backing storage. A lot of users probably don't take this into account when building on top of cloud storage services.
Worth keeping in mind that Nubank owns the company that makes Datomic, so that might colour their opinion. On the flip side they probably wouldn't have bought the company if they thought their product was crap.
Congratulations to Rich Hickey's children!! I hope your college experience was excellent. Disclaimer: that is how Rich explained why Datomic stayed closed source.
There are already a few open-source alternatives that run datalog variant query languages. I'd point the curious towards TerminusDB [1] and TypeDB [2]. TerminusDB is implemented in prolog (and rust), so it's an alternative with datalog at its heart.
It gives the licensee the ability to distribute the binary, use or include it in their products in the same way as an open-source product. It just merely prevents modification without decompiling (which I assume is not easy given it's clojure, not to mention obfuscation?). And presumably it makes it less likely someone would just produce a competing product if they should choose to re-monetize it?
Yeah, it's a complete bullshit move. Mongoose OS (an embedded IoT platform, not the db) does something similar. It's extremely weaselly and doesn't instill trust at all.
There is nothing wrong with saying "It is not open source but you can freely use the binaries". That is the same thing, but upfront about it, but this feels like open-source-washing.
Nothing wrong with that at all. But that's not what was said. Answering the question (that you posed to yourself in your own FAQ): "Is it Open Source?" by stating: "we've licensed the binary using an Open Source license" seems a bit disingenuous. A more matter-of-fact answer would have been: "No".
I saw Fabrice Bellard do it this year (MIT): The CPU version is released as binary code under the MIT license. The GPU version is commercial software. Please contact fabrice at bellard dot org for the exact terms.
> perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
Doesn't this mean, that, as soon as I (somehow) get hold of the source code, I can distribute it as I want?
That would be very ugly source, as Datomic is written in Clojure and AOT compiled to Java bytecode. Due to the architecture of Clojure (especially, the use of macros) it is not exactly possible to work backwards from JVM bytecode to anything that looks like the original source code. It's not like Java where a clever decompiler can exploit output patterns generated by the Java compiler to make reasonable guesses at the structure of the source code.
But this is all beside the point; Datomic is now free (as in beer) with a great license (Apache 2.0). You can use this amazing tool for free, and you have as much need to look at the source to do so as you might need to look at PostgreSQL's source.
Some of us have been hoping for this day since Datomic was first announced, but even as an insider (I have been working at NuBank NA for less than a year) I was stunned at the speed with which this decision was made and implemented.
> Due to the architecture of Clojure (especially, the use of macros) it is not exactly possible to work backwards from JVM bytecode to anything that looks like the original source code.
I mean, presumably if you run a Java decompiler over it you'll get perfectly-legible Java source code. Just Java source code that makes a lot of calls to methods defined in the Clojure stdlib.
I'm guessing it would look a lot like what an Objective-C or Swift program looks like when you throw it into a C decompiler: a lot of boilerplate for imperatively building stack-local data structure temporaries to be passed into runtime functions, but otherwise the same code you'd expect.
> But this is all beside the point; Datomic is now free (as in beer) with a great license (Apache 2.0). You can use this amazing tool for free, and you have as much need to look at the source to do so as you might need to look at PostgreSQL's source.
Personally, I don't want to use Datomic as a tool; I want to use Datomic as a bag of parts. I want to pilfer the major components and libraries within the Datomic codebase, reusing them inside other things. (You know how CockroachDB and Clickhouse both implemented PostgreSQL-syntax-compatible binary wire protocol in part by taking the parser.y file directly from Postgres's codebase? I want to do that kind of thing, with Datomic. And probably using some pieces of Neo4j and/or Apache BEAM at the same time.) I also want to study the data structures and algorithms used to implement Datomic, to better port the concepts used in Datomic to other databases.
If Datomic was a true FOSS project, doing all of that would be simple/easy. With just a binary, though, I can't do any of that.
Some of the things that Clojure generates are valid bytecode for which there is no Java source code equivalent.
Reusing the pieces:
In general, Clojure works so well because it is all of a piece, with many decisions and subsystems working together. Datomic's source is the same way, you can't really consume just part of the elephant, even if you had the source code. Many things that Datomic does simply don't make sense at all out of context.
There are already some open source alternatives to datomic. TerminusDB (https://github.com/terminusdb/terminusdb) for example is implemented in prolog (and Rust) so has the datalog variant query power that makes datomic so powerful. If you want free as in speech (though I love free beer).
XTDB is also worth mentioning, especially since they’re on the HN front page with a v2 early access announcement. There are differences in how they do things. I can’t meaningfully comment on business usage of either or what the trade-offs between them are.
Hasn't everyone learned that "store all the history of changes" is an anti-feature? The Legal departments generally do not care for this (it's just more data to make sure you deleted). And it makes schema migrations more painful as not only do you have to migrate the data you have now, but all of your historical data too! If you add a new property do you backfill it in your old data (to keep your code working)? Or start special casing old versions in your code? Neither is pretty.
If you want historical audit trails, make them intentional and subject to the same rules and patterns as your regular data.
My view is that Datomic is a novel upstart in the persistence space. Most of their competition - Postgres, Mongo, Cassandra - is open-source, so they're just shooting themselves in the foot. The "pay us extra for convenient hosting and consulting" model isn't threatened by open-source in the slightest.
The only thing I can think of is that they're trying to compete with Oracle/Db2/SQL server, but I can't imagine an enterprise eyeing any of those solutions ever giving Datomic a chance.
Sounds fantastic - I'd love to try it. I've been keeping an eye on Clojure and Datomic for years.
I always wonder if this sort of move portends an exit of some of the core technical team, who would very much like to fork the codebase and move on, but in this case with only the binaries being opened up, it feels more as though they want some more people to try Datomic out. Databases such as Neo4J do this as well - free to run, but you'll probably want to pay for support.
I saw some people say that they love Datomic but still feel that Datomic's performance is not as good as Postgres, especially in OLAP queries.
Actually, you can get the best of both worlds. Plenish is a library that allows you to sync the content in Datomic to Postgres.
https://github.com/lambdaisland/plenish
> This section only applies to Datomic 990-9202 and lower. Newer versions of Datomic Cloud will be free of licensing related costs, and you will only pay for the hardware that you use to run the system.
RDFox is worth a try as an alternative, also datalog but C++ based, has incremental reasoning, and explainability. It's a database but also a rules engine that can chain any number of rules. As far as I know datomic is unique for its "query the database at any point in history" and incremental tracking of schema changes, easy to use UDFs, it really shines above other databases in that context.
2007 - the Clojure programming language is announced by Rich Hickey and gains quite a bit of traction over the next 5 or 6 years. It never becomes a "top 5" language, but it could still today be arguably considered a mainstream language. It's been endorsed as "the best general purpose programming language out there" by "Uncle" Bob Martin[1] (author of Clean Code) and Gene Kim[2] (author of The Phoenix Project, the seminal DevOps book). The fact that Rich spent two years working on it without pay and without the commercial backing many other languages enjoy is a real testament to his commitment and his vision. A Clojure-related emacs package[3] quotes Rich when it starts a REPL: "Design is about pulling things apart."
2012 - the Datomic database is announced by Rich Hickey's company. The database is praised for its ingenuity and its "time travel" features. It was designed to be deployed anywhere in the beginning, but, over time, it became difficult to deploy outside of AWS environments and even the AWS deployment path was quite cumbersome--the Datomic marketing page used to feature a maze-like diagram of all the AWS-specific features needed to make the thing work (it would be nice to find a link to that picture); I'd think most companies would have trouble digesting that and integrating it into their technology "stack".
2020 - Nubank (a Brazilian fintech backed by at least one US venture firm and a large production user of Datomic) acquires Rich Hickey's company. It appears Datomic never gained much use outside of a handful of companies. Making it free of charge (2023) may be the cost-effective thing to do in such a situation if it costs more to handle billing and payments than are brought in. The reason they're not releasing the source code could be a legal one or simply the fact that open sourcing a large piece of software takes a lot of effort--something a for-profit financial services company like Nubank doesn't prioritize (rightly so).
The 2012 section seems not correct. In the time between 2012 and 2020 I deployed Datomic in various non AWS environments. Datomic was never particularly tied to AWS. I think your timeline also misses Datomic Cloud, which was an AWS exclusive product that launched much later than 2012.
If I recall correctly, Datomic gives you the ability to query the database at a given timestamp. Are there other DBs with this feature that folks are aware of?
Curious why you're asking this question when it seems to have little to do with Datomic going "free"? Did you mean Datomic WASM on the horizon? Or am I missing some other connection between SQLite and Datomic?
Anecdotally, all the ones I've worked on where I've used SQL have needed it. I've always ended up wondering when, why and who changed an attribute to its current value, but that's not knowable unless you jump through hoops and manually implement it.
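For a sense of what the hoops look like, here is a rough sketch of doing it by hand with Python's built-in `sqlite3`. The `account` and `account_audit` tables and the `set_status` helper are hypothetical, and the whole thing only works if every writer remembers to go through the helper instead of issuing a bare UPDATE.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account (
            id     INTEGER PRIMARY KEY,
            status TEXT NOT NULL
        );
        CREATE TABLE account_audit (
            account_id INTEGER NOT NULL,
            old_status TEXT,
            new_status TEXT NOT NULL,
            changed_by TEXT NOT NULL,
            reason     TEXT NOT NULL,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )
    return conn


def set_status(conn: sqlite3.Connection, account_id: int,
               new_status: str, who: str, why: str) -> None:
    """Update the status and record who/why in the same transaction."""
    with conn:  # commits both writes together, or rolls both back on error
        row = conn.execute(
            "SELECT status FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no account with id {account_id}")
        conn.execute(
            "UPDATE account SET status = ? WHERE id = ?",
            (new_status, account_id),
        )
        conn.execute(
            "INSERT INTO account_audit"
            " (account_id, old_status, new_status, changed_by, reason)"
            " VALUES (?, ?, ?, ?, ?)",
            (account_id, row[0], new_status, who, why),
        )


conn = open_db()
conn.execute("INSERT INTO account (id, status) VALUES (1, 'active')")
conn.commit()

set_status(conn, 1, "frozen", who="carol", why="fraud review")

history = conn.execute(
    "SELECT old_status, new_status, changed_by, reason"
    " FROM account_audit WHERE account_id = 1"
).fetchall()
```

This has to be repeated for every table and column you care about, which is the part that tends not to happen until after someone asks the question.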
It seems that they give the binaries for free but they won't release the source code. Can somebody explain to me what's the point of keeping the source closed in this case? I really can't think of any reason
Licensing issues, keeping the door open to making it not free again, greed, lack of understanding from management/lawyers/whatever, not wanting to deal with contributions (though here you can do what SQLite does), false sense of security, etc.
There are valid reasons not to release the source code that have nothing to do with “security by obscurity”: legal, various notions of “control”, and more
> And I’d add that security by obscurity is also a valid reason. It’s bad as a standalone strategy, but good as a complementary strategy.
As the thread you link mentions, the phrase “security by obscurity” historically means (more or less) “security primarily by obscurity”. But sometimes this point gets lost. The thread you mention is interesting.
Wikipedia:
> Security through obscurity (or security by obscurity) is the reliance in security engineering on design or implementation secrecy as the main method of providing security to a system or component.
Summary:
Layers of security (which can include a wide range of techniques, including obfuscation, etc): useful, because delaying attacks and/or making them less likely is useful.
Obscurity as a main method: theatre, because it often leads to self-deception about the true risks involved
Nubank’s goal of keeping the Datomic source code private is based primarily on IP law and internal security controls (on employees, contractors, and possibly obfuscating compilation). Disagree?
nice. now everyone can experience the pain of running s3+pgsql+a huge blob of binary using container ram ! /rant - it's an awesome piece of software regardless.
General PSA: Any open source startup touting their AGPL license is delusional. They will never be used at Google, Meta, or similar because of specific legal directives to avoid these kinds of licenses.