Wow, very cool indeed. We worked on an open-source community[1] tool using Datomic/Codeq[2], and the ops portion has been pretty unpleasant. I'd love to see Datomic-as-a-Service pop up, even one meant only for starter projects, just so that experimenting with it could be that much easier. That might be possible with the new license, depending on how the Cognitect team sees it.
I've been working with Datomic, as a developer, in a production environment for the last 4-5 months. In terms of the work done by your ops guys, I'd say it really depends on how you're using it. Datomic has its own AMI, which can be a hassle if you're used to rolling your own (which we ended up doing). I'd say one of the biggest challenges to overcome on the ops side is dealing with transactor failures: if a transactor dies, the in-flight data from the peers will likely just fall on the floor. If you use DynamoDB as a store, you also have to keep an eye on your provisioned throughput and scale appropriately.
Just to clarify, I don't know what problems sgrove ran into. I've asked him in IRC and will elaborate if he tells me, but at present, I dunno.
So here's what I have encountered:
Datomic's primary limitation is at the transactor level. Storage is a non-issue, especially if you're using DynamoDB. Even with PostgreSQL, write throughput was ridiculously good because of the way they're using the database.
I'm not sure I'd take transactor fail-over concerns too seriously, and I don't think they'll drop any writes on the floor either.
You shouldn't attempt to run a Datomic transactor on a cheapo VPS. Try to put together at least 4 GB of RAM, which is the same as what's currently standard for consumer laptops. No excuses.
Deploying Datomic itself is straightforward. The config is super simple, and the only dependency you need is Java. I'm actually a little puzzled that people want "Datomic as a Service"; Datomic and PostgreSQL were the simplest parts of my stack to deploy.
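To give a sense of how simple that is, here's a minimal sketch of the peer side against a PostgreSQL-backed transactor (the JDBC host, database name, and credentials are made-up placeholders):

    (require '[datomic.api :as d])

    ;; Datomic piggybacks on a plain PostgreSQL database for storage;
    ;; the transactor itself is one Java process with one properties file.
    (def uri
      "datomic:sql://myapp?jdbc:postgresql://db.internal:5432/datomic?user=datomic&password=datomic")

    (d/create-database uri)        ; returns false if it already exists
    (def conn (d/connect uri))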
You want beefy peers as well, so as to get maximum use out of the peer limit on the license. This isn't as big a deal as people think, because you're not structuring your app the way Rails and Django apps on Heroku do. You don't do 1 beefy database + 5,000 terrible VPSes. Not only is that a management overhead (automation or not), but you're better off vertically scaling a couple of peers first before running a shitload of dumb VPSes for your service.
The running assumption is that you should use DynamoDB, but PostgreSQL was seriously efficient for our purposes.
Indexing adds write overhead, but given the current limitations on what you can change in the schema online (nothing), you want to plan ahead here.
If you fail to anticipate something in your schema, you'll have to use a migration toolkit such as the one I plan to release soon.
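Concretely, planning ahead means settling your index and uniqueness flags in the initial schema transaction. A minimal sketch of an old-style schema transaction (the :user/email attribute is a made-up example):

    (require '[datomic.api :as d])

    (def schema
      [{:db/id                 (d/tempid :db.part/db)
        :db/ident              :user/email
        :db/valueType          :db.type/string
        :db/cardinality        :db.cardinality/one
        :db/unique             :db.unique/identity
        :db/index              true  ; pay the write overhead up front
        :db.install/_attribute :db.part/db}])

    @(d/transact conn schema)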
The componentized nature of Datomic makes vertical scaling fairly pleasant as you can more precisely target what you're upgrading/improving.
You probably want to include a pass-through querying API in the peers that access Datomic, so that you don't have to run the REST service. This enables access to the data from a long tail of servers that don't necessarily need to be that close to the data and are performing dumb queries. It solves the "but my webapp needs 100 crappy VPSes to perform!" problem.
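By "pass-through" I mean something like the following sketch (names hypothetical): the peer wraps a narrow query function, and the dumb front-end boxes call that over HTTP or RPC instead of linking the peer library themselves.

    (require '[datomic.api :as d])

    (defn user-id-by-email
      "Runs inside the peer; front-end servers call this remotely
       rather than querying Datomic directly."
      [db email]
      (ffirst
        (d/q '[:find ?e
               :in $ ?email
               :where [?e :user/email ?email]]
             db email)))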
Any kind of heavy-duty aggregation/analytical workload against a Datomic dataset should be performed on a big peer with memcached. Use roll-ups, for Pete's sake! You can use a bi-temporal timeline with analytics data, but I haven't fully explored the implications beyond Datomic making querying along that dimension nicer.
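For a taste of what querying along that dimension looks like, here's a minimal sketch (the :event/type attribute is made up): the same facts aggregated over the current db value, then grouped by the transaction that asserted them via the history view.

    (require '[datomic.api :as d])

    (let [db (d/db conn)]
      ;; Aggregate over the current value of the database.
      (d/q '[:find (count ?e)
             :where [?e :event/type :page-view]]
           db)
      ;; Group the same facts by the transaction that asserted them.
      (d/q '[:find ?tx (count ?e)
             :where [?e :event/type :page-view ?tx true]]
           (d/history db)))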
The Doc Brown array store presented at the Clojure meetup last night is a very interesting option for people doing multi-dimensional slicing of analytics data.
I'd personally use Datomic as a stand-in for any SQL database doing an OLTP workload where I previously would've used PostgreSQL, especially if history or reproducible results are important. That said, it appears to be adaptable to workloads I wouldn't have expected (analytics).
I'm interested in what the development/testing cycle is like and how quick it is. How quickly can you spin up, populate, and destroy a test instance for integration testing? How do you iterate on feature development against it?
Super fast, better than any SQL database or MongoDB.
Datomic (free and pro) has an in-memory version that I used when developing my Clojure application. It makes firing off test cases super simple and nice.
I used fixtures (edn) for the schema and for example data as well.
I would reload/dump/interact with the in-memory database in my REPL directly. Quite nice.
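The whole cycle, as a minimal sketch (file paths hypothetical; the datomic jar supplies the #db/id reader used in the fixture files):

    (require '[datomic.api :as d]
             '[clojure.edn :as edn])

    (defn fresh-conn
      "A throwaway in-memory database, loaded from edn fixtures."
      []
      (let [uri (str "datomic:mem://test-" (d/squuid))]
        (d/create-database uri)
        (let [conn (d/connect uri)]
          @(d/transact conn (edn/read-string {:readers *data-readers*}
                                             (slurp "resources/schema.edn")))
          @(d/transact conn (edn/read-string {:readers *data-readers*}
                                             (slurp "resources/fixtures.edn")))
          conn)))

    ;; Nothing to tear down afterwards: the db lives only in memory.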
You can run a test instance (free/dev, which uses H2); that's no big deal to set up either, same as running a local MongoDB or PostgreSQL instance. Ease of configuration is on par with MongoDB, but with sane defaults. It "just works".
I use environment variables for configuration, so the default Datomic database is just an in-memory instance, but I can point it to a localhost free instance easily.
Whether I use a free instance or an in-memory instance depends on whether or not I need the full transaction log. Typically I don't, but the migration toolkit I'm working on requires a persistent database, so the defaults there are different for testing/development.
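A minimal sketch of that configuration trick (the env var name is my own):

    (require '[datomic.api :as d])

    ;; Default to a throwaway in-memory instance; point DATOMIC_URI at
    ;; e.g. datomic:free://localhost:4334/mydb when the log matters.
    (def uri (or (System/getenv "DATOMIC_URI") "datomic:mem://dev"))
    (def conn (do (d/create-database uri) (d/connect uri)))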
I'm seriously in love with a database that has embedded and "real world" modes of operation with parity in their functionality (except for the log).
This is great. Having a free Datomic version that matches Datomic Pro's architecture and supports all of its external storage engines certainly simplifies the scale path and makes design decisions easier.
Can someone clarify this for me? If data is never deleted or updated, one can build a very nice architecture (rich caching clients and all that), which is what they've done. I like that.
What happens in these two cases? (Sorry for repeating myself; I already mentioned them below.)
1) Runaway data generator. Someone either messed up a test or confused the units on a timer, and now you are logging at thousands of times the rate you expected. All of this ends up in your database. Does just adding a deletion record for each one of those "fix" the problem? And if the data is immutable, doesn't it still suck up the storage?
2) Sensitive data. Someone somehow shoved plaintext passwords, social security numbers, or ICBM launch codes into the database. What do you do in that case?
Note that even in this case, you can remember that you decided to forget, as a key premise of Datomic is that you always know how and why you came to record (or forget) a fact.
Alright, that makes sense. Now it is interesting how the fact that you decided to forget would work. I guess if one always accesses the data via the official API, at the latest known state, that will work. But presumably the data will still be stored in binary form in the back-end.
Another, perhaps related, question: does one have the option to "travel in time"? Say I record that I forgot my mistakenly added passwords, but an attacker knows when that happened. Can they go back and inspect the data state right before the point when I forgot the data?
What you're describing is plain ol' retraction. If you retract your credit card number, then someone can just surf back in time to grab it. That's why you'd excise it instead.
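Roughly the difference, as a sketch (the entity id, attribute, and value are hypothetical):

    (require '[datomic.api :as d])

    ;; user-eid: entity id of the offending entity (hypothetical).
    ;; Retraction: the value is gone from the current db, but still
    ;; visible via as-of/history views.
    @(d/transact conn [[:db/retract user-eid :user/card-number "4111"]])

    ;; Excision: the datoms are permanently removed from storage,
    ;; while the excision transaction itself remains as evidence
    ;; that you decided to forget.
    @(d/transact conn [{:db/id           (d/tempid :db.part/user)
                        :db/excise       user-eid
                        :db.excise/attrs [:user/card-number]}])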
Companies really need to do a better job of making ELI5 descriptions. This means nothing to me:
> Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability, and ACID transactions.
> Datomic can leverage highly-available, distributed storage services, and puts declarative power into the hands of application developers.
What does that even mean? I'm pretty good with databases, and I know what transactions and queries and joins are, but this explanation does a pretty poor job of explaining what Datomic is and why I should use it.
What's a "time-based fact"? What does putting "declarative power in the hands of application developers" mean? Don't all databases support queries? Don't most relational databases support joins and transactions?
It just seems like a buzzword-filled fluff phrase that says "Datomic is a database," but I don't know why I would use it over, say, MySQL or Mongo or whatever.
>Datomic is built on immutable data; facts are never forgotten or overwritten. This means complete auditability is built in from the start - including the transactions that modify your database. And because Datomic is built on immutable data, you can explore possible future scenarios by issuing transactions against a database and decide to commit them only after verifying the results.
Ok, that makes a little more sense, but it's still not clear what differentiates this from other database systems.
I'm not affiliated with them at all, but I think SiftScience.com does a great job of explaining their product simply and efficiently: https://siftscience.com/
> Fight Fraud with Machine Learning
> Sift Science monitors your site's traffic in real time and alerts you instantly to fraudulent activity.
Simply detects fraud. Done.
Or EasyPost.com:
> Shipping for developers
> EasyPost allows you to integrate shipping APIs into any application in minutes.
Don't just spit buzzwords and technical terms at me. Tell me what it does.
> Datomic is a database that, among other things, specializes in tracking data over time, allowing you to test a transaction before saving the data. True accountability from the beginning!
Probably not even accurate (again, I don't understand Datomic's premise), but that sounds more explanatory than their current buzzword-filled blurb.
To any startups watching: have something on your front page that is an ELI5 description of your product.
Generally speaking, five year olds should not be choosing databases for anything critical. But I agree that Datomic and company can do better with their marketing and general product description.
Here is one of the better explanations I have seen:
Atomic: Transactions are all-or-nothing. You never see half of a transaction. You either get the world before the transaction, or after it.
Consistent: The before and after worlds are always valid, never corrupt.
Isolated: From the outside, it appears that transactions occur one-after-another, never overlapping in time. In Datomic, this is actually the case: Transactions are fully serialized.
Durable: If a transaction succeeds, you can be confident that the consistent "after" world is safely on disk.
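You can actually see those before and after worlds in the peer API. A minimal sketch (the attribute is a made-up example; conn is an existing connection):

    (require '[datomic.api :as d])

    (let [result @(d/transact conn [{:db/id      (d/tempid :db.part/user)
                                     :user/email "alex@example.com"}])]
      (:db-before result)   ; the world before the transaction
      (:db-after  result))  ; the consistent world after it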
"Time-based fact" = entity-attribute-value at time. Example: Alex likes nachos at 5 pm.
"Declarative power" = among other things, datalog queries. SQL is also relatively declarative. Datomic has the benefit of declaring queries as data, not as munged strings.
"Facts are never forgotten or overwritten" = as the world changes, add a new record at a new timestamp. Traditional relational dbs modify the old record destroying previous values. The database notion of time (series of transactions) is made explicit and available as part of the query (query the db at a particular point in time or even join databases from different points in time).
UPDATE: Jernau gives a great practical overview in his screencast (https://www.youtube.com/watch?v=ao7xEwCjrWQ) on building a basic app with Datomic, Clojure, and Light Table.
Why would I waste my time reading all of those articles or watching a screencast if I don't have any idea what Datomic is? In order for me to be interested enough to click any of those links, I would have to think Datomic is a) interesting, b) relevant to me, or c) worth my time. The current website doesn't accomplish any of those objectives. That's what I'm getting at.
I'm sure it's a wonderful app, but its first-impression presentation doesn't tell me what it is I'm looking at. And as such, I'm just going to click away and find something else.
It sounds good in theory, but this is just not feasible.
Couple of scenarios:
1) A runaway data generator. Somehow you end up confusing milliseconds with seconds and now you're logging 1000x more data than you intended; you leave for the weekend and come back to find your storage full of junk.
2) Sensitive data. Felix has just inserted all the passwords and credit card numbers in the database.
Do you assume by default that people can't anticipate obvious things like this in general or only when it's talented programmers working on a database?
> or only when it's talented programmers working on a database?
I'll pretend I don't know who Hickey is and just say that I expect a lot of stupid decisions can be made when coming up with "a completely new way of looking at or storing data". Some databases (cough... cough... starts with M) shipped their database product with unacknowledged writes as the default. Oops. Anyone can make a database.
Now let's dig deeper. The reason I asked is that, given the architecture of Datomic and the claims made, that corner case (excision) would not be trivial and would have to be implemented as a hack on the side. In some databases it is easier to do; in this architecture it is hard.
> or only when it's talented programmers working on a database?
Who are these talented programmers? Are you a talented programmer?
What's really worrying is that in all this time I've never seen a convincing benchmark to motivate diving head-first into Datomic. For all the hate the MongoDB guys get, they at least piqued interest with some arresting charts and figures.
Right now, the impression I get is, "ooh, because Rich Hickey..."
I'm self-taught at relational database design; I spent two years' worth of about all of my free time learning and creating large normalized SQL schemas. One of the things that most struck me about Datomic was the capacity to get virtually all the advantages of a relational database without the baggage and rigidity. On top of that, you get a distributed database with almost none of the baggage usually associated with one, which was another big thing for me.
Granted, I'm coming from a do-it-myself green-field project, but the things that Datomic's structure and design open up for my personal project mean that I, a one-man show, can far more easily go toe to toe with the big guys who have massive teams to design and deploy their apps and databases.
Potential is what I see, and it's far more than I've seen with just about any other technology that I can think of. I really think Datomic is a game changer, at least for the few awake enough to see it.
Datomic is a very cool project, but it's worth being aware that triple stores generally exhibit significantly worse performance than relational DBs for most query workloads. For non-huge projects they're perfectly capable though, and do offer some flexibility benefits.
Please note that Datomic's architecture is substantially different from that of most traditional triple stores, so I don't think it is useful to extrapolate the performance of one from the other.
That's true, and if your workload is such that you can predictably query over a relatively small fraction of the database, then I can imagine it working well: it effectively buys you auto-partitioning. If your queries run over a larger subsection of the data, you're still going to run into the fundamental problem of most triple stores: they're join-heavy and random-I/O-heavy. I've not heard that Datomic has anything special going on with regard to this issue, although I may be out of date.
This isn't intended to bag on triple stores - I work on them for a living - or Datomic specifically. Triple stores are very useful and in many ways liberating to work with, but they do present challenges when querying over lots of data.
I'd just echo PureDanger's comment regarding the architecture. I'm no expert on the matter, but the concept of having the components of the database taken apart, as they are with Datomic, gives you potential arrangements that you couldn't get with traditional architectures.
I haven't actually used Datomic yet, but what piques my interest isn't necessarily "ooh, Rich Hickey" as much as "ooh, the Clojure/functional approach to data immutability applied to a database". The appeal, to me at least, is in the potential to make my life easier as a product engineer, rather than as a DBA or devops guy.
Very much one of the biggest draws in my opinion, and I've been using Datomic in production for several months. We're a Clojure shop, and one of the things I've been working on recently is a library for parsing/storing/querying all of Clojure's data structures as datoms. I hope I get to open-source it :)
Valuable software isn't always easy and quick to pick up; simple things can be difficult to familiarise yourself with.
Learning the value of the functional style took me ages to grok, but it's been one of the biggest wins ever.
I find it a little ironic, considering one of the first and best talks I've seen from Hickey is "Simple Made Easy". You'd likely learn a lot from watching that, or get more detail from any of the talks on Datomic.
It is a NoSQL database with a novel architecture. It uses schemas, supports transactions and historical queries, and doesn't use a client-server model.
I guess where I'm getting lost is the terminology. I was under the impression that end-users are peers[0], so I was worried about maxing out at two users. But if the limit is whatever a transactor can handle, then I doubt I'd ever hit it, and I might as well commit.
I'm looking at using it as part of a Clojure backend, if that matters.
"A Peer is a process that manipulates a database using the Datomic Peer library. Any process can be a Peer - from Web server processes that host sites or services, to a daemon, GUI application or command-line tool. The Datomic-specific application code you write runs in your Peer(s)."
For anybody wondering what clicked: I'd have Datomic running with one peer, and that peer would serve as many end-users as it could handle. If necessary I could add a second peer and double the read-query power. And that's more than enough for my little web app.
[1] The service is a frontend to Codeq, having imported many Clojure repos: http://www.jidaexplorer.com/
[2] http://blog.datomic.com/2012/10/codeq.html
EDIT: Added links.