Datomic: Can Simple be also Fast? (dotkam.com)
96 points by espeed on Sept 22, 2013 | 23 comments



Whether through Datomic or something else, it seems like some version of "database-as-a-value" (as described by Rich Hickey) is bound to happen. We're keeping immutable version histories of as much as our hardware and systems will allow, from source code management through other types of media. Why in the world wouldn't we do this with our data if we could?

It'll be interesting to see how this plays out. Widespread adoption of technology tends to follow a path of least resistance. Does Datomic offer a simple enough path to pull many people over? I also know many of us have a preference for building on open source systems; will this be an obstacle for Datomic?


Take a look at Event Sourcing, commonly used with CQRS. The approach is indeed to store data as change-events in an event store (which could be a SQL database or a NoSQL store, depending on your needs) and then build one or more views (separate databases, Lucene indexes, files, anything really) to service the types of requests the application handles.

For an example of CQRS with Event Sourcing in Java: http://axonframework.org
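
To make the shape of the pattern concrete, here is a minimal, framework-free sketch in Java (the event and account names are invented, and this is not Axon's API): state is never updated in place; it is derived by replaying the immutable event log.

    import java.util.ArrayList;
    import java.util.List;

    public class EventSourcingSketch {
        // An immutable change-event; the log of these is the source of truth.
        record Deposited(String account, long amount) {}

        public static void main(String[] args) {
            // The event store: an append-only log (a SQL table, a NoSQL
            // collection, a file... whatever fits).
            List<Deposited> log = new ArrayList<>();
            log.add(new Deposited("acct-1", 100));
            log.add(new Deposited("acct-1", 50));

            // A "view": the current balance, rebuilt by replaying events.
            long balance = log.stream()
                    .filter(e -> e.account().equals("acct-1"))
                    .mapToLong(Deposited::amount)
                    .sum();
            System.out.println(balance); // prints 150
        }
    }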


SQL databases also do this. They have events called 'SQL statements' which are written to a 'journal' and applied to the domain which resides in the 'tables' of the database :)


You seem to be missing the point. SQL databases don't count, because these mechanisms are not visible, and the programming model is still changing data in place.

No one really cares how DBs are implemented under the hood; we just care about the programming model. (And speed, CAP, ACID, etc.)


I do get the point; in fact, I'm building such an app right now. But my statement is still true, and I find that quite amusing.


Given the underlying architecture of most relational databases, is it realistically possible to do "point in time" queries without replaying the entire log history into a clean database up to the point in time you want? That's not going to be very efficient.

I've only seen SQL logs presented as solutions for replication and disaster recovery, not point-in-time functionality.

What I'm getting at is: is it even possible in principle to expose part of the inner workings of some relational database systems to get this kind of capability?


Snapshotting and log replay should be fairly straightforward if you save all SQL statements in their entirety. Snapshots would stop the world, which might pose a problem, but I presume this is no different from other event-sourcing solutions. Given the concurrency of a DB, I'm also not sure there's a strict ordering of all incoming statements.

Edit: Seems like at least Oracle is already doing this: http://en.wikipedia.org/wiki/Redo_log
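
As a toy illustration of snapshot-plus-replay (types invented here, writes reduced to key/value puts, and it assumes the strict ordering questioned above): restore the latest snapshot, then re-apply only the logged writes up to the requested point in time.

    import java.time.Instant;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class ReplaySketch {
        // A logged write, reduced to a key/value put for the sake of the toy.
        record LoggedWrite(Instant at, String key, String value) {}

        // Rebuild the state as of pointInTime: start from the snapshot,
        // then replay writes that happened after it, in log order. This
        // assumes a single total order of writes, which may not hold
        // under concurrency.
        static Map<String, String> asOf(Map<String, String> snapshot,
                                        Instant snapshotTime,
                                        List<LoggedWrite> log,
                                        Instant pointInTime) {
            Map<String, String> state = new TreeMap<>(snapshot);
            for (LoggedWrite w : log) {
                if (w.at().isAfter(snapshotTime) && !w.at().isAfter(pointInTime)) {
                    state.put(w.key(), w.value());
                }
            }
            return state;
        }

        public static void main(String[] args) {
            Instant t0 = Instant.parse("2013-09-22T00:00:00Z");
            List<LoggedWrite> log = List.of(
                new LoggedWrite(t0.plusSeconds(10), "k", "v1"),
                new LoggedWrite(t0.plusSeconds(20), "k", "v2"));
            // State as of t0+15s: only the first write has been applied.
            System.out.println(asOf(Map.of(), t0, log, t0.plusSeconds(15))); // {k=v1}
        }
    }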


Not only is this possible, Oracle has had exactly that feature since 2002:

http://docs.oracle.com/cd/E11882_01/appdev.112/e17125/adfns_...
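
The feature is Oracle Flashback Query: you can ask for rows AS OF a past point in time directly in SQL, no manual log replay required. A sketch over JDBC (connection details and the orders table are invented for illustration):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FlashbackSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger");
            // Point-in-time query: the table as it looked one hour ago.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT id, status FROM orders " +
                         "AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("status"));
                }
            }
        }
    }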


There is also http://geteventstore.com/, an immutable event database by one of the main CQRS aficionados, Greg Young. BTW, the clustered HA setup was open-sourced very recently.

Really solid and fun to work with.


And there is also the old-school http://prevayler.org/ :)


One database pattern I use is to insert new data/events into tables, then use window functions, views, and/or triggers to derive a snapshot of the most current version of the data. It takes more space, but it lets me figure out why the database is in the state it's in.

For example, if I want to track UPS shipments, I'd set up a shipment_events table. Every time I check the status of a shipment, I'd insert a row into shipment_events. If the shipment changes state from in_transit to delivered, then I'd set shipments.state = "delivered".
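
A sketch of that pattern over JDBC (table and column names invented here): the "current" state is a window-function query over the event table rather than a row updated in place.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LatestShipmentState {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/shipping", "app", "secret");
            // Every status check is an INSERT into shipment_events; the
            // latest row per shipment is derived with a window function.
            String latest =
                    "SELECT shipment_id, state FROM (" +
                    "  SELECT shipment_id, state," +
                    "         ROW_NUMBER() OVER (PARTITION BY shipment_id" +
                    "                            ORDER BY checked_at DESC) AS rn" +
                    "  FROM shipment_events" +
                    ") t WHERE rn = 1";
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(latest)) {
                while (rs.next()) {
                    System.out.println(rs.getString("shipment_id")
                            + " -> " + rs.getString("state"));
                }
            }
        }
    }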


Never Trust Sheep? Isn't it the sheep who spot the bugs and scaling issues in real-world scenarios, who indirectly help get funding for the parent company so it can scale its teams, and isn't it the sheep who write the tools, clients, and community documentation?


If you're talking about enterprise companies (the type of company where suits often make engineering decisions), they seem to rarely release any of their "tools, clients and community documentation".


I'm talking about companies like MongoDB, what MySQL used to be, and Postgres: the ones with the large communities (sheep).


Not all flocks are made of sheep.


Ah, you fancy yourselves as flocks of wolves?


I don't like Mongo or MySQL. That doesn't mean other people don't have good reasons to use those platforms.


I am always interested in Datomic's approach. How can you build a simple thread-safe counter with it?

If you do it in a time-series db fashion, there will soon be too many facts to count.


You can define functions inside the database and then use them inside a transaction [1] to get atomic updates, if that's the question (see the sketch after the links below). You can also mark attributes as noHistory [2] if you don't care about past states.

[1] http://docs.datomic.com/database-functions.html

[2] http://docs.datomic.com/schema.html
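
Roughly, and with invented names (:counter/value, :counter/inc), a sketch of this via Datomic's Java peer API; check [1] for the exact database-function syntax. The function body executes inside the transactor (the single writer), which is what makes the increment atomic: concurrent increments serialize and none are lost.

    import datomic.Connection;
    import datomic.Database;
    import datomic.Peer;
    import datomic.Util;

    import java.io.StringReader;
    import java.util.List;
    import java.util.Map;

    public class CounterSketch {
        public static void main(String[] args) throws Exception {
            String uri = "datomic:mem://counter-demo";
            Peer.createDatabase(uri);
            Connection conn = Peer.connect(uri);

            // Schema for the counter attribute, plus a transaction function
            // that reads the current value and asserts value + 1.
            String edn =
                "[{:db/id #db/id [:db.part/db]\n" +
                "  :db/ident :counter/value\n" +
                "  :db/valueType :db.type/long\n" +
                "  :db/cardinality :db.cardinality/one\n" +
                "  :db.install/_attribute :db.part/db}\n" +
                " {:db/id #db/id [:db.part/user]\n" +
                "  :db/ident :counter/inc\n" +
                "  :db/fn #db/fn {:lang \"clojure\"\n" +
                "                 :params [db e]\n" +
                "                 :code (let [v (or (:counter/value (datomic.api/entity db e)) 0)]\n" +
                "                         [[:db/add e :counter/value (inc v)]])}}]";
            conn.transact((List) Util.readAll(new StringReader(edn)).get(0)).get();

            // Create a counter entity, then bump it atomically.
            Object tmp = Peer.tempid(":db.part/user");
            Map res = conn.transact(
                Util.list(Util.map(":db/id", tmp, ":counter/value", 0L))).get();
            Object counterId = Peer.resolveTempid(
                (Database) res.get(Connection.DB_AFTER), res.get(Connection.TEMPIDS), tmp);

            conn.transact(Util.list(Util.list(":counter/inc", counterId))).get();
        }
    }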


From the FAQ (http://www.datomic.com/faq.html):

When is Datomic not a good fit? Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters). And, at present, Datomic does not have support for BLOBs.


Isn't HornetQ the queue that allows one "ack" to speak for the past five messages? That is, if I lost one of the past five messages I sent, I would never know, because the system would consider the message delivered. To me, that is a severe flaw and could open you up to a debugging nightmare if used improperly. Either go full acks or go ack-less.


Sounds like mongodb's batched write semantics.


I haven't heard of any such problem with HornetQ. If this were unavoidable behaviour, then it would be impossible to correctly implement the JMS specification, I think. I fear you may be mistaken, although I would be very interested to hear more about this if not.



