"Datomic is not an update-in-place system. All data is retained by default."
I'm becoming more and more convinced that your canonical data store should be append-only whenever possible (see eg [1][2] for detailed arguments). It's nice to see first class support for this.
This could be either an available or consistent system, depending on how cache invalidation in peers works. In the available, eventually-consistent case you have the added benefit that all queries see a consistent snapshot of the system, even if that snapshot is not totally current.
Like most of Hickey's work, the whole thing seems really obvious in hindsight. It also bears a lot of similarity to Nathan Marz' recommendations for data processing and schema design.
It looks like a very cool product/service, but there's something... off... about this landing page. I can't quite put my finger on it. Two things I can think of right off the bat:
1. The use of the term "whitepaper". It's very "enterprisey"
2. It took me a bit of perusing to figure out what the product IS. I think the lead paragraph may need some tweaking
In all, the landing page makes the product feel intimidating. Contrast to Parse's landing page (https://www.parse.com/) where it feels like I'm free to jump right in and tinker with it, but I also get the impression that it will scale up if I need it to. (Yes, I know the two services aren't offering the same thing).
I had the same reaction. I think Rich Hickey is a genius, and I'm an absolute Clojure nut. I have every confidence that this is probably a very cool thing.
But judging from that opening description, I gather that Datomic puts the data and analysis in the same application. As a description of what the thing IS, it's about as informative as saying, "This new language allows you to take control of your computer by allowing you to give it coded instructions!" or "Our storage solution allows you to persistently store data!"
I agree with you on the landing page. The introductory paragraph seems rather "fluffy". That combined with the fact that it uses a whitepaper immediately gave me the feeling that it's not really meant as something for regular programmers to check out and hack with. It's surprising, since that's how many Clojure programmers get their start.
On the other hand, it's very new so maybe they'll add more developer-friendly pages soon. Or maybe it's only meant for "enterprise" environments? Time will tell.
I think this has a lot to do with it. After an hour of reading, watching and thinking, I can't come up with any way to put it into one paragraph.
Here's the shortest what and why I could come up with:
Questioning Assumptions
Many relational databases today operate based on assumptions that were true in the 1970s but are no longer true. Newer solutions such as key-value stores ("NoSQL") make unnecessary compromises in the ability to perform queries or make consistency guarantees. Datomic reconsiders the database in light of current computer setups: disks and RAM that are millions of times larger and faster, and distributed architectures connected over the internet.
Data Model
Instead of using table-based storage with explicit schemas, Datomic uses a simpler model wherein the database is made up of a large collection of "datoms" or facts. Each datom has 4 parts: an entity, an attribute, a value, and a time (denoted by the transaction number that added it to the database). Example:
John, :street, "23 Swift St.", T27
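In Clojure, asserting a fact like that might look roughly like the following sketch (the connection, entity id, and :street attribute are assumed to already exist in your schema; the names are illustrative):

    (require '[datomic.api :as d])

    ;; conn is an existing Datomic connection; john-id is the entity id for John.
    ;; The transaction that records this datom supplies the T value.
    (d/transact conn [[:db/add john-id :street "23 Swift St."]])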
This simple data model has two main benefits. It makes your data less rigid and hence more agile and easier to change. Additionally, it makes it easy to handle data in non-traditional structures, such as hierarchies, sets or sparse tables. It also enables Datomic's time model...
Time
Like Clojure, Datomic incorporates an explicit model of time. All data is associated with a time and new data does not replace old data, but is added to it. Returning to our previous example, if John later changes his address, a new datom would be added to the database, e.g.
John, :street, "17 Maple St.", T43
This mirrors the real world where the fact that John has moved does not erase the fact that John once lived on Swift St. This has multiple benefits: the ability to view the database at a point in time other than the present; no data is lost; the immutability of each datom allows for easy and pervasive caching.
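To make that concrete, here is a rough sketch of looking at the database as of an earlier transaction with the peer API's as-of (the entity id and the t27 basis are hypothetical):

    (let [db  (d/db conn)        ; the current database value
          old (d/as-of db t27)]  ; the database as of (hypothetical) transaction T27
      ;; Against `old`, John still lives on Swift St.; against `db`, on Maple St.
      (d/q '[:find ?street
             :in $ ?e
             :where [?e :street ?street]]
           old john-id))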
Move Data and Data Processing to Peers
Traditionally, databases use a client-server model where clients send queries and commands to a central database. This database holds all the data, performs all data processing, and manages the data storage and synchronization. Clients may only access the data through the interface the server provides - typically SQL strings which may include a (relatively small) set of functions provided by the database.
Datomic breaks this system apart. The only centralized component is data storage. Peers access the data storage through a new distributed component called a transactor. Finally, the most important part, data processing, now happens in the clients, which, considering their importance, have been renamed "peers".
Queries are made in a declarative language called Datalog which is similar to but better than SQL. It's better because it more closely matches the model of the data itself (rather than thinking in terms of the implementation of tables in a database). Additionally, it's not restricted like SQL: it allows you to use your full programming language. You can write reusable rules that can then be composed in queries, and you can call any of your own functions. This is a big step up in power, and it's made practical by the distribution. If you ran your query on a central server, you'd have to worry about tying up a scarce resource with a long-running query. When processing locally, that's not a concern.
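As a rough illustration of that composability (the rule, helper function, and attribute names here are made up), a query can combine datom patterns, a reusable rule, and an ordinary Clojure function:

    ;; A reusable rule that can be composed into any query.
    (def rules
      '[[(lives-on ?e ?street)
         [?e :street ?street]]])

    ;; An ordinary Clojure function, callable from within the query.
    (defn short-street? [s]
      (< (count s) 20))

    ;; conn is an existing connection; % binds the rules.
    (d/q '[:find ?e ?street
           :in $ %
           :where
           (lives-on ?e ?street)
           [(user/short-street? ?street)]]   ; fully qualified; assumes a REPL user namespace
         (d/db conn) rules)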
When a query is performed, the needed data is loaded from central storage and placed into RAM (if it will fit). Later queries can then run quickly against this locally cached data.
----
That's definitely not all it does or all the benefits, but hopefully that's a good start.
Transactions are just data like everything else, and you can add facts about them like any other entity: for example, who created the transaction, or what the database looked like before and after it.
Additionally, you can subscribe to the queue of transactions if you want to watch for and react to events of a certain nature. This is very difficult in most other systems.
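For reference, both of those ideas look roughly like this in the peer API (the :audit/user attribute and the handle-tx function are made up for the example):

    ;; Reified transactions: a tempid in the :db.part/tx partition resolves to
    ;; the current transaction entity, so this records who made the change.
    (d/transact conn [[:db/add john-id :street "17 Maple St."]
                      [:db/add (d/tempid :db.part/tx) :audit/user "alice"]])

    ;; Subscribing to the transaction queue; each report carries
    ;; :db-before, :db-after, and the :tx-data (the newly added datoms).
    (def tx-queue (d/tx-report-queue conn))
    (future
      (loop []
        (handle-tx (:tx-data (.take tx-queue)))  ; .take blocks until the next tx
        (recur)))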
> a time (denoted by the transaction number that added it to the database).
Do transaction numbers have total order or just partial order? Total order is serializing. (And no, using real time as the transaction number doesn't help because it's impossible to keep an interesting number of servers time-synched.) Partial order is "interesting".
However, since its only job is doing the transactions, the idea is that it can be faster than a database server that does both the transactions and the queries.
How does somebody do read-"modify"-style transactions?
Say I want to bump some counter. So I retract the old fact and assert a new fact. But the new fact needs to be exactly 1 + the old value of the counter. With transactions as a simple "add this and remove that", you seemingly cannot do that. So it's not ACID. Right?
Transactions are not limited to add/retract. There are also things we call data functions, which are arbitrary, user-written, expansion functions that are passed the current value of the db (within the transaction) and any arbitrary args (passed in the call), and that emit a list of adds/retracts and/or other data function calls. This result gets spliced in place of the data function call. This expansion continues until the resulting transaction is only asserts/retracts, then gets applied. With this, increments, CAS and much more are possible.
We are still finalizing the API for installing your own data functions. The :db.fn/retractEntity call in the tutorial is an example of a data function. (retractEntity is built-in).
This call:
[:db.fn/retractEntity entity-id]
must find all the in- and out-bound attributes relating to that entity-id (and does so via a query) and emit retracts for them. You will be able to write data functions of similar power. Sorry for the confusion, more and better docs are coming.
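To make the shape of such a data function concrete, here is a hypothetical sketch of a counter increment built with d/function. Since the installation API was still being finalized at the time, treat the details (the :my/inc ident, attribute names) as illustrative only:

    ;; A data function: given the in-transaction db value, it expands into a
    ;; single :db/add whose value is computed from the current value.
    (def inc-fn
      (d/function
        '{:lang   :clojure
          :params [db e attr amount]
          :code   (let [old (or (get (datomic.api/entity db e) attr) 0)]
                    [[:db/add e attr (+ old amount)]])}))

    ;; Once installed under an ident such as :my/inc, it could be invoked in a
    ;; transaction just like :db.fn/retractEntity above:
    ;;   (d/transact conn [[:my/inc counter-id :counter/value 1]])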
From what I remember, compare-and-swap semantics are in place for that kind of case.
If that was not the case, you could still model such an order-dependent update as the fact that the counter has seen one more hit. Let the final query reduce that to the final count, and let the local cache implementation optimize that cost away for all but the first query, and then incrementally optimize the further queries when they are to see an increased count.
That said, I'm pretty sure I've seen the simpler CAS semantics support. (The CAS-successful update, if CAS is really supported, is still implemented as an "upsert", which means old counter values remain accessible if you query the past of the DB.)
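If the built-in compare-and-swap data function is indeed available (as recalled above), usage would look roughly like this; counter-id and :counter/value are placeholders:

    ;; Succeeds only if :counter/value is still 41 when the transaction runs;
    ;; otherwise the whole transaction fails and the caller can retry.
    (d/transact conn [[:db.fn/cas counter-id :counter/value 41 42]])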
Yeah, it looks like I got that part wrong. (I intentionally skimmed over the transactor, because I was avoiding "how" issues and because my understanding of it wasn't that clear.)
The transactor is involved in just writes, not reads. (So that helps.) It's not distributed and cannot be distributed, in this system, because it ensures consistency, so yes, it is potentially a bottleneck. In blog comments by Rich Hickey[1], he states:
"Writes can’t be distributed, and that is one of the many tradeoffs that preclude the possibility of any universal data solution. The idea is that, by stripping out all the other work normally done by the server (queries, reads, locking, disk sync), many workloads will be supported by this configuration. We don’t target the highest write volumes, as those workloads require different tradeoffs."
Presumably, 1) the creators of Datomic think that performance can be good enough to be useful, and 2) this is a new model that probably requires testing to prove it's practical.
It's actually slightly better than a SQL database. If your master SQL database gets fried, there's a chance you could lose some data. Datomic's transactor only handles atomicity, not writes, so if the transactor dies, nothing written to the database will be lost.
The use of the term "whitepaper". It's very "enterprisey"
As a tangent, I'm really curious as to why this document is in a PDF file instead of simply being a web page. I can't see that doing much other than making it less convenient to read.
If I got one thing out of this article, it is your comment and link to Parse. That looks pretty nice! The landing page for datomic is horrible and I didn't make it past the small, dense text.
I have to admit I'm a little confused about what this is. I'm taking a coffee break and not really into reading a whitepaper, so take that with a grain of salt, but I'd call that a landing page failure.
That said, it sounds like a database-as-a-service? If so, is the primary benefit the reduced database management load? Or is there some special sauce in here that makes it more capable than other RDBMS or NoSQL databases?
It's a remote distributed MVCC data store with a total ordering of writes maintained by a transaction service, nonblocking reads bypassing the writer, and a local data analysis system that reads and caches the relevant facts from the remote data service. Roughly.
It's something quite unique and new, not just another K/V store; it will take time to really understand this. I watched the video and things became clearer, but I wouldn't know how to describe it in a paragraph.
The "special sauce" is that much of the work is done localy (in memory), you can use very powerful datamanipulation, and its ACID. That my understanding so far.
It's a database, quite different from any that are currently out there (you should read the white paper). Although one of the initial pricing models is to sell it as a cloud based service, that's incidental to the technology and why it's cool.
There's a video on the home page that gives a decent overview and speaks to that "special sauce" that makes it different than relational / nosql dbs. That's probably a good place to start if you aren't in the whitepaper reading mood.
Rich will be discussing Datomic in his keynote at Clojure/West next week in San Jose (Friday March 16th). Schedule: http://clojurewest.org/schedule
Tickets for the conference are available, including Friday-only tickets for $250. Friday will include Rich's keynote and a keynote by Richard Gabriel as well as lots of other Clojure-y goodness. http://regonline.com/clojurewest2012
One question I have is the cold start problem. How can I ensure dropping in a new peer is not going to have a large negative effect on response times? With memcache, you can just prewarm a new node or have clients only round-robin it a few times per request to warm it up. It seems like pre-warming here is going to be more cumbersome since it's not a simple k-v store but will require you to pre-emptively run queries to get there. (Similar to Lucene.)
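One plausible (if manual) approach, sketched under the assumption that you can enumerate the hot queries up front:

    ;; Hypothetical warm-up: run the application's hottest queries once at
    ;; startup so the peer's cache is populated before it takes real traffic.
    (defn warm-up! [conn hot-queries]   ; hot-queries: a made-up seq of query forms
      (let [db (d/db conn)]
        (doseq [q hot-queries]
          (d/q q db))))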
Won't this just be another leaky abstraction[1] in which the remoteness of the data will be impossible to ignore[2]? I like the idea of a transparent local LRU 'query' cache for a remote database[3], but I fear Hibernate-like (or Haskell-like) problems in locating performance bottlenecks.
The product seems to share characteristics with triplestores and the Sparql query language and append-only persistence mechanism from the Linked Data sphere/movement. Could someone more knowledgeable comment on this similarity?
Some differences:
1. No concept of inference/reasoning
2. No mention of a graph
3. Interesting use of clientside caching / data-peering
4. Clojure serialization vs N3/Turtle/RDF
Some similarities:
1. Quadstores are parameterized by graph, Datomic by time
2. subject-predicate-object model
3. query anything (including [?s ?p ?o]??)
4. query anywhere (sending RDF to a client for local query seems similar)
edit- I give up trying to get HN to render an ordered list. Any help would be... helpful.
If I read correctly, it is pretty expensive: $0.10 / connection (peer) / hour, plus DynamoDB and transactor instance charges. For 100 clients, and not including the DynamoDB or transactor instance(s), this makes it a hair more per year than a quad-core Oracle instance.
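Rough back-of-the-envelope arithmetic for that claim, assuming 100 always-connected peers at the quoted rate:

    ;; 100 peers x $0.10 per peer-hour x 24 hours x 365 days
    (* 100 0.10 24 365)   ;=> 87600.0 dollars per year, before DynamoDB and instance costs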
The $0.10 price tag is for the transactor, which just ensures that writes are atomic (but doesn't do any writing itself). I guess we'll need to do some benchmarking, but I suspect that $0.10 per peer per hour won't be as expensive as it first seems.
This is pretty cool, it's very similar to a project I'm working on: Siege, a DBMS written in Haskell [1]. Siege uses roughly the same approach; I didn't know anyone else was working on a distributed immutable DBMS, so this is really exciting.
> I'd like to know how its model of transaction isolation works given that reads and writes are claimed to be independent.
Any MVCC-style model allows full concurrency between readers and writers. The bigger problem is managing concurrency between conflicting writers in what amounts to a distributed database system. None of the material on Datomic's website explains how they intend to tackle that issue, which seems especially tricky with their model of distributed peers. All they say is that the Transactor is responsible for globally linearizing transactions and that this is better than existing models. However, if there is a genuine conflict, the loose coupling among peers seems to make the problem much worse than existing models, not better.
"The database can be extended with data functions that expand into other data functions, or eventually bottom out as assertions and retractions. A set of assertions/
retractions/functions, represented as data structures, is sent to the transactor as a transaction, and either succeeds or fails all together, as one would expect."
When data is immutable, append-only, and tagged by timestamp, there is no conflict. Rather, there are facts on the same entity that are asserted at different times. In this case where changes come in at two times (which are subject to all the raciness of the real world that exists regardless), one will win.
> In this case where changes come in at two times (which are subject to all the raciness of the real world that exists regardless), one will win.
That describes many methods of optimistic concurrency control, but it doesn't answer my question of how this is supposed to work in practice with high write contention, the higher latency of a distributed peer model, the long-running transactions the video mentions (or maybe that remark only applied to long-running queries), etc. My point being, if the distributed transaction problem were easily solved by sprinkling on optimistic multi-versioning concurrency control, it would have been solved a long time ago. There must be some special sauce they're not mentioning.
Correct, it's not write-scalable in the same way it is read-scalable. The transactor is a bottleneck for writes.
However, that doesn't mean it has slow writes - it should still do writes at least on a par with any traditional transactional database, and probably a good deal faster since it's append-only.
I'm more concerned with what happens when the transactor goes down, or gets silently partitioned from some of the clients. I assume reads will continue to work but all writes will break?
I'd also like to know more about how the app-side caching works. If I've got a terabyte of User records and want to query for all users of a certain type, does a terabyte of data get sent over the wire, cached, and queried locally? Only the fields I ask for? Something else?
1. You're correct, however, the architecture does allow you to run a hot backup for fast failover.
2. The database is oriented around 'datoms', which are an entity/attribute/value/time. Each of these has its own (hierarchical) indexes, so you only end up pulling the index segments you need to fulfill a given query. You'd only pull 1TB if your query actually encompassed all the data you had.
So let's suppose I have several billion integers sitting in a data store, and I want to sort, count, and sum them. Do I have to collect all this data into my local cache first? What if millions of people using my application want the same value?
Remember, the 'peer' doesn't have to be embedded in your front-edge application (even though that's one use case). You could have a single 'peer' which sits on its own beefy server expressly for this kind of calculation.
It's about writing software which is simpler and more flexible.
If you write a traditional shared-nothing web app client with a traditional bag-o-sprocs database server, you'll probably be good as long as your workload doesn't change much. As long as your write volume never exceeds what a single (as beefy as necessary) server can handle (which seems to be working out so far for Hacker News!), you're OK.
However, products/services evolve and requirements change. Let's assume, for example, that you want to do some heavy duty number crunching. This number crunching involves some critical business logic calculations. Some of those calculations are in sprocs, but some of them are in your application code's native language. How do you offload that work to another server? You may have to juggle logic across that sproc/app boundary back and forth. It's pretty rigid; change is hard.
You can think of Datomic as a way of eliminating your sproc language and moving the query engine, indexes, and data itself into your process. Basically, you get everything you need to write your own database server. Furthermore, you can write specialized database servers for specialized needs... as long as you agree to allow a single Transactor service to coordinate your writes.
Back to the big number crunching. You've got the my-awesome-app process chugging along & you don't want to slow it down with your number crunching, so you spin up a my-awesome-cruncher peer & the data gets loaded up over there. Now you have the full power of your database engine in (super fast) memory and you can take your database-client-side business logic with you!
Now let's say you're finding that you're spending a lot of CPU time doing HTML templating and other web-appy like things. Well, you can trivially make additional my-awesome-app peers to share the work load.
You can do all this from a very simple start: One process on one machine. Everything in memory. Plain-old-java-objects treated the same as datums. No data modeling impedance mismatch. No network latency to think about. You can punt on a lot of very hard problems without sacrificing any escape hatches. You get audit trails and recovery mechanisms virtually for free.
Again, all this assumes the write-serialization trade offs are acceptable. Considering the prevalence and success of single-master architectures in the wild, that's not a hugely unacceptable tradeoff. Furthermore, the append-only model may enable even higher write speeds than something like Postgres' more traditional approach.
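A rough sketch of the my-awesome-cruncher idea above, assuming the standard peer library; the storage URI and the :metric/value attribute are placeholders:

    (ns my-awesome-cruncher.core
      (:require [datomic.api :as d]))

    (defn total []
      (let [conn (d/connect "datomic:ddb://us-east-1/my-table/my-db") ; placeholder URI
            db   (d/db conn)
            ;; The full query engine runs in this peer process; only the index
            ;; segments the query touches are pulled from storage and cached.
            rows (d/q '[:find ?e ?v :where [?e :metric/value ?v]] db)]
        (reduce + (map second rows))))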
Yeah, I had a similar concern but it's just a confusion from thinking that every peer has to be equally powerful (i.e. all weak) and has to run on the end-user's computer.
If you performed this type of calculation before with a traditional database, you had to have a powerful enough computer to perform the calculation. In this model, you would still have that computer; it's just now a "peer".
If millions of people want the same piece of data that requires a huge calculation to get, then you would set up one powerful machine of your own just to do this calculation and then write the result to the database, so the many "thin" peers can just read the result.
So I need some sort of storage area network of clients.
Okay, so I'm chopping up my database server somewhat, making it easier to scale horizontally. I can live with this, but I wish it were stated explicitly.
The SaaS model they're offering won't work for the sorts of things I'm interested in.
Nice little find in the comments of a blog. Rich Hickey himself speaks about some of the things people probably care about: http://blog.fogus.me/2012/03/05/datomic/.
This strikes me as yet another NoSQL with a niche in which it will be great. In this case, it's good for a read heavy application with minimal writing, where its working set is a small subset of the total data set and you care a lot about write consistency. It would fail in a smoking heap under heavy write load (single global lock, and the need to push every write to every client cache). It would blow the cache if you tried to do a range scan.
The idea is that the transactor does a very small amount of work and so can scale much better than other "single point bottleneck" designs. The problem is still there, but smaller that way.
Reading through the site reminds me of append-only CouchDB (or even better, BigCouch): both Datomic datoms and CouchDB documents have timestamps, so the state of the data is available for different times.
This looks new: local query peers that cache enough data to perform queries (I don't understand how that works, but it looks like indices might be local, with some data also cached locally).
Also interesting that it seems to use DynamoDB under the hood.
It can use DynamoDB under the hood, as a data store.
The data storage system is strongly decoupled from the transactor and the peer, so there are a number of options, ranging from a filesystem to S3 to DynamoDB.
Thus Datomic would be great for centrally-operated systems, but not so much for highly distributed systems where many peers are often partitioned out because, for example, they have no Internet connectivity for a few days, and they still need to operate within their limited universe.
So if such a highly distributed system were to use Datomic, it would be harder to guarantee that each peer can work both for reads & (local) writes while being partitioned from the transactor. One would need to program the software to log those new facts (writes) locally before submitting (syncing) them to the transactor, and make that durable. One might also need to make the query/read cache durable, since there's no network to fetch it back in case of a reboot of the peer. So it seems there's a missing local middleman/proxy that needs to be implemented to support such scenarios. At least, thanks to Datalog, the local cache could still be used together with this log, using db.with(log).
What do you think, is this use case quite simply implementable over/with Datomic, without asking it to do something out of its leagues?
Right. So the only way to make Peers resilient to network partitions is to install a middleman between them and the DB/Transactor. One whose responsibility is to ensure this Peer's app always has durable access to everything it's ever going to need to be able to read for its queries, and always has durable access to some local write log that doesn't exist in the current implementation.
Thus my question is: is introducing such a middleman into the system going to denature Datomic?
I don't believe Datomic is designed to operate in a scenario where Peers don't have network connectivity. The local cache Peers keep is to cut down on network traffic and improve performance, not as a reliable "offline mode".
Notwithstanding what it's initially designed for, I think it may be quite good at supporting an "offline mode" as long as:
1. the app developer can confidently predict which queries the app will need through its lifespan, and
2. the app developer is willing to program and configure a layer that can persist and make durable a cache that spans all the data needed to run those queries (thus, persisting locally what amounts to a dynamic shard of the DB), and
3. the app developer is willing to program a layer that can persist and make durable all writes intended for the Transactor, and synchronize those to the Transactor when the app recuperates from a network partition, and
3.1. the app developer is willing to plan for, or resolve, potential conflicts in advance of (or at the time of) their eventual occurrence, and is thus willing to sacrifice global consistency in the event of a network partition in order to obtain availability, and
4. the app developer is willing to plug into the query engine in such a way that queries will include the local write log when there's a network partition.
Now, solution-wise:
1. depends on the requirements but most small to medium apps can predict the queries they'll need;
2. seems to be quite easy for small to medium apps:
2.1 run all possible queries at regular times, and
2.2 use a durable key-value store to keep the db values;
3. (1) make sure you're subscribed to events on partition and recovery; (2) coordinate writes over the same key-value store, probably using Clojure's STM and/or Avout; (3) on network recovery, replay those writes not present in the central DB;
3.1 due to the immutable nature of things and total ordering of the DB transactions, I expect to see no issue regarding eventual consistency when write logs are replayed centrally after a local Peer recovers from a network partition;
4. considering how Datalog works and is integrated into the Peer, this seems like a piece of cake (a rough sketch follows below).
So isn't this quite feasible to support the highly distributed case for apps in which each local Peer represents its own logical, dynamic and relatively natural and autonomous shard of the database?
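For point 4, the peer API's d/with could carry much of the weight: it layers not-yet-transacted datoms onto a database value entirely locally. The durable local-log itself is the hypothetical part:

    ;; db is the last database value cached before the partition;
    ;; local-log is a hypothetical durable vector of pending tx-data.
    (let [speculative (:db-after (d/with db local-log))]
      ;; Queries against `speculative` see both the cached past and the
      ;; locally logged writes that haven't reached the transactor yet.
      (d/q '[:find ?e :where [?e :task/status :pending]] speculative))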
Seems to be true, but the interesting part is that peers can be made parallel, and if one datacenter explodes you can go to another without losing information. The only "single point of failure" is the transactor, and only for writes.
From a UX point of view, I didn't realize there was a menu until I scrolled down and your JS menu popped in at the top. Once that happened, I scrolled back up to see where the menu initially was, because why would they do a pop-in menu if there wasn't one initially? Ah ha! I see it. My eyes looked right over it. Yes, I realize it's giant, but it's also about the same size as an ad banner (which my eyes typically just ignore). Also, the colors are quite bland and do not set any type of priority. Just some constructive criticism for ya. Good luck!
Clojure is an amazing language, so I'm willing to go the extra mile to attempt to understand this work. However, there's one thing that I can't get over. From my understanding, the big idea is that the query engine is brought local, and the storage would eventually come local too. It seems like for smallish DBs this is fine. What happens, though, if you're working with a rather large database?
Additionally, If local means the users client, how is security of the data ensured?
Not the _entire_ database is moved locally, only enough to efficiently handle the queries of interest. (i.e. queries run against a dynamic local cache of the database)
The key is that you can replicate peers and partition at the app level so that each peer in your infrastructure has the working set in memory. Compare with sharding as a partitioning mechanism in the database layer. The nice thing is that the data is cached on demand so the working set can change over time.
Terracotta and other data grid architectures do something similar.
What does this have to do with Clojure? This system could have been built in any language; it just shares some of the same ideas as Clojure (working with values). I can't help you with your question, sorry.
Local data is immutable (read-only). Also, the Peer cache uses an LRU replacement strategy; older data will drop off as new data comes in, keeping the most recently used data in memory.
As a developer, I find datomic easy to use. The getting started, running examples, tutorial, reference, the in-memory environment, the downloadable appliance - everything is so smooth. Last time I had a similar feeling was when I tried CloudFoundry.
Things should be like this - intuitive, some seed data and kickstart code w/ just enough documentation for when you get stuck.
I'm pretty sure the whole thing is built on the JVM, but I agree with you that having a peer run inside a js browser app via clojurescript would be a logical next step. (and arguably really useful)
Is Datomic just for JVM languages?
At the moment, yes. We have ideas for how to enable Datomic on non-JVM languages while preserving as much of the embedded power as possible.
Very interesting. It occurs to me that Clojure will be widely adopted not through a single killer app, but through a few near-killers (I thought Incanter and a web app framework around Ring and Enlive would be the first).
The idea seems very interesting, but the non-free aspect of this seems likely to limit its uptake. I cannot install a version of this for small-scale, personal, or not-for-profit needs other than using a non-durable VM that saves state only when suspended. Even if I buy into the Datomic pricing model and that pricing is not prohibitive, I am still bound to Amazon's pricing model (though hopefully that will expand over time to other cloud services to prevent vendor lock-in).
I think Rich & the gang need to focus on a niche market for now as this technology matures. I expect they will work with a handful of enterprise clients for now but roll out "convenience features" in the future (i.e. easy to use and inexpensive hosting for smaller customers.)
Certainly that's one approach. The one that seems more likely to generate widespread use is to put a tool in the hands of lots of developers, making it easy for more people to get involved and for unexpected uses to happen (e.g. MySQL v Oracle - though I'm not suggesting that the cost scales are equivalent here, it's merely the first example that comes to mind).
Rich & co. have historically been more interested in being correct than being popular, especially at the early stages. I would expect to see more components show up fairly quickly, but not before they're ready for use.
I disagree; it's not at all like anything I have seen before. It has a very powerful query language: you can use your whole programming language, not just what the designers wanted (as in SQL).
http://www.youtube.com/watch?feature=player_embedded&v=b...
[1] http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
[2] http://martinfowler.com/articles/lmax.html
EDIT: Just read through the whitepaper. Looks like the indexes / storage engine form an MVCC (http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...) key-value store, similar to Clojure's STM. Peers cache data and run datalog queries locally.