An Opinionated Clojure CQRS/ES Using Onyx, Datomic, DynamoDB, Kafka (yuppiechef.github.io)
111 points by CmdrDats on Feb 12, 2015 | 33 comments



> The current relational database is too limiting. We're dropping all sorts of interesting data on the ground because we don't have suitable pigeonholes to put it into.

That is the best description of the problem event sourcing tries to solve that I have seen. Short and to the point.


> That is the best description of the problem event sourcing tries to solve that I have seen.

Huh. I always thought that the problem event sourcing was trying to solve was more that modeling complex interactive systems through their static state rather than modeling the interactions directly as the principal domain objects is invariably futile. While "we're dropping all sorts of interesting data on the ground" may be one manifestation of that, I think the more common manifestation is "we bear too high an expense when adapting our information system to changes in business process".


> modeling complex interactive systems through their static state rather than modeling the interactions directly as the principal domain objects is invariably futile

Now that actually makes sense to someone not already familiar with the context.


I agree - that was the best explanation of CQRS I've seen.


CQRS != Event Sourcing. ;)


I'd add: "All data-models have strengths and weaknesses. The event-log is weak on its own, but its strength is that it nurtures an ecosystem of other models for specific use-cases."


Maybe to someone who's already familiar with other, longer descriptions of the problem. I don't know what problem this is solving and that description doesn't do much for me.

Can someone either explain, or point to a fuller explanation, of the purported problem?


There are some overlapping architectural ideas here, I find they build upwards like this:

CQRS: You have one "write" data-model which is specialized for transactions and applying business rules... and other "read" models that take care of things like dashboards, detailed information pages, searching, generating reports, etc.

Chronological events: A really handy way to propagate changes from the write-model into the various read-models.

Event-sourcing: This is "dogfooding" your own events. Instead of just emitting optional messages to other systems, you make them essential, and use them as the authoritative record for your own system's state. This adds certain kinds of flexibility that you may (or may not) need.
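
To make that concrete, here's a toy Clojure sketch of the layering (all names are hypothetical, not cqrs-server's API): events are plain data, and a read-model is just a fold over the event log.

    ;; Toy sketch: events are plain maps; the read-model is a pure fold.
    (defmulti apply-event (fn [_state event] (:event/type event)))

    (defmethod apply-event :user/registered
      [state {:keys [user-id email]}]
      (assoc-in state [:users user-id] {:email email :preferred? false}))

    (defmethod apply-event :user/made-preferred
      [state {:keys [user-id]}]
      (assoc-in state [:users user-id :preferred?] true))

    ;; "Dogfooding": the log is authoritative, so any read-model can be
    ;; rebuilt from scratch by replaying it.
    (defn replay [events]
      (reduce apply-event {} events))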


If one describes CQRS as being mostly about a write (e.g. master data) + read (e.g. derived data) combination, how far could one get with a relational system with views and triggers?


Pretty far, I imagine, especially with things like materialized views... I'm just biased against putting more logic into the DB than strictly necessary, given the difficulty of versioning that logic and how the DB is often the bottleneck among multiple webservers.



Very nice! Here's a good follow-up on the inner workings of Onyx: https://github.com/MichaelDrogalis/onyx/blob/0.5.x/doc/user-...


Thanks jamii, that's an excellent read - I forgot to add it to the Further Reading section! Will do that now :)


I disagree. I mean sure, I put humans and radiation in the same pigeonhole, but that's more of a devops problem to work out.


I was under the impression that Datomic alone could handle everything involved with an eventstore, just by adding the domain event as an attribute on the "transaction" (because transactions themselves are also entities). I don't understand what this adds.


Technically it could - but I felt Datomic is particularly strong as an aggregate view. Adding transactions that just have events attached to them isn't a particularly useful aggregate to query.

Having the eventstore separate means that you can keep Datomic focused on what it's really good at. When you find good aggregate views for your events, you can scan through old events and populate Datomic.

This separation also means that you can make the events as rich and plentiful as you want without burdening Datomic with stuff that you don't have a good way to query anyhow.


The event is the transaction.

For example, if I have a 'make_user_preferred' event, I just transact the database and then add any metadata I need to the Datomic transaction.

Datomic is Event Sourcing if you create a realized version of your database on every event. It is just smart about it and keeps diffs.
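
For the 'make_user_preferred' example, the idiom looks roughly like this (a sketch against the peer API of the time; assumes :user/preferred? and :event/type are already in the schema):

    (require '[datomic.api :as d])

    @(d/transact conn
       [;; the domain change itself
        {:db/id [:user/email "jane@example.com"] :user/preferred? true}
        ;; facts asserted about the transaction entity: the domain event
        {:db/id (d/tempid :db.part/tx) :event/type :make-user-preferred}])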


> Adding transactions that just have events attached to them isn't a particularly useful aggregate to query.

I still don't understand.

The transaction log, in Datomic, can be queried as easily as you would query the rest of the data. The view exists on the client, changes are already queued by the transactor, and changes are pushed to the clients from the transactor.
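
For instance, with the log API (a sketch, assuming a peer connection conn): every datom asserted between two instants, pulled straight off the log.

    (d/q '[:find ?e ?a ?v ?tx
           :in ?log ?t1 ?t2
           :where [(tx-ids ?log ?t1 ?t2) [?tx ...]]
                  [(tx-data ?log ?tx) [[?e ?a ?v _ _]]]]
         (d/log conn) #inst "2015-01-01" #inst "2015-02-01")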


So to have a concrete example: in our preliminary test of Datomic, we threw all our page view events directly at Datomic. Every single attribute in the event gets indexed at least 3 times, and it quickly grew beyond a manageable scale. We had to find a solution where we could still keep that data and aggregate it into Datomic when and where we could make useful sense of it.

In certain cases you can definitely put the events directly into the transactions. In our case, it's just not the right fit and I suspect that many others will find the same constraints apply.


> Every single attribute in the event gets indexed at least 3 times, and it quickly grew beyond a manageable scale.

Ok, so it was a technical issue. I assume you attempted to change the indexing.

The technical reasoning did not come through in the project page or your comments.


It's not a technical issue. The fact that field "x" in a record changed from 4 to 5 (even if you can know every value that has ever been in that field) will never be able to tell you why that value changed.

Was it because a CSR got a call from a pissed-off customer and agreed to change it? Was it because a third-party system fucked up and decided to give the customer a little more "x"?

This is where Domain-Driven Design comes in and where CQRS/Event Sourcing really shines. The net effect of those two events I just described may be exactly the same for MY domain, but if there are other systems listening for the domain's events, they may come to radically different conclusions. In CQRS/ES you would actually record those two things as semantically different events.

It really upsets me that people don't get this about CQRS/ES. Maybe I should just start spamming links to Greg Young's talks...


I believe you misunderstand my comments.

A Datomic transaction is also an entity, with associated facts about that transaction. You can add custom facts, such as your application's domain event, directly to the transaction entity to further describe the event/transaction. You can then query over the transactions as you would any other data. In CQRS/ES terms, it is an event store with snapshots on every event without the cost of duplicate data.
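
For example, once the domain event is a fact on the transaction entity, finding every transaction of a given event type is ordinary datalog (attribute name hypothetical):

    (d/q '[:find ?tx ?inst
           :where [?tx :event/type :make-user-preferred]
                  [?tx :db/txInstant ?inst]]
         (d/db conn))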

The technical issue appears to be that their event stream was of a high enough throughput that the indexing caused a space issue.


My apologies - my example was to illustrate a fit mismatch, not so much technical deficiencies.

Let's try it from a different angle: if you wanted to include every attribute of every event in Datomic as well as the aggregate view, you would have to add schema attributes for every event property AND the logical aggregate properties.

The extra schema doesn't really buy you anything other than being able to query your events, which you'll usually do by event type and date range. If you do need richer querying you're likely looking at an aggregate that happens to match the events. That's ok, and a design decision you can make.

Using DynamoDB raw means that events can have arbitrary shapes (including nested data structures) and you just dump them in verbatim. Then you only worry about your aggregate schema in Datomic.

In cqrs-server, we are also tagging the transaction with the event uuid, should you need to pull out the raw event from Dynamo.
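
A minimal sketch of that dump-verbatim idea, using clojure.data.fressian plus the faraday DynamoDB client as stand-ins (table and key names are made up; not necessarily cqrs-server's actual code):

    (require '[clojure.data.fressian :as fress]
             '[taoensso.faraday :as far])

    (def client-opts {:endpoint "http://localhost:8000"}) ; e.g. DynamoDB Local

    (defn store-event!
      "Serialize an arbitrarily-shaped event map and store it verbatim."
      [event]
      (let [buf   (fress/write event)            ; returns a ByteBuffer
            bytes (byte-array (.remaining buf))]
        (.get buf bytes)
        (far/put-item client-opts :events
          {:id   (str (:event/id event))
           :data bytes})))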


> you would have to add schema attributes for every event property AND the logical aggregate properties.

In that case, you can serialize the hash-map with the fressian serializer (the one Datomic uses internally) and store it in an attribute of type :db.type/bytes. If the data is large (you didn't specify), then the link to an external data store is a good choice. The lack of an arbitrarily-shaped data type is a missing feature of Datomic, but it is easy enough to work around.
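
Something like this sketch (peer API, tempid style of the time; :event/payload is a made-up attribute name):

    (require '[clojure.data.fressian :as fress]
             '[datomic.api :as d])

    ;; One generic bytes attribute instead of a schema attribute per event field.
    @(d/transact conn
       [{:db/id (d/tempid :db.part/db)
         :db/ident :event/payload
         :db/valueType :db.type/bytes
         :db/cardinality :db.cardinality/one
         :db.install/_attribute :db.part/db}])

    ;; Serialize the whole event map and attach it to the transaction entity.
    (let [buf   (fress/write {:type :page-view :path "/home"})
          bytes (byte-array (.remaining buf))]
      (.get buf bytes)
      @(d/transact conn
         [{:db/id (d/tempid :db.part/tx)
           :event/payload bytes}]))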

I'm not sure Datomic is a mismatch. But, I don't know all the details of your situation. My main issue is with the lack of an explanation as to why you needed all those extra parts in the project page. The technical issues you solved, with those extra pieces, didn't come through in the project page.


You raise a good point about not being clearer on the project page. I'll give it some thought and probably expand a bit on the DynamoDB and Datomic sections to incorporate the reasoning.

Again though, you are right: the events could be added to the Datomic transactions as a fressian-serialized byte array (we're fressian-serializing into Dynamo anyhow). It just doesn't seem ideal to commit transactions with only the event data attached and then, at some later time, possibly create an aggregate derivative from the event in a separate transaction.

The bottom line is: The event store and the transactional aggregate view are two very different things - putting both into one place conflates the two. We simply picked what we felt was better suited for each job, independent of one another. You could trivially point the event queue at Datomic instead - it really just doesn't buy you that much.


Oh, I see. You're right -- I misunderstood. I guess I have a sort of reflex/twitch going on about CQRS/ES vs. Datomic. Mea culpa. :)


Datomic is a great event log, but it is explicitly not meant to be used in high write volume environments, as it is limited by the serial transactor.


I think of Datomic as much more than a write log, since it includes the Datalog query language.

Also, how are you defining "high write volume"? Have you looked at metrics of how fast Datomic's serial transactor can be?


From the FAQ: "When is Datomic not a good fit? Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters). And, at present, Datomic does not have support for BLOBs."

I agree Datomic is far more than a write log but in the context of this thread, the discussion was about write logs.


Interesting. Couldn't you achieve the same thing by aggregating data into an RDBMS "when and where [you] can make useful sense of it"?


I really like the ideas behind CQRS and Event Sourcing. So much so that we are building a platform using these patterns.

Want to live in Austin, TX and build a data platform from the ground up? We're hiring:

https://careers.stackoverflow.com/jobs/80205/senior-develope...


There is a guy by the name of Udi Dahan, a self-proclaimed CQRS guru who sells his training for at least a few grand. I have listened to his podcasts and read his blogs, but I never got any clue what CQRS is or under what scenarios the pattern is the best fit. Here, I just read the first page and got what it is and when I can use it.


I like this trend of marking articles as opinionated. It acknowledges the fact that there's no be-all and end-all solution, and it spares the article the useless fuzzy apologies and hand-waving.



