Gizzard - Twitter's open source framework for creating distributed datastores (github.com/twitter)
69 points by abraham on April 7, 2010 | 9 comments



THIS is the future of "NoSQL." It's all about custom, distributed datastores. This is going to make generalized database software look like shrink-wrapped software sitting next to custom, purpose-built software. When done right, there's just no comparison.


I think we're going to see more and more motion in this direction. The problems of block storage, of building structured data out of block storage (even key-value is a structure!), of partitioning/replication, and of read caching are fundamentally different, and there's no good reason why they should all be squashed into the same codebase. (Depending on how partitioning and replication are done, they are sometimes separate and sometimes need to be done together.)


Hm, another passing mention of their distributed graph database, FlockDB.


it's coming...


From the article: "In order to achieve "eventual consistency", this "retry later" strategy requires that your write operations are idempotent. This is because a retry later strategy can apply operations out-of-order (as, for instance, when newer jobs are applied before older failed jobs are retried)."

What do you do when idempotency is not possible? With relational databases, how do folks tackle this?

I can understand that it would be excellent for storing search indexes and non-relational databases though.


It's hard to come up with examples where idempotency is impossible (I'm sure there are some)... but there are definitely cases where it is difficult. Counters are one of the most obvious examples; to make them idempotent you need to jump through a lot of hoops. Usually you assign a transaction-id to each increment/decrement operation and you keep a log of which ones have been applied. Suffice it to say this explodes the cost of storing a counter (which would otherwise only require 32/64 bits).
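A minimal sketch of the transaction-id approach described above, in Python. The class and the names `IdempotentCounter`, `apply`, and `op_id` are made up for illustration and are not part of Gizzard; the "log" here is just an in-memory set.

```python
class IdempotentCounter:
    """Counter whose increments/decrements can be retried or replayed safely."""

    def __init__(self):
        self.value = 0
        self.applied = set()  # log of operation ids that were already applied

    def apply(self, op_id, delta):
        """Apply delta at most once per op_id; return True if the counter changed."""
        if op_id in self.applied:
            return False  # duplicate delivery or retry: ignore it
        self.applied.add(op_id)
        self.value += delta
        return True


counter = IdempotentCounter()
counter.apply("op-2", +5)  # newer job applied first
counter.apply("op-1", -2)  # older failed job retried later
counter.apply("op-2", +5)  # op-2 delivered again: no effect
print(counter.value)       # 3
```

The `applied` set grows with every operation, which is the storage blow-up mentioned above: the counter itself is one integer, but the log is unbounded unless old ids can be pruned somehow.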

Other things are hard to make idempotent but it's still practical. Examples of this include operations like "delete all rows matching query Q". This either means "delete all rows for now and forevermore" or "delete all rows that exist at time T". In either case new rows matching Q might arrive in the future (but be antedated to the past), so you have to keep the operation around in some way and apply the delete to them when they arrive. This can be easy if your query is easy to represent, and there is a limited class of such queries.
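A rough sketch of the "delete all rows that exist at time T" variant, assuming each row carries a timestamp and the query can be represented as a simple predicate. The `Table` class, `delete_matching`, and `is_spam` are hypothetical names for illustration only.

```python
class Table:
    """Toy table that remembers delete operations so late rows are still removed."""

    def __init__(self):
        self.rows = {}     # row_id -> (timestamp, value)
        self.deletes = []  # stored (predicate, cutoff) delete operations

    def _is_deleted(self, ts, value):
        # A row is dead if any stored delete matches it and covers its timestamp.
        return any(pred(value) and ts <= cutoff for pred, cutoff in self.deletes)

    def insert(self, row_id, ts, value):
        if self._is_deleted(ts, value):
            return  # antedated row that a stored delete already covers
        self.rows[row_id] = (ts, value)

    def delete_matching(self, pred, cutoff):
        # Keep the operation around, then apply it to the rows we have now.
        self.deletes.append((pred, cutoff))
        self.rows = {
            row_id: (ts, value)
            for row_id, (ts, value) in self.rows.items()
            if not (pred(value) and ts <= cutoff)
        }


def is_spam(value):
    return value.startswith("spam")


t = Table()
t.insert("a", 10, "spam one")
t.insert("b", 12, "hello")
t.delete_matching(is_spam, cutoff=20)
t.insert("c", 15, "spam two")    # arrives late but is dated before T: dropped
t.insert("d", 25, "spam three")  # written after T: kept
print(sorted(t.rows))            # ['b', 'd']
```

Replaying `delete_matching` or any of the inserts leaves the table in the same state, so the operations can be retried in any order.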

Sorry, it's hard to be precise about this in comments. The bottom line is Gizzard is not perfect for everything but idempotency is worth jumping through hoops a lot of the time (Gizzard or no!)


Why should relational db writes not be idempotent? MySQL example:

  INSERT INTO my_table (id, a, b) VALUES (1,2,3)
    ON DUPLICATE KEY UPDATE a=2, b=3;
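Replaying an upsert like that is harmless, but the article's concern was also that retries can arrive out of order, in which case an older write could overwrite a newer one. One way to handle that, sketched here with Python's `sqlite3` module rather than MySQL, is to guard the update with a version number. The table `t`, the `version` column, and the `upsert` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER,"
    " version INTEGER NOT NULL)"
)


def upsert(conn, row_id, a, b, version):
    """Write the row unless a write with the same or a newer version already landed."""
    # Creates the row if it is missing; does nothing if the id already exists.
    conn.execute(
        "INSERT OR IGNORE INTO t (id, a, b, version) VALUES (?, ?, ?, ?)",
        (row_id, a, b, version),
    )
    # Overwrites only when the stored version is older than this write.
    conn.execute(
        "UPDATE t SET a = ?, b = ?, version = ? WHERE id = ? AND version < ?",
        (a, b, version, row_id, version),
    )


upsert(conn, 1, 2, 3, version=2)  # newer write applied first
upsert(conn, 1, 9, 9, version=1)  # older write retried later: ignored
upsert(conn, 1, 2, 3, version=2)  # replay of the newer write: no change
row = conn.execute("SELECT id, a, b, version FROM t").fetchone()
print(row)  # (1, 2, 3, 2)
```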


Wow, I thought that I was getting pretty good at coding in Scala until I just spent 30 minutes reading Gizzard's code base. Ugh, now I feel like I have only been using about 20% of the language.

Good on Twitter for releasing this - looks like good infrastructure code. I want to look at the Rowz sample application when I get some free time.


this sounds an awful lot like dns. i wonder whether they evaluated deploying this solution using existing dns software. the additional features of this solution that justify its cost are not obvious from the link.



