They're used (amongst many other applications) by storage systems patterned after Amazon's Dynamo, such as Dynomite, Riak or Voldemort. My presentation from November < http://behemoth.strlen.net/~alex/Voldemort_NoSQL_Oakland.ppt > explains vector clocks and their use in a storage system, but in my opinion the parent article does so even better (I wasn't too good at OmniGraffle and only used PPT out of expediency).
</Shameless plug>
You should definitely read the Dynamo paper < http://www.allthingsdistributed.com/2007/10/amazons_dynamo.h... > as it's just plain interesting (although it's by no means a "blueprint" for building a storage system). Here's my attempt at Cliff's Notes for the Dynamo paper, explaining how vector clocks are applied:
Suppose you have a system where you want to achieve N-way replication of data, but don't want to lose availability for writes by going to two-phase commit. So in this case, you do this:
* Write your data synchronously to W nodes
* Perform a background (async) write to N - W nodes
* When reading, read from R nodes. If R + W > N, any set of R nodes must overlap any set of W nodes, so you can be sure that at least one of the R nodes will have the most up-to-date version of the data. This achieves a very basic "read your writes" consistency without having to write to all of the nodes.
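To convince yourself of the R + W > N rule, here's a small brute-force sketch in Python. It isn't from the Dynamo paper; the function name and the idea of enumerating every possible node set are just my illustration.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Check by brute force that every W-node write set shares at
    least one node with every R-node read set, out of N nodes."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


if __name__ == "__main__":
    print(quorums_always_overlap(3, 2, 2))  # R + W > N
    print(quorums_always_overlap(3, 1, 1))  # R + W <= N
```

With N=3, W=2, R=2 every read set touches a node that took the synchronous write; with W=1, R=1 a read can land entirely on nodes that haven't seen it yet.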
Problem is, how do you tell which is the node holding the truth? That's what vector clocks are for.
Given a value X with clock A and a value Y with clock B, you can tell whether X was written because Y was read (implying X was written after Y), whether Y was written because X was read (implying X was written before Y), or whether the two are independent of each other. The last scenario occurs when the nodes holding X and Y become partitioned from each other (e.g. a network link goes down), but some subset of clients is able to talk to the node holding X, another is able to talk to the node holding Y, and another to both. In this case you can't tell which came as a result of which, but you can see that the scenario has happened and attempt to reconcile the values.
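Here's a minimal sketch of that comparison in Python, assuming a clock is just a dict mapping a node name to a counter, with a missing entry counting as zero. That representation and the name `compare` are my own choices for illustration, not how Voldemort or Riak actually store their clocks.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> counter (assumed representation)


def compare(a: Clock, b: Clock) -> str:
    """Return how clock `a` relates to clock `b`:
    'after', 'before', 'equal' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither descends from the other
    if a_ahead:
        return "after"       # a has seen everything b has, and more
    if b_ahead:
        return "before"
    return "equal"


if __name__ == "__main__":
    print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))
    print(compare({"n1": 2}, {"n1": 1, "n2": 1}))
```

The first call is the "X was written after Y" case; the second is the partitioned case, where each side has an update the other never saw.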
The simplest example of this reconciliation is Amazon's shopping cart (powered by Dynamo): if two nodes have causally unrelated versions of the shopping cart, it makes sense to just merge the two versions (combining the items).
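A rough sketch of that kind of merge, assuming a cart is a set of item names and a clock is a dict of per-node counters (both assumptions of mine, as is the name `merge_carts`): take the union of the items and the per-node maximum of the two clocks, so the merged version descends from both.

```python
from typing import Dict, Set, Tuple

Clock = Dict[str, int]  # node name -> counter (assumed representation)


def merge_carts(
    cart_x: Set[str], clock_a: Clock,
    cart_y: Set[str], clock_b: Clock,
) -> Tuple[Set[str], Clock]:
    """Reconcile two causally unrelated cart versions."""
    merged_items = cart_x | cart_y  # keep everything either side added
    merged_clock = {
        node: max(clock_a.get(node, 0), clock_b.get(node, 0))
        for node in clock_a.keys() | clock_b.keys()
    }
    return merged_items, merged_clock


if __name__ == "__main__":
    items, clock = merge_carts(
        {"book", "lamp"}, {"n1": 2},
        {"book", "socks"}, {"n1": 1, "n2": 1},
    )
    print(sorted(items), clock)
```

One caveat with a plain union: an item removed on one side of the partition comes back after the merge, which the Dynamo paper itself notes ("deleted items can resurface").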