What does LightCloud add that Tokyo Tyrant does not already provide? I've read the websites of both products, it's just kind of confusing.
From some other comments, it seems I'm not the only one confused. Tokyo Cabinet+Tyrant are pretty new on the scene and there isn't a lot about them in English yet. So, if you add a level of explanation to the site that seems excessive to you, people would probably find it more useful than you expect.
LightCloud adds horizontal scaling. If you just use Tokyo Tyrant then you can only scale by buying bigger servers. If you use LightCloud you can scale by buying extra servers.
When scaling up you generally _really_ want to scale horizontally: vertical scaling has a hard limit that you can reach quickly, and buying bigger machines is generally much more expensive than buying extra machines.
How far have you scaled your systems horizontally in production? The homepage mentions 2 servers. I'm wondering if you are using it in production with more than 2.
These two servers each run 3 lookup nodes and 6 storage nodes (i.e. 6 lookup nodes and 12 storage nodes in total). The servers are quite powerful [32GB of RAM and RAID10] and they also run MySQL.
......................... (00020000)
......................... (00040000)
......................... (00060000)
......................... (00080000)
......................... (00100000)
tcrmttest: tcrdbout: error: 7: no record found
record number: 100001
size: 6928736
time: 22.692
error
How can I reach the advertised 1M put/get performance?
Looks like TT is around 2-3K records / sec in read/write.
I've tested with all kinds of table structures (on-memory hash, on-memory B+ tree, disk-based hash, disk-based B+ tree, table, etc.) and it was the same speed every time.
And if you liked, you could extend LightCloud with memcachedb support (which we also had at one point and ran in production [see my posts on the memcachedb mailing list for proof]). But when it comes to key-value databases, it's really hard to beat Tokyo Tyrant, which is the fastest and most feature-complete key-value database out there (IMO; I have looked at most of the popular solutions).
That comment says: "memcachedb is not distributed, meaning that you can only scale vertically (i.e. by buying bigger machines)." Is that true? The docs on memcachedb seem to imply the opposite.
memcachedb is not distributed - it only supports replication. I.e. with memcachedb you can only scale reads, but not writes (or at least not without a system like LightCloud on top of it).
I am wondering: from the benchmark it is obviously slower than memcached, so why would someone use this instead of memcached, which has better support?
I'd be more excited to see how this key-value database actually helps to scale Plurk. I'd be interested in integrating it into Rails if it really works well.
Do you therefore see LightCloud as a possible alternative/complement to memcached? Certainly buying 500 GB of disk space is cheaper than buying 500 GB of RAM.
At Plurk.com we use both LightCloud and memcached. They complement each other. I.e. memcached is used for caching stuff - to reduce the load to MySQL. LightCloud is used to store persistent data - such as how many times a plurk has been read.
You could also store this in MySQL, but storing key-value data is generally not the strong suit of a relational database such as MySQL, while LightCloud (and other key-value databases) are optimized for exactly this kind of storage.
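The division of labor described above can be sketched with plain dicts standing in for the memcached and LightCloud clients (the key names and function names are made up for illustration; the real clients are networked, but the read-through pattern is the same):

```python
# Dicts stand in for a memcached client (volatile cache) and a LightCloud
# client (persistent key-value store). This is a sketch of the pattern, not
# Plurk's actual code.
cache = {}       # memcached's role: fast, volatile
persistent = {}  # LightCloud's role: durable key-value storage

def incr_read_count(plurk_id):
    """Persist a counter update and drop any stale cached copy."""
    key = f"plurk:{plurk_id}:reads"
    persistent[key] = persistent.get(key, 0) + 1
    cache.pop(key, None)  # invalidate so the next read refills the cache

def read_count(plurk_id):
    """Read-through: serve from cache, fall back to the persistent store."""
    key = f"plurk:{plurk_id}:reads"
    if key not in cache:
        cache[key] = persistent.get(key, 0)
    return cache[key]

incr_read_count(123)
incr_read_count(123)
assert read_count(123) == 2  # survives a cache wipe; the data is persistent
```

The point of the split is that the cache can be emptied at any time without losing data, while writes always land in the persistent store.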
memcachedb is not distributed, meaning that you can only scale vertically (i.e. by buying bigger machines). With LightCloud you can scale by adding extra servers to the system (i.e. horizontal scaling).
This said, Tokyo Tyrant performs better than memcachedb and offers more features (such as master-master replication and Lua scripting).
This is not a correct observation!
Memcachedb is nothing but memcached with BerkeleyDB underneath. Distribution of data in memcached (and also MemcacheDB) is achieved by client-side hashing of keys.
If your argument were true, then Facebook would not be able to scale horizontally - they are the biggest users of memcached.
MemcacheDB can be used in exactly the same way - and we have used it that way in production, BTW.
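The client-side hashing mentioned above can be sketched like this (the server addresses are made up, and real clients typically use consistent hashing such as ketama rather than the naive modulo shown here):

```python
import hashlib

# Hypothetical server pool; a real deployment would list its
# memcached/memcachedb hosts here.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key):
    """Pick a server by hashing the key on the client (naive modulo scheme).

    Note: modulo hashing remaps most keys when a server is added or removed,
    which is why real clients prefer consistent hashing (e.g. ketama).
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Every client computes the same mapping, so no central router is needed.
print(server_for("user:42:profile"))
```

Because the mapping is computed in the client, the servers never talk to each other - which is also why replication and rebalancing have to be solved elsewhere.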
amix, thanks for your effort (regarding both the development and the explanations here). Upvoted and will be considered for the next project. Keep up the good work!
How does it deal with events like disk failure, network partitions, and concurrent updates? The design documentation is rather light, so it's really hard to make out how this actually distributes data.
You say it doesn't have any concept of eventual consistency. So how does it coordinate updates across nodes? Does it do two-phase commit? Paxos?
Every node in both hash rings is replicated using master-master replication - i.e. node A and node A' can both receive updates and reads. Node A and node A' sync their updates via an update log and can fail at any time and come back at any time without taking down the system.
Additionally, if high availability is really a big issue, then a node A''' can be introduced that can be in another data center.
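The master-master sync via an update log can be sketched roughly as follows (this is a toy model with made-up names, not LightCloud's actual replication code; in practice Tokyo Tyrant handles this internally with its own update logs):

```python
import itertools

clock = itertools.count()  # a global tick stands in for real timestamps

class Node:
    """Toy replica: applies writes locally and appends them to an update log."""
    def __init__(self):
        self.data = {}
        self.log = []  # (tick, key, value) entries

    def set(self, key, value):
        self.data[key] = value
        self.log.append((next(clock), key, value))

def sync(a, b):
    """Replay both logs in timestamp order so the replicas converge."""
    for _, key, value in sorted(a.log + b.log):
        a.data[key] = value
        b.data[key] = value

A, A_prime = Node(), Node()  # node A and node A' from the comment above
A.set("x", 1)        # writes can go to either replica
A_prime.set("y", 2)
A_prime.set("x", 3)  # the later write to the same key wins after sync
sync(A, A_prime)
assert A.data == A_prime.data == {"x": 3, "y": 2}
```

A node that was down simply replays the entries it missed when it comes back, which is why either replica can fail and return without taking down the system.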
If you add nodes to the storage ring, then some of the existing keys will be invalidated. To solve this issue, and the issue of routing, a lookup ring is created. The lookup ring holds pairs of (key, storage_ring_location), and the system automatically updates a pair if it becomes invalidated at some point (e.g. the key no longer points to node A, but to node D).
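The role of the lookup ring can be sketched like this (node names and key names are made up, the ring is a toy version, and a plain dict stands in for the lookup ring - this is not LightCloud's actual code):

```python
import hashlib

def ring_pos(name):
    """Map a name to a position on the ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    """Toy hash ring: a key belongs to the first node at or after its position."""
    def __init__(self, nodes):
        self.points = sorted((ring_pos(n), n) for n in nodes)

    def node_for(self, key):
        pos = ring_pos(key)
        for point, node in self.points:
            if point >= pos:
                return node
        return self.points[0][1]  # wrap around the ring

storage_ring = Ring(["node-A", "node-B", "node-C"])
lookup = {}    # stands in for the lookup ring: key -> storage node
machines = {}  # node name -> its local key-value data

def put(key, value):
    node = storage_ring.node_for(key)
    machines.setdefault(node, {})[key] = value
    lookup[key] = node  # record where the key actually lives

def get(key):
    # Consult the lookup ring first; fall back to plain hashing for new keys.
    node = lookup.get(key, storage_ring.node_for(key))
    return machines.get(node, {}).get(key)

put("plurk:123:reads", 42)
# Grow the storage ring: hashing may now route the key to a different node,
# but the lookup entry still points to where the value was actually stored.
storage_ring = Ring(["node-A", "node-B", "node-C", "node-D"])
assert get("plurk:123:reads") == 42
```

The indirection is what lets the storage ring grow without a big-bang rehash: stale keys are found via the lookup entry and can be migrated lazily.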
I have tried to find an easy solution to a rather complex problem. Keeping membership state, doing Paxos, and maintaining routing tables would have been much more time-consuming to build - so I have tried to attack the problem from another angle (by using master-master replication for high availability).
I'd be more interested and might provide a somewhat less negative attitude if you were to do some real testing on a proper dataset (several hundred megabytes or gigabytes, not 10 bytes) and could show that adding servers actually improves performance.
The current test data and test script are simply insufficient, to the point of being useless.
As I have already stated, I am interested in how the system runs in production. With key-value databases you generally do lots of small updates and lots of small fetches; you don't do batch operations - which makes your benchmark pretty irrelevant.
Try to benchmark your relational database by doing this:
- create a new connection
- fetch one row
- close connection
And try to compare this to selecting multiple rows at once. The result will be MUCH different. And this basically outlines the difference between your benchmark and mine.
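The difference can be demonstrated with a toy script (SQLite stands in for the relational database here; it is in-process, so the per-connection overhead is far smaller than for a networked RDBMS, yet the gap is still obvious):

```python
import os
import sqlite3
import tempfile
import time

# Set up a small table on disk.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, "x") for i in range(1000)])
con.commit()
con.close()

# Style 1: open a connection, fetch one row, close - once per row.
start = time.perf_counter()
for i in range(1000):
    c = sqlite3.connect(path)
    c.execute("SELECT v FROM t WHERE id = ?", (i,)).fetchone()
    c.close()
per_row = time.perf_counter() - start

# Style 2: one connection, one batched SELECT for all rows.
start = time.perf_counter()
c = sqlite3.connect(path)
rows = c.execute("SELECT v FROM t").fetchall()
c.close()
batched = time.perf_counter() - start

print(f"per-row: {per_row:.4f}s  batched: {batched:.4f}s")
```

The per-row style pays the setup cost a thousand times over, which mirrors the many-small-operations workload of a key-value store versus a batch benchmark.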
This said, you will only hit the limitations of a relational database if you have lots of data. If you run a blog or a low-traffic site, or can keep all your data in memory, then you won't have any problems. And I do have experience in the world of relational databases: using MSSQL won't solve this problem for you (otherwise you would see Facebook, FriendFeed, Twitter, Google, etc. using MSSQL or Oracle).