These are very niche use cases. Doing an entire benchmark on that basis without making that clear is misleading. And even in that exact use case you will not see the massive performance difference they are showing, because the data still has to be read from disk by the kernel, whereas in their benchmark the disk isn't touched at all. Sure, in a database that doesn't support this, the data might be read from disk into memory and then written to a buffer, and that's going to be a bit less efficient than letting the kernel do it directly: you have an extra in-memory copy. But that's still far from the performance difference they are showing between a disk read and no disk read at all. So even in this ideal scenario they are massively overstating their performance (which should already be clear from the fact that no disk has 3 terabytes per second of throughput).

Designs like that are absolutely feasible in any database; you just have to do your own chunking. Storing 500MB records in a database is a bad idea anyway, since you never know if you're going to trip over some logic in the database, your application, the kernel, or some interaction between them that's going to load or copy it all unnecessarily. Better to make the chunking explicit.
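To be concrete, here is roughly what I mean by explicit chunking. This is just a sketch against a hypothetical key-value put call (kv_put and CHUNK_SIZE are placeholders, not any particular database's API):

    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SIZE (1u << 20)   /* 1 MiB per chunk; tune to the workload */

    /* Hypothetical put call; stands in for whatever key-value store you use. */
    static int kv_put(const char *key, const void *val, size_t vlen)
    {
        (void)val;  /* stub ignores the payload */
        printf("put %s (%zu bytes)\n", key, vlen);
        return 0;
    }

    /* Store a large blob as fixed-size chunks keyed "<name>:<index>", so a
     * reader can fetch just the ranges it needs and nothing ever has to move
     * 500MB around in one piece. */
    static int store_chunked(const char *name, const uint8_t *data, size_t len)
    {
        char key[128];
        size_t nchunks = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;

        for (size_t i = 0; i < nchunks; i++) {
            size_t off = i * CHUNK_SIZE;
            size_t clen = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;
            snprintf(key, sizeof key, "%s:%zu", name, i);
            int rc = kv_put(key, data + off, clen);
            if (rc != 0)
                return rc;
        }
        return 0;
    }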
Better to make everything as simple and as easy as possible. App-managed chunking is error-prone, and it forces redundant implementations. Likewise, "an extra in-memory copy" can make or break a system. OpenLDAP slapd using LMDB uses 1/4 the RAM of slapd using BerkeleyDB for a given number of entries. That means you can serve 4x as many entries with LMDB on a given hardware config at full RAM speed vs. BDB. LMDB's efficiency has a direct impact on hardware and electricity costs, as well as software development cost.
We are doing all of the above. Microbenchmarks that show only the performance of the DB engine still serve a purpose. And the difference between the microbenches and the full benches shows you how much overhead is eaten in the rest of the system. E.g., when testing read performance of SQLite3 adapted to LMDB, we measured only a 1-2% improvement, because the majority of SQLite3 read execution time is spent in SQL processing, not in its Btree code.
Nice! Do you know why that benchmark shows almost no read improvement but a massive write improvement? The slides here (http://symas.com/mdb/20120322-UKUUG-MDB.pdf) seem to suggest the opposite? Or am I reading them incorrectly?
As I understand it, that test used something like 32 server threads on a machine with only 4 cores. The UKUUG prezo used 16 server threads on a machine with 16 cores, which allowed all of the reads to run concurrently. The smaller number of CPU cores in the Zimbra test naturally reduces the concurrency advantage of LMDB.
I believe the test runs with 100-byte values, and the key is stored on the same page as the value. Therefore in order to perform a lookup the entire page must still be faulted in, and the key must still be compared, so the benchmark is not fake in that regard. Data is getting pulled off disk, just not in a way that requires multiple copies. Note that the library's tiny code size (and zero-copy design) has another benefit: it can greatly reduce cache pollution, which may again help account for the speed difference.
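For reference, here is roughly what a read looks like against LMDB's C API (error handling trimmed, and `env` is assumed to already be opened via mdb_env_open). mdb_get hands back a pointer into the memory map, so the page still gets faulted in and the key compared, but the value is never copied into a caller-owned buffer:

    #include <lmdb.h>
    #include <stdio.h>
    #include <string.h>

    /* Minimal zero-copy read: mdb_get returns a pointer into the mmap'd file,
     * so the page is faulted in and the key compared, but the value is never
     * copied into a buffer we own. */
    void lookup(MDB_env *env, const char *keystr)
    {
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        key.mv_size = strlen(keystr);
        key.mv_data = (void *)keystr;

        if (mdb_get(txn, dbi, &key, &data) == 0) {
            /* data.mv_data points straight at the page in the map and is
             * only valid until the transaction ends. */
            printf("found %zu bytes at %p\n", data.mv_size, data.mv_data);
        }
        mdb_txn_abort(txn);   /* read-only txns are simply aborted */
    }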
> Storing 500MB records in a database is a bad idea anyway, since you never know if you're going to trip over some logic in the database, your application, the kernel, or some interaction between them that's going to load or copy it all unnecessarily. Better to make the chunking explicit.
So basically you're saying there is no benefit because you aren't used to systems that support it.
> Therefore in order to perform a lookup the entire page must still be faulted in, and the key must still be compared, so the benchmark is not fake in that regard. Data is getting pulled off disk, just not in a way that requires multiple copies.
How did you get this information? If that's the case, why did they not actually use the values (e.g. do something trivial like computing an accumulated XOR over the bytes)? That would still show the advantages of the zero-copy design, but not ridiculously so.
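Something like this in the read loop would force the benchmark to touch every byte that comes back (just a sketch; with LMDB the data/size pair would come from an MDB_val, with other stores from whatever buffer their get call returns):

    #include <stddef.h>
    #include <stdint.h>

    /* Fold every byte of a returned value into an accumulator so the read
     * can't be skipped and the benchmark has to touch the whole value, not
     * just locate it.  `data`/`size` would come from the store's get call
     * (an MDB_val's mv_data/mv_size in LMDB's case). */
    uint8_t touch_value(const uint8_t *data, size_t size, uint8_t acc)
    {
        for (size_t i = 0; i < size; i++)
            acc ^= data[i];
        return acc;
    }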
It still seems highly suspicious. For example, the benchmark with 100KB values runs twice as fast as the benchmark with 100-byte values?!
I certainly agree that the zero-copy design can be good, but these benchmarks are vastly overstating it to the point that the benchmarks are pretty useless. With these benchmarks it's impossible to know if that database is actually faster than the others. It could well be faster, but it could also be slower. To know that you need a fair comparison.
Note also that memory-mapped designs have disadvantages as well. For example, AFAIK memory mapping doesn't work asynchronously. You will block your threads, so you can't use lightweight threads or callbacks; you'll have to use heavyweight threads (which may limit concurrency and be expensive in general). Unless you implement your own OS, of course (which, interestingly, some people are doing: http://www.openmirage.org/).
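To illustrate the difference, here is a rough sketch using plain POSIX AIO (error handling omitted; a real server would more likely use an event loop or io_uring, so treat the details as illustrative). Touching a non-resident mapped page stalls the calling thread in a page fault, while an asynchronous read lets that same thread keep doing other work until the data arrives:

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* With mmap, the first access to a non-resident page stalls this thread
     * inside a page fault until the disk read completes. */
    void read_mapped(const char *path)
    {
        int fd = open(path, O_RDONLY);
        char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        printf("first byte: %d\n", p[0]);   /* may block here on a page fault */
        munmap(p, 4096);
        close(fd);
    }

    /* With POSIX AIO, the read is queued and the thread is free to do other
     * work until the data shows up. */
    void read_async(const char *path)
    {
        int fd = open(path, O_RDONLY);
        char buf[4096];
        struct aiocb cb = { .aio_fildes = fd, .aio_buf = buf,
                            .aio_nbytes = sizeof buf, .aio_offset = 0 };

        aio_read(&cb);                      /* returns immediately */
        while (aio_error(&cb) == EINPROGRESS) {
            /* ... do unrelated useful work here ... */
        }
        printf("read %zd bytes\n", aio_return(&cb));
        close(fd);
    }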
> So basically you're saying there is no benefit because you aren't used to systems that support it.
I'm not saying there is no benefit; I'm saying it's a bad idea to rely on it that way for 500MB, because for it to work safely every component has to support it equally well. It's too easy to hit an unexpected problem somewhere and have it blow up by copying 500MB when you only wanted 1KB. But it's still valuable for a database to handle large values well, of course. Anyway, this is beside the main point.
Still nonsense. "AFAIK memory mapping doesn't work asynchronously" - clearly you don't know much. Reader threads in LMDB never block (except for pagefaults to pull data in, which are obviously unavoidable).
> except for pagefaults to pull data in, which are obviously unavoidable
Umm, obviously that's what I mean. And with async I/O instead of memory mapping, you could avoid that. That's why I said memory mapping has disadvantages. Clearly the "nonsense" was invented at your end.
Eh. You're still talking nonsense. If an app requests some data and it needs to be read in from disk, you have to wait. Async I/O or not, that app can't make any progress until its data arrives. Async I/O is only advantageous for writes.
That's incorrect: even for reads, the app can do something else while the disk read is being performed. That's what async I/O is all about.
The app can do whatever else it wants to do that's completely unrelated to the data request. The part of the app that requested the data is stalled until the data arrives. If the data request is the only useful thing going on (e.g., a user clicked "search for Joe's phone number") then there's nothing else productive for the app to do and async I/O is irrelevant.
Yes. That last bit is very rare, especially in server applications where you have multiple clients. In any case you might want to be a little more careful before calling something that you didn't understand nonsense.
Could it be that your test is dominated by Python speed, given that their tests show 15-30 million operations per second but yours shows only half a million? Or is their hardware so much faster?
> Doing an entire benchmark on that basis without making that clear is misleading.
A benchmark which provides source code and test data should never be categorized as "misleading." At worst it puts too much trust in the average reader's understanding of things. But look at it this way: if you littered every page that included a benchmark with all the relevant caveats, it'd be a total mess.
Or as David Simon says, "fuck the average viewer."