> Nonsense. Whatever the calling app does with the data will always be an additional cost over what the DB does with the data.

Way to ignore everything I said. What you said here is obviously not the case, as I explained: in large part, this is simply deferring work until later. Also, explain this to me:

How is the 100KB-value benchmark performing 2x as many operations per second as the 100-byte-value benchmark? Do you claim this is indicative of real-world performance?

And take section 7, for example, which runs the tests on an SSD. Do you really think that 30 million operations per second for 100KB values bears any relation to real-world performance?

Obviously, these benchmarks are not indicative of real-world performance. This database may well be very fast, but these benchmarks don't show it.

Way to ignore what's printed in front of you. As the writeup clearly states, the benchmark shows that throughput depends on the number of keys in the DB, not on the size of the values. The 100KB test is faster because there are fewer keys, and fewer keys means a shallower tree and less lookup work per operation.


Sure, I get that. My question was: do you consider that indicative of real-world performance? I consider it misleading, especially when the benchmarks are labeled tmpfs, HDD, and SSD while the read benchmarks aren't even touching the disk.


Microbenchmarks practically never map 1:1 to real world performance, since these libraries get embedded in much larger software systems that each have their own bottlenecks and inefficiencies. That should already be well understood when we call these things "microbenchmarks" and not "benchmarks". Meanwhile, compare all of the LevelDB/Kyoto/SQLite3 results we reported to the results the LevelDB authors originally reported - you'll find they are directly comparable. It may well be that read tests whose data sets are already fully cached in RAM are not representative of the underlying media speed. But (1) we're just trying to produce numbers using the same methodology as the original LevelDB benchmarks and (2) the full results show that even with all data fully cached in RAM, the type of underlying filesystem still had dramatic effects on the performance of each library.
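For concreteness, the kind of read loop in question boils down to something like this -- a minimal sketch against LMDB's C API, not our actual harness; the ./testdb path, key format, and counts are illustrative, and error handling is omitted:

    #include <lmdb.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        MDB_env *env; MDB_txn *txn; MDB_dbi dbi;
        MDB_val key, data;
        char kbuf[32];

        /* open an already-populated database read-only */
        mdb_env_create(&env);
        mdb_env_open(env, "./testdb", MDB_RDONLY, 0664);
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        for (int i = 0; i < 1000000; i++) {
            /* 16-digit zero-padded keys, db_bench style */
            snprintf(kbuf, sizeof kbuf, "%016d", rand() % 1000000);
            key.mv_size = 16;
            key.mv_data = kbuf;
            mdb_get(txn, dbi, &key, &data);
            /* the returned value is never touched: once the data
               set is cached, this measures the library, not the
               underlying media */
        }
        mdb_txn_abort(txn);
        mdb_env_close(env);
        return 0;
    }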

We've done another bench recently (not yet prettied up for publication) on a VM with 32GB of RAM and 4 CPU cores. On an LDAP database with 5 million entries, the full DB occupies 9GB of RAM and this VM can return 33,000 random queries/second ad nauseam. Scaling the DB up to 25 million entries, the DB requires 40GB of RAM and the random query rate across all 25M drops to around 20,000/sec. Tests like these are more about the price/quality of your hard drive than the quality of your software. As publishable results they're much less referenceable, since any particular site's results will have much greater variance, and the only thing for a conscientious party to do is measure on their own system anyway.


Fair enough. I don't see any problem with in-memory benchmarks, as long as they are marked as such and you're comparing apples to apples. The best way to do that would be to actually use the data from the read queries in some trivial operation, like computing an XOR hash -- that would still be a best case for your library, yet closer to the real world.
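Roughly like this, inside the read loop sketched above (illustrative; sum is a hypothetical unsigned char accumulator declared before the loop and printed after it, so the compiler can't discard the reads):

    /* after mdb_get(): fold every byte of the returned value into
       a running checksum, forcing the value to actually be read */
    for (size_t j = 0; j < data.mv_size; j++)
        sum ^= ((const unsigned char *)data.mv_data)[j];

    /* ...and after the loop: */
    printf("xor=%02x\n", sum);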

I've read the papers on your DB and they are quite interesting. What do you think about the work in the paper linked in this post? It's unfortunate that they only compare against skip lists. I don't think anybody seriously believes skip lists are ever a good idea, so it's a bit of a straw man at this point (though I may be wrong).


I find the Bw-tree work pretty underwhelming. It's not a transactional engine; the fact that it can operate "latch-free" is irrelevant, since it must be wrapped by some higher concurrency-management layer to be truly useful. The delta-update mechanism is probably more write-efficient than LMDB, which could be a point in their favor. The fact that they still rely on an application-level page cache manager is a strong negative - you can never manage the cache more accurately or more efficiently than the kernel, which gets hardware assist for free. Overall, it's an awful lot of noise being made over not much actual innovation.
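For reference, the delta-update trick they make so much of boils down to a compare-and-swap on a mapping-table slot. A toy sketch in C11 -- not the paper's code; delta_t, page_slot, and the int payload are illustrative stand-ins, and error handling is omitted:

    #include <stdatomic.h>
    #include <stdlib.h>

    /* Toy delta record: an update prepends a node to the page's
       delta chain instead of writing the page in place. */
    typedef struct delta {
        struct delta *next;   /* older delta, or the base page */
        int key, value;       /* payload of this update */
    } delta_t;

    /* One mapping-table slot: page ID -> current chain head. */
    static _Atomic(delta_t *) page_slot;

    void bw_update(int key, int value) {
        delta_t *d = malloc(sizeof *d);
        d->key = key;
        d->value = value;
        d->next = atomic_load(&page_slot);
        /* "Latch-free" install: CAS the new delta in as the chain
           head; a failed CAS refreshes d->next with the new head. */
        while (!atomic_compare_exchange_weak(&page_slot, &d->next, d))
            ;
    }

Reads then have to walk the chain and periodically consolidate it back into a base page -- that's the write-efficiency win, paid for on the read side.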



