The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.

Agreed. There are a couple of orders of magnitude available just from I/O improvements, using conventional[1] hardware configured intelligently.

There's at least another order of magnitude in optimizations that aren't possible in NoSQL's strawman[2], such as separate tablespaces for data and indexes, partial indexes, and a galaxy of query tuning from an EXPLAIN that actually provides a query plan (a quick sketch follows the footnotes).

[1] Meaning commodity-priced, nothing fancier than $400 RAID cards and spinning disks.

[2] MySQL
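
To make the partial-index point concrete, here's a minimal sketch. It uses Python's bundled SQLite only because that makes it self-contained (the orders table and query are invented); the same two statements work almost verbatim in Postgres:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
    # Partial index: only the rows the hot query touches get indexed.
    con.execute("CREATE INDEX open_idx ON orders (status) WHERE status = 'open'")
    # A real query plan shows whether the index is actually used.
    for row in con.execute("EXPLAIN QUERY PLAN "
                           "SELECT id, total FROM orders WHERE status = 'open'"):
        print(row)  # ... 'SEARCH orders USING INDEX open_idx (status=?)'

That last line is the whole point of a useful EXPLAIN: you find out whether the planner used your index before the query ever hits production.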




In discussions about "NoSQL" systems, I've found that some of the developers complaining about RDBMS performance didn't even know what indexes were.

Usually they learned how to use MySQL from third-hand PHP & MySQL tutorials on somebody's blog, and assumed it was representative of all RDBMSs.

Not saying everyone using "NoSQL" is poorly informed, just that sometimes people's impressions of performance aren't very accurate. It makes me suspicious when somebody's benchmark only uses MySQL.


I've slowly come to the same opinion over the past few years. I'm just going to leave this here for posterity:

http://www.dbdebunk.com/


There has been nothing approaching even a single order of magnitude of improvement in random I/O with conventional HDDs over the last, say, 20 years. Not even a 2x improvement. You're under 500 IOPS per HDD, period; use them wisely.

Moore's law doesn't apply to RPMs of spinning disks.
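
The arithmetic behind that ceiling: each random read pays an average seek plus half a rotation, and neither has improved much. A back-of-envelope sketch, with typical (assumed, not benchmarked) figures:

    # Random IOPS ceiling = 1 / (average seek + half a rotation)
    def hdd_iops(avg_seek_ms, rpm):
        half_rotation_ms = 60000.0 / rpm / 2
        return 1000 / (avg_seek_ms + half_rotation_ms)

    print(hdd_iops(8.5, 7200))   # ~79 IOPS, commodity 7.2k SATA
    print(hdd_iops(3.5, 15000))  # ~182 IOPS, 15k SAS -- still well under 500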


Yes, but a battery-backed write cache will turn random I/O into sequential I/O, eliminating this '500 IOPS' barrier. Put 512 MB of BBWC into a system and see how your 'random I/O' performs. The whole point is that NO ONE scans their ENTIRE dataset randomly; if you have to scan your entire dataset, just access it sequentially. Plus, nothing in NoSQL solves ANY of the points you are outlining.
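
A toy model of that coalescing (just the reordering idea, not a real controller): buffer the incoming writes, flush them sorted by block address, and total head travel collapses by orders of magnitude.

    import random

    # Buffer 1024 random writes across a 1M-block disk, then compare head
    # travel for flushing them as they arrived vs. sorted (elevator order).
    writes = [random.randrange(1000000) for _ in range(1024)]

    def head_travel(blocks):
        return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))

    print(head_travel(writes))          # arrival order: ~340M blocks of travel
    print(head_travel(sorted(writes)))  # elevator order: under one full sweep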


Write cache is useful, of course, but it doesn't make random reads any faster. If your dataset is too big to cache, disk latency becomes your limiting factor. The question then becomes how to most effectively deploy as many spindles as possible in your solution: SANs are one way, sharding/distribution across multiple nodes another.


No, you would never waste write cache on random reads; instead you'd buy more RAM for your server. Why would anyone buy a whole new chassis, CPU, drives, etc. when all you need is more RAM? As for how to effectively deploy spindles, the answer is generally external drive enclosures: you can fill a rack with disks and save 2-4U for the server.


I said "where it's too big to cache" -- that means too big for RAM, aka >50GB or >200GB depending on what kind of server you have.


SANs are one way,

Agreed, if you include fast interconnects like SAS and exclude the network[1] requirement of SANs.

sharding/distribution across multiple nodes another.

I disagree, for the same reason that iSCSI over Ethernet isn't one: too much added latency.

InfiniBand may help, but I have yet to try it empirically.

[1] Switching/routing, multiple initiators, distances longer than a few dozen meters.


You're under 500 IOPS per HDD, period

This is only significant if one is limited to a trivial number of spinning disks. 20 years ago, with separate disk controllers, this was the case.

If you run some benchmarks, I expect you'll find that, for random I/O, N disks perform better than N times the performance of one of those disks.

SCSI provided (arguably) an order-of-magnitude increase in the number of disks per system.

Now, SAS provides another. $8k will buy 100 disks (plus enclosures, expanders, etc.). How many IOPS is that?

ETA: The Fujitsu Eagle (my archetype of 20-ish-year-old disk technology) had, IIRC, an average access time of 28ms. If its sequential transfer rate was 1/60th to 1/100th of a modern disk's, what fraction of a modern disk's 4k IOPS could it do?
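
Rough answers to both questions, with every figure assumed rather than benchmarked (28ms for the Eagle as above, ~5.5ms average access for a modern 15k disk):

    eagle_ms, modern_ms = 28.0, 5.5       # average access times (assumed)
    eagle, modern = 1000 / eagle_ms, 1000 / modern_ms
    print(eagle, modern, eagle / modern)  # ~36 vs ~182 IOPS: roughly a fifth
    print(100 * modern)                   # 100 SAS spindles: ~18,000 random IOPS

So random IOPS per spindle improved maybe 5x in 20 years while sequential rates improved 60-100x; the real win comes from spindle count.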


Yes, I agree that the solution is to throw more spindles at the problem.

PL/SQL, though, with global data reach and advanced locking states for every single transaction, makes it really hard to move off of a single host. So it's more and more work to get more disks attached to that host, and CPU is a hard upper limit.



