
Yes, but a battery-backed write cache (BBWC) will turn random IO into sequential IO, eliminating this '500 IOPS' barrier. Put 512 MB of BBWC into a system and see how your 'random IO' performs. The whole point is that NO ONE scans their ENTIRE dataset randomly; if you have to scan your entire dataset, just access it sequentially. Plus, nothing in NoSQL solves ANY of the points you are outlining.



Write cache is useful, of course, but it doesn't make random reads any faster. If your dataset is too big to cache, disk latency becomes your limiting factor. The question then becomes how to most effectively deploy as many spindles as possible in your solution: SANs are one way, sharding/distribution across multiple nodes another.
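
As a rough back-of-the-envelope sketch of the spindle math: random-read IOPS per disk is roughly one over (average seek time plus half a rotation). The RPM and seek figures in the comments are illustrative assumptions, not measurements from this thread, and the function names are made up for the example.

```python
import math


def iops_per_spindle(rpm: float, avg_seek_ms: float) -> float:
    """Estimate random-read IOPS for one disk at queue depth 1."""
    # On average the platter has to turn half a rotation to reach the sector.
    rotational_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_ms)


def spindles_needed(target_iops: float, rpm: float, avg_seek_ms: float) -> int:
    """Number of disks needed to reach target_iops, ignoring cache and RAID overhead."""
    return math.ceil(target_iops / iops_per_spindle(rpm, avg_seek_ms))


# Assumed example drives: 15k RPM with 3.5 ms seek, 7200 RPM with 8.5 ms seek.
fast = spindles_needed(500, rpm=15_000, avg_seek_ms=3.5)
slow = spindles_needed(500, rpm=7_200, avg_seek_ms=8.5)
```

Under those assumed figures, a handful of spindles already clears 500 random reads per second, which is why the real question is where to attach them.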


No, you would never waste write cache on random reads; instead you'd buy more RAM for your server. Why would anyone ever buy a whole new chassis, CPU, drives, etc. when all you need is more RAM? As for how to effectively deploy spindles, the answer is generally external drive enclosures. You can generally fill a rack with disks and save 2-4U for the server.


I said "where it's too big to cache" -- that means too big for RAM, aka >50GB or >200GB depending on what kind of server you have.
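
To put a number on "too big to cache", here is a minimal sketch of average read latency as a function of cache hit ratio. It assumes uniformly random access over the whole dataset (real workloads are usually skewed, which helps the cache), and the latency figures are placeholder assumptions, not benchmarks.

```python
def hit_ratio_uniform(ram_gb: float, dataset_gb: float) -> float:
    """Fraction of reads served from RAM if every block is equally likely."""
    return min(1.0, ram_gb / dataset_gb)


def effective_read_latency_ms(hit_ratio: float, ram_ms: float, disk_ms: float) -> float:
    """Weighted average of RAM and disk latency for a given hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ram_ms + (1.0 - hit_ratio) * disk_ms


# Assumed figures: 50 GB of RAM, 500 GB dataset, 0.0001 ms RAM, 5.5 ms disk.
ratio = hit_ratio_uniform(ram_gb=50, dataset_gb=500)
latency = effective_read_latency_ms(ratio, ram_ms=0.0001, disk_ms=5.5)
```

With only a tenth of the data in RAM, the average read is still dominated by the disk term, so adding RAM only helps once it covers most of the working set.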


"SANs are one way,"

Agreed, if you include fast interconnects like SAS and exclude the network[1] requirement of SANs.

"sharding/distribution across multiple nodes another."

I disagree, for the same reason that iSCSI over Ethernet isn't: too much added latency.

Infiniband may help, but I have yet to try it empirically.

[1] Switching/routing, multiple initiators, distances longer than a few dozen meters.



