Here's a better trick - treat a database as an indexed storage layer that's doing the minimum of CPU work or complicated transactional conditions that create lots of blocking and bottleneck conditions for threads on your monolithic back end.
And then implement the app logic in some slow-ass web language that doesn't hold a candle to your database's optimized C. What happens when your web server can't keep up? Buy another web server. That's an easily solvable problem. A saturated DB, on the other hand, is very hard to fix, especially if all of your application logic is running in it and would require a rewrite to scale.
There's an awful lot more to computing than websites.
FWIW I work on a "non-scalable" RDBMS that comfortably handles thousands of commits/sec, tens of thousands of selects/sec and tens of terabytes of data. All done with stored procs, one database, no need for "sharding" either. Will it scale indefinitely? No, but neither will anything else. The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.
Excellent response, it's amazing how many people have never seen a SQL server with 200 disks attached to it, and we're not even talking about SANs. A single quad core chip can easily drive 200 disks. All people see is a $200,000 price tag for a single system and ignore the fact that it will outperform 100 $2000 servers. AMD64 can scale so far these days there is little point to NoSQL for all but the highest end of web-scale. I bet if you look into the core of Google there are still mysql systems doing a lot of grunt work.
Facebook still has MySQL doing a bunch of grunt work, but they're basically treating it like a NoSQL store - sharded key-value storage and that's it. They're moving to HBase now.
And, yeah, damn right I'm hesitant about a single piece of hardware that costs $200k and still leaves you with a SPOF. Effectively you're saying it's a 400k piece of hardware after replication. Oh, and Oracle replication software costs you an extra several hundred thousand. So for about a million dollars, yeah, unless it's incredibly transactional in nature, I'd give other options a very serious look.
It costs $5k for SQL Server with replication, you can use SQL Express for the monitor so all you're looking at is the 5K license you'd need for SQL Server Standard, if you need Enterprise it's only 20K. This are all retail prices which no one actually pays.
Ok, and if you have 50GB of data and have made a determination that SQL Server has some features that you're willing to pay 5k for, fine.
I still think embedding all your app logic together with your underlying data in PL/SQL or any equivalents is a bad idea but hey that's for anyone to do and find out themselves :)
The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.
Agreed. There's a couple of orders of magnitude just from I/O improvement in hardware, using conventional[1] hardware, configured intelligently.
There's at least another order of magnitude in optimizations that aren't possible in NoSQL's strawman[2], such as separate tablespaces for data and index, partial indexes, and a galaxy of query tuning from an EXPLAIN that actuall provides a query plan.
[1] Meaning commodity-priced, nothing fancier than $400 RAID cards and spinning disks.
In discussions about "NoSQL" systems, I've found out that some of the developers complaining about RDBMS performance didn't even know what indexes were.
Usually they learned how to use MySQL from thirdhand PHP & MySQL tutorials off somebody's blog or something, and thought it was representative of all RDBMSs.
Not saying everyone using "NoSQL" is poorly informed, just that sometimes peoples' impressions of performance aren't very accurate. It makes me suspicious when somebody's benchmark only uses MySQL.
There is nothing approaching even a single order of magnitude in improvement in random I/O with conventional HDDs over the last, say, 20 years. Not even a 2X improvement. You're under 500 IOPS per HDD, period, use them wisely.
Moore's law doesn't apply to RPMs of spinning disks.
Yes, but a battery backed write cache will turn random IO into sequential IO eliminating this '500 IOPS' barrier. Put 512 MB of BBWC into a system and see how your 'random IO' performs. The whole point is that NO ONE scans their ENTIRE dataset randomly, if you have to scan your entire dataset, just access it sequentially. Plus, nothing in NoSQL solves ANY of the points you are outlining.
Write cache is useful of course but it doesn't make random reads any faster. If your dataset is too big to cache, disk latency becomes your limiting factor. The question then becomes how to most effectively deploy as many spindles as possible in your solution - SANs are one way, sharding/distribution across multiple nodes another.
No, you would never waste write cache on random reads, instead you'd buy more RAM for your server. Why would anyone ever buy a whole new chassis, CPU, drives, etc when all you need is more RAM? As for how to effectively deploy spindles the answer is generally external drive enclosures. Generally you can put a rack of disks and save a 2-4U for the server.
This is only significant if one is limited to a trivial number of spinning disks. 20 years ago, with separate disk controllers, this was the case.
If you run some benchmarks, I expect you'll find that, for random I/O, N disks perform better than N times one of those disks.
SCSI provided (arguably) an order of magnitude for number of disks per system.
Now, SAS provides another. $8k will buy 100 disks (and enclosures, expanders, etc). How many IOPS is that?
ETA: The Fujitsu Eagle (my archetype of 20ish years ago disk technology) had, IIRC, an average access time of 28ms. If its sequential transfer rate was one 60-100th of modern disks, what fraction of a modern disk's 4k IOPS could it do?
Yes, I agree that the solution is to throw more spindles at the problem.
PL/SQL, though, with global data reach and advanced locking states for every single transaction, make it really hard to move off of a single host. So it's more and more work to get more disks attached to that host, and CPU is a hard upper limit.
>The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.
Wish I had more upvotes. Knowing what logic to put in stored procs, and what to put in the application code, so as to play to the strengths of the db and the strengths of the appserver, seems to be a lost art these days.
Coprocessors are in the upcoming releases of both HBase and Cassandra - they'll be back :)
It's not that there's anything fundamentally wrong with the relational model, it's that, as formulated, it's really hard to split up into multiple machines.
A HDD is limited to well under a thousand reads per second, and there's multiple reads per SQL select, so I'll assume either yes or you're leaving something out. If the tens of thousands of reads are over a 4MB table that stays in memory and the rest of your 10TB are infrequently accessed, congrats, you have a single-node problem. If you actually need to deliver, say, 1,000 queries per second over a 10TB dataset? It's not happening in an RDBMS unless you get a whole bunch of SSDs.
Also, RE: websites, of course, substitute your environment's front end if you'd like.
At Microsoft we were scaling SQL Server up to 10-100k rows per second on OLTP $20k systems with a hot cache and RAIDed HDDs.
In one of the code-word projects I saw an $25k system with an OLTP dataset do 1 billion rows in a second.
Edit: We also had statistics that said that 90% of our customers had less than 100GB of data. 99% had < 1 TB. The vast majority of database users shouldn't even be thinking about looking at non-RDBMS systems.
I agree with all of that. But the original poster was claiming 10k SELECTs/second on a 10TB data set. That just sounds fishy to me.
If your dataset fits in memory on one node, I'm all about using a database (prefer MySQL personally, might look at postgres now that it finally has replication).
Do you really need SSD's to do a thousand reads per second? I'm assuming this is with a hot cache, and the access pattern on the 10TB of data is pretty heavily skewed towards a much smaller portion of the total dataset. Also, RAID-ed 15K drives, and the nature of the queries matter. We don't really know enough about the whole setup to know what's possible.
It depends on how much performance you need vs how much capacity you need. Since SSDs can do more I/O in a single drive, even if that drive costs twice as much it will still make more sense than a 4 drive RAID array, for both price and power.
Ok, you're up to 3k random reads per second at the absolute peak. Realistically more like 2k for 6x10k disks at RAID 0, and it's awfully hard to even fit 10TB on a RAID1+0 setup. What's that, 10x2TB? Good luck with the latencies on those 2TB disks.
Still doesn't add up to 10k select statements per second over a 10TB dataset on a singlenode. Even without writes, that's not happening. I call BS on grandparent post.
That shows a lack of understanding of what kind of hardware is being used in the real world to handle this.
With a half-decent SAN with 15k drives and 4Gbit fibrechannel connections, you can get 1000+ IOPS without the storage system even breaking a sweat. Under load it can easily give 10 times that.
This is something that's everywhere in the business world.
Pair this with a bunch of cores and a few GB of memory, and you can have an RDBMS that chews through impressive amounts of data. Unless, of course, you optimize nothing and swamp it with lame queries that do nothing that table scans. Funny enough, the same people that are fine with doing everything in code are the ones that can't be bothered to think more than one second about what kind of queries they are throwing at the database.
No kidding, it's like a battery backed write cache doesn't even exist in the NoSQL world. I was able to easily drive 200MB/sec of random IO on 25 15K drives.
Btw, this was 200MB/sec of random writes. I didn't even bother with random writes. I could have gotten the writes to be basically sequential if I had bothered to write a COMB style UUID generator. I happen to be a fan of UUIDs for the surrogate keys as it makes database merging so much easier.
You will also consume more power, and have 20 servers to manage instead of one, and you have to customize your data to be distributed between those 20 servers, hampering the possibility of querying that data for patterns that enable you to optimize your business.
6 disks? An EMC array can throw 20x as many physical disks at this sort of problem. An Exadata can compile SQL down to microcode and execute it on the storage, like a graphics card doing matrix operations on dedicated hardware.
Again, as I say, the NoSQL crowd have no idea about what the state of the art is in the RDBMS world.
it's awfully hard to even fit 10TB on a RAID1+0 setup
It would actually be hard for me to buy an array that small...
I have a pretty solid idea what state of the art is in the RDBMS world - it's diminishing returns as a machine that's twice as powerful costs 10X as much, all the way up the enterprise ladder. It's spending 100k on your software licenses, 100k on your storage and 500 bucks on a CPU.
Not that there's anything wrong with that. It's ok. If your domain is highly transactional, it's probably a better move than implementing your own transactions over something else. Just don't pretend that your limitations are actually strengths -- you have your own strengths.
It doesn't matter. You see, in business, there is no "cheap" or "expensive". There's worth the money, or not. It doesn't matter how many commodity servers I could buy for the cost; no matter how cheap they are, the money would be wasted if that simply the wrong technical approach.
Because you can't compete at this level by chucking increasing amounts of anything at the problem - people, dollars, spindles, nodes, you name it.
You see, in business, everything is about cheap or expensive. It's just a more broad definition that includes developer time and ROI.
If your problem is extremely transactional and legitimately unshardable, feel free to drop 6 mil on exadata. Or a half a mil on a database server and backup. But frankly, your objections are starting to have a religious feel to them. All I was saying is that PL/SQL is a pile of crap to code in and fundamentally unscalable without spending a boatload of money. A little better design can get the same thing with a lot less cash.
EDIT: No, those are facts, PL/SQL looks like it was designed in 1965 and, yes, putting all of your CPU processing into a single node is fundamentally unscalable. I've seen it. It was fundamentally unscalable.
I'm not making a religious point about RDBMS - it can be the best model in many situations. I'm making a point about single bottlenecks for your architecture.
You buy the EMC unit, because you want the EMC tech to call you and say "we see you have a failing drive, I'm en-route, and I'll be there in 15 minutes with a replacement". Even if you are not paying attention, the unit called in and told the control center it needed attention.
I don't get why you were downvoted, as this is so true. I'd rather throw hardware at a problem, than having to get developers to work on a rewrite on the database level. It's cheaper, easier, and totally not risky in comparison.
I've always seen databases as an index storage layer and thought that "outsourcing" the apps job to the database while convenient in a lot of cases, are hell to maintain at scale.
No you got what I said wrong. I'd rather pay developers to keep the logic in the application and throw more hardware when I need more webservers, than pay developers to develop a 'database logic' solution and having to get them to rewrite for efficiency when the database starts getting blocked and saturated. It's always a good idea to plan for growth, as long as it doesn't mean delaying the app too much.
So yes I rather keep my databases light and get the app to deal with the processing as much as possible, throwing a new webserver at it when the growth requires it.
Dunno why everyone's taking this so personally. It's like they're personally threatened and their approach is to get angry rather than try and understand.
If I had to guess, I'd say that people are downvoting you because even the MySQL guys who spent the 90s arguing against it have now come around to thinking that foreign keys and transactions and stored procs are good things, and they can't be bothered to engage in the discussion again.
^^ I don't get it either. If someone doesn't agree they might as well say so and give reasons to get a discussion going. Instead we get this "Ohh the article said X so you're wrong, I'll downvote and move along" attitude.
And then implement the app logic in some slow-ass web language that doesn't hold a candle to your database's optimized C. What happens when your web server can't keep up? Buy another web server. That's an easily solvable problem. A saturated DB, on the other hand, is very hard to fix, especially if all of your application logic is running in it and would require a rewrite to scale.