Scalable SQL: How do large-scale sites and applications remain SQL-based? (acm.org)
72 points by yarapavan on April 22, 2011 | hide | past | favorite | 39 comments



SQL Azure is working on something called Federations which I think looks awesome. Basically the idea is you can shard your data across multiple DBs based on a column value but your code doesn't have to know what's going on. You still have a single connection string and send one query. SQL Server then figures out which federation has the data you're looking for.

Perhaps other DBs have similar systems; I'm not sure. But this is definitely going to change the way we scale our SQL at our company.

http://blogs.msdn.com/b/cbiyikoglu/archive/2010/10/30/buildi...
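The routing idea behind federations can be sketched in a few lines. To be clear, the shard map, names, and range boundaries below are all invented for illustration; the real feature does this lookup inside SQL Server so the application never sees it:

```python
# Toy sketch of range sharding on a column value, hidden behind a single
# lookup function (the federation layer does this transparently).
SHARD_RANGES = [
    (0, 1_000_000, "db-shard-0"),
    (1_000_000, 2_000_000, "db-shard-1"),
    (2_000_000, 3_000_000, "db-shard-2"),
]

def shard_for(customer_id):
    """Return the shard whose key range covers this federation key."""
    for low, high, shard in SHARD_RANGES:
        if low <= customer_id < high:
            return shard
    raise KeyError("no shard covers key %d" % customer_id)
```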


SQL Server Enterprise edition has this feature... but the licensing on it is pretty painful (Around $30k a processor, last time I checked).


A $30K license is only a problem if you're giving your services away for free. With a quad- or 6-core processor (remember MS charges per socket, not per core) you can power about 500-750 spindles. You're looking at $200-250K to build a single-socket system that can be fully utilized (e.g. 128-256 GB of RAM, 3 or 4 controllers, and 500-750 disks). The problem is that people stupidly put SQL EE on a commodity box with 8 GB of RAM and 2 spindles and then complain that the software is 5 times as expensive as the box.

Also, $30K is the retail price for the license, no one actually pays that, and if you can write a little bit of code you can easily use a SQL Std license for $2-5K.


As much as I dislike Microsoft, you have a point. When compared to serious RDBMS hardware costs, 30K is not much. And you won't go much further on an SQL database than SQL Server or Oracle can take you.

If you plan to go further than that, you should have an exit NoSQL strategy ready.


Just as a sidenote, when making the plan, keep tabs on the downtime needed to migrate your data and on ways to migrate users to the new system in batches. When you get too big for SQL you are also probably too big for big-bang migrations.


I believe dbShards is a similar solution: http://www.dbshards.com/dbshards/


I can't say too much, but I work for a company that supplies software to businesses that need to take thousands of transactions a minute on data that has to be current, with real-time and frequent price changes, streaming data to tens of thousands of people and to third parties that are themselves big companies.

Downtime is not tolerated - I've been surgically torn apart by boards of major listed corporations over small amounts of downtime (it was something of a pleasure to be so professionally abused, strangely).

For all this we have found the best solution is to use a monolithic DB with SAN-based replication, C libraries to connect to the SQL db, and a process-based custom-built application server with an embedded (unfashionable) scripting language for ease.

None of these things are fashionable, but no-one's beaten our performance.

The trouble (from my perspective) with all these social sites is that downtime is tolerated (Facebook is always screwing up; I consider them a partial joke technically) and the data is easily distributed (think Google) because the demands on consistency are low.

I'm sure our time will be up one day, but you go trendy at your peril. I'm an advocate of functional programming languages, but it's not realistic to get hundreds of these guys to switch seamlessly and reliably to a new paradigm without serious pain. Ditto NoSQL.


Sounds like HFT to me. Plus functional programming. KDB-like?

Quit teasing. Out with it! Sounds like fun, with lots of real hackability and responsibility built in. Would love to hear more of what you can share.

You mention a SQL db - have you guys put something together of your own or are you using a pre-rolled db?


We tried FP but it didn't take with the more average dev in our number.

It is fun, can be a lot of pressure, and the sheer quantity of issues and the speed with which they must be dispatched means I always have to learn and I often screw up analysis. Humility and the ability to prod people who know stuff and grok what they say quickly is essential.

We use a major DB (not Oracle). Again, think unfashionable, think speedy.

Sorry, don't mean to tease - I just don't want people to be able to track me down.


Well, I can't imagine SQL Server handling your requirements, and I find it unlikely a mission-critical system like this would use MySQL or Postgres (primarily because enterprises like to pay a lot for that 'security' of support), so I'd shoot for Sybase if it's not Oracle?

The reason I ask is because I am a pretty highly skilled Oracle guy, and some of the numbers I have seen around financial systems just seem so fast, with such big transaction volumes per second. Maybe I could get Oracle to do them, but the zero downtime scares me somewhat, as until very recently keeping Oracle up all the time, even through application upgrades, was fairly tricky!


Actually, the db we use had similar problems. Schema changes require at least _some_ downtime, if only to lock the tables. The trouble is less the number of transactions than the linkage between each transaction and other transactions' liabilities in the system. In effect, serious DBs do in-memory updates (some of our bigger customers run with no logical log write to disk on commit (really!)) on 128-core Sun boxes (principal and failover). You can get through a lot of tps, and the saving on dev cost and maintenance through this monolithic simplicity is great.


If it's financial sector (and it likely is), IBM's DB2 has a large userbase.


Not fast enough (I hear).


Makes sense. Thinking about it, most of the DB2 usage I've heard of is behind ATM and core accounting systems in retail banking rather than trading applications/low-latency...


Can you say what DB you are running? Is it Oracle?


Not unless they know some Oracle incantations I've not heard of.

I'm going to guess some home-rolled column store similar to KDB.

[EDIT] re: Oracle stuff I've not heard of - there is tons of Oracle scale/performance stuff I've never even had to think about. I've worked somewhere with KDB+ in place to handle streaming market price data and interviewed at another investment bank where it was used as well. So my guess is based more on what I've seen and read rather than knowing you couldn't build such a system on top of Oracle.


I've seen some Oracle systems push some big data around and handle pretty high trans per second, but I have never worked on any of these financial systems which seem to require more speed and trans per second than any system I have ever built or worked on.

Personally I would love to see if Oracle can match these other systems out there, purely from an interest point of view!


Customers pressure us to use Oracle but we resist. I'm sure you can make it fast. Our argument is: why switch from what works; then it's "want to pay for it and be the first to use it"?


Probably Sybase ASE with Tcl as the scripting language. I've heard good things about Sybase replication capabilities.


How did you get Tcl? :-0

No-one's even mentioned the DB, and it's a major DB system owned by a three-letter company...


Surely it's not SAP (MaxDB)? I'd never heard of it before I did some googling. Failing that, the only DBs I can think of are Oracle, DB2, SQL Server and Sybase before going to the open source ones.


Don't think anyone's reading this anymore, so I'm safe: it's Informix.


Most web sites can grow significantly without having to do data partitioning. Simple replication should be considered as a first option, especially if your read/write ratio is high. Also, you don't have to implement a lot of extra logic in your application; just write to your master and read from the slaves.
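That read/write split is small enough to show. The class and connection names below are invented for the sketch; real code would hold actual connection objects rather than strings:

```python
import itertools

class ReplicatedPool:
    """Route writes to the master and spread reads over the slaves
    round-robin. Plain strings stand in for real connections here."""
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def connection_for(self, sql):
        # Crude heuristic: anything that isn't a SELECT goes to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master
```

A real router would also pin a session to the master right after a write, since replication lag means a slave may not have the row yet.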


Damn straight. And cache appropriately (and dumbly).


Oddly similar to my article in GigaOm last year. http://gigaom.com/cloud/nosql-is-for-the-birds/


I think a lot of the scalability arguments between SQL and NoSQL have little to do with relational databases as a technology and a lot to do with "how to scale without paying for expensive software licenses". They miss that some of the largest commercial sites use commercial databases.

Oracle, for example, to my understanding, is the standard database in use at Amazon.com for their services; there are alternatives for key services, but Oracle is pervasive (at least as of 2009, when I last had insight into this).

Another interesting rumour I recently read (ymmv, of course): Apple runs the iTunes Music Store on Oracle's Exadata. It comes from a fairly trusted source in the Oracle DBA community (Daniel Morgan): http://forums.oracle.com/forums/thread.jspa?messageID=425520...

Clearly there are limitations to a shared-disk RDBMS or even a shared-nothing RDBMS, but the tech sphere somehow seems to be stuck looking only at the open source implementations of these things as the only "reality". For a startup, that's likely true, but for a larger company, there's a very different risk profile when evaluating a SQL vs. a NoSQL solution.


Relax consistency rules and don't use the DB for everything; flat text files are sometimes enough.


Architecture of a site you probably never use: 60 mysql servers (30 master/master pairs) ranging from 2 core machines with 8 GB RAM to 8 core machines with 64 GB RAM and RamSan PCI SSDs.

Some tables are sharded, others aren't. Some tables are imported into Hadoop for Hive queries, others aren't. Backups are run using a fragile program called zrm, with an army of outsourced DBAs to manage it.

The army of outsourced DBAs also deals with replication failures, figuring out why inexplicable deadlocks happen, tuning every inane InnoDB configurable, and "rebuilding" servers when MySQL decides to slow itself down and yearns for a complete dump/reload of all tables.

Friends don't let friends use MySQL.


Interesting. I've never seen large-scale database deployments use SSDs (neither for the transaction log, nor for the data); I've only seen battery-backup and write-back caches on hard disk drives used.


You will probably start to. The Intel 320 series is getting some attention now [1], with a combination of a competitive (for SSD) price and enough capacitors on board to flush the write cache on a power failure.

You'd likely use SSDs first for indexes, then data. Transaction logs are usually a sequential workload and 7200 rpm drives often suffice.

[1] http://archives.postgresql.org/pgsql-performance/2011-03/msg...


Nor have I. But it may be the future: cf. RethinkDB.


I was surprised to read through this great description about SQL systems at large scale, and not see a single mention of cache. I searched for 'cache' on the page to check I didn't miss something.

SQL servers at this scale are so reliant on Memcache and other related solutions that it seems like a large omission.
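For anyone unfamiliar, the cache-aside pattern those deployments lean on looks roughly like this. A plain dict stands in for a memcached client, and the function and key names are invented for the sketch:

```python
cache = {}  # stand-in for a memcached client

def fetch_user(user_id, db_lookup):
    """Cache-aside read: try the cache first, fall back to the DB,
    then populate the cache for the next reader."""
    key = "user:%d" % user_id
    if key in cache:
        return cache[key]
    row = db_lookup(user_id)   # the expensive SQL query
    cache[key] = row           # real code would set a TTL here
    return row
```

The point is that the database only sees cache misses, which is how a handful of SQL servers can sit behind a site doing orders of magnitude more reads.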


I have heard multiple Facebook engineering discussions centered around moving things out of MySQL and into Mongo, Cassandra, and Hadoop. I think this article overemphasizes the Facebook/MySQL connection.

On an unrelated note, does ACM really still use ColdFusion? And people like to knock PHP.


Note that this article has nothing to do with SQL vs. non-SQL systems. It's about relational vs. non-relational databases.


Scalable RDBMS is not cheap. Sure, you can use a MySQL or a PostgreSQL server that you didn't need to purchase a license for[1], but the design and implementation for Doing It Right take care, attention and time-- expenses that are not necessarily monetary. If you've built a fully scalable RDBMS-based web application, you won't be posting Show HN: My Weekend Project...

Until someone builds a cloud hosting platform with RDBMS that can do the necessary magic bits, that is.

[1] Yeah, yeah ... MySQL and their commercial licensing BS. It's been awhile but if memory serves and things haven't changed, you can't host a commercial web site backed by MySQL without A) releasing all your source or B) buying a license. IANAL.


One can run a commercial website backed by MySQL under the GPL license. The GPL only requires you to release your code if it links against MySQL and you're distributing the application. Websites don't usually have their code distributed. Even if you are distributing the code, it's very likely that you don't need to distribute it as GPL, but that's a longer discussion. Suffice it to say, you don't need a commercial MySQL license to run a website.

There's no weekend project that's going to be "fully scalable". Scaling isn't just grabbing a NoSQL backend and declaring victory. A lot of scaling is figuring out how you're going to use your data and creating data structures that allow for fast, efficient, and scalable access. Many NoSQL data stores force one to think about how you're going to access the data more than SQL databases do, since the NoSQL stores usually don't support things like joins. One still has to do a lot of work to design and implement it right.


the design and implementation for Doing It Right take care, attention and time

Doesn't NoSQL require the same, only it shifts from being the job of a db engineer/sysadmin to that of the developers?


That's essentially the crux of NoSQL. When your company consists of 90% developers and you're moving at breakneck speed, who wants to wait for Bob the Cranky DB Consultant? Just add another node to the cluster and let the auto-sharding figure it out.


Auto-sharding is when you have a RDBMS that sees "SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c2" and knows how to choose a plan which minimizes data going back and forth to make that happen. NoSQL instead makes you plow through n^2 records doing every join the hard way in the app, which is about as far from automatic as you can get.

The problem is that the people who actually handle this correctly charge so much that I've never even seen their products in live use. Meanwhile free SQL databases are still really bad at it. But choosing an API that deliberately restricts you to only the worst possible query plans is not going to be part of the solution.
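For concreteness, here is roughly what doing the join in the app looks like: a naive hash join over two already-fetched record sets. The column names follow the query in the parent comment; everything else (dict rows, field layout) is invented for the sketch:

```python
def app_side_join(t1_rows, t2_rows):
    """Application-side hash join emulating
    SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c2.
    Builds a hash on t2.c2, then probes with t1.c1."""
    by_c2 = {}
    for r2 in t2_rows:
        by_c2.setdefault(r2["c2"], []).append(r2)
    out = []
    for r1 in t1_rows:
        for r2 in by_c2.get(r1["c1"], []):
            out.append({**r1, **r2})
    return out
```

Even done carefully like this, the app still pulls both record sets over the wire and loses every other plan the optimizer might have chosen, which is the commenter's point.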



