NoSQL Took Away The Relational Model And Gave Nothing Back (highscalability.com)
77 points by ArturSoler on Oct 29, 2010 | 89 comments



I'll bite as well. Are NoSQL databases faster? Smaller? Simpler? If the answer to any of these is yes, then they are giving something back, though quite possibly that trade-off is not of interest to the author.

I am reminded of another Holy War (beware! reminded-of is not a euphemism for is-just-like): "Ruby took away the power of static typing and gave nothing back."


Generally, the pros and cons should be weighed per implementation. While you can vaguely state that NoSQL data stores are fast, and that horizontal scaling is generally easier on NoSQL, that isn't always the case, and it doesn't apply to anywhere near every situation.

MongoDB offers some REALLY cool features that enhance development speed. Inserting into a collection that doesn't exist? It creates the collection on the fly. Querying a collection that doesn't exist just returns an empty result set (instead of an error or exception).
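
Concretely, a quick sketch of what that looks like with pymongo (the database and collection names, and the local server, are just for illustration):

    # Assumes a mongod running on the default port.
    from pymongo import MongoClient

    db = MongoClient("localhost", 27017).testdb

    # Inserting into a collection that doesn't exist creates it on the fly.
    db.widgets.insert_one({"name": "sprocket", "qty": 3})

    # Querying a collection that doesn't exist returns an empty result, not an error.
    print(list(db.no_such_collection.find()))  # -> []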

If I had to speak in specifics though, I think it's pretty easy to say that NoSQL stores allow for querying very large datasets faster than you could on a relational store on similar hardware.


NoSQL does not mean nothing...

Redis does not use the relational model, not because it promotes a simple key-value model, but because it proposes a different paradigm and data model based on fundamental data structures.

MongoDB, for instance, did not take away the relational model at all, in some ways. So what NoSQL are we talking about? :)

NoSQL was a good marketing idea at the start, but now it's starting to be really misleading.


The term "NoSQL" has always been misleading. The various "NoSQL" systems have very different underlying models, and defining them by their lack of SQL makes discussions about tend to develop an adversarial, "us vs. them" tone. It adds way too much noise.

I don't care about "NoSQL" - I like Postgres and SQLite. What makes the individual "NoSQL" databases interesting, though?

Redis is in a design sweet spot, IMHO. The pure key-value stores seem way too low-level to me, but the support for atomic operations on lists, sets, etc. in Redis is very handy. It also works well as a cache for other databases, and newer commands like BLPOP are very cool.
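
For example, with redis-py (assuming a local Redis server; the key names are made up):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    r.sadd("online_users", "alice", "bob")  # atomic set add
    r.incr("page_views")                    # atomic counter
    r.lpush("jobs", "job:1")                # push onto a list

    # BLPOP blocks until an item arrives, so a list doubles as a simple work queue.
    item = r.blpop("jobs", timeout=5)       # -> (b'jobs', b'job:1'), or None on timeout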

While I haven't used CouchDB much, its model is also interesting, and I can see its trade-offs being an excellent fit for certain problems (just not mine).


Some say NoSQL = "Not Only SQL", implying that there are alternatives to purely relational DBs. I think this definition fits better than the implied purely non-relational DB.


Agreed, but that's just damage control, after a whole bunch of "Death to relational databases!!!" hype. Of course there are alternatives to RDBMSs. What are filesystems if not hierarchical databases?

"NoSQL" is about as useful as rallying behind "languages without camel-case" (NoCC!).


Flamebait title is wrong. From the article:

Update: Benjamin Black said he was the source of the quote and also said I was wrong about what he meant, ill-informed even. His point: The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them.


The title was a quote, so it was not wrong; what he disagreed with was the interpretation.


It's most certainly misleading.


It's (still) stupid arguing that NoSQL systems are crappy at providing the things RDBMS systems provide. (Because they are, and no one has ever said differently; it's akin to yelling at a Bugatti Veyron (yeah, I went there) for being so bad at flying.)

I'm not using CouchDB because I needed relationships and the RDBMS features of SQL Server but decided at the last minute 'Ehh oh well, let's just try this fancy NoSQL business, I'll rewrite all the RDBMS features into my code'.

Believe it or not I'm using CouchDB because I want to run map/reduce on a bunch of JSON! Astounding, right!?
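
For the curious, here's roughly what that looks like: views live in a design document as JavaScript strings, pushed over CouchDB's HTTP API. Database and view names below are invented; assumes CouchDB on localhost:5984.

    import requests

    design = {
        "views": {
            "by_type": {
                "map": "function(doc) { emit(doc.type, 1); }",
                "reduce": "_count",  # built-in reduce: count rows per key
            }
        }
    }
    requests.put("http://localhost:5984/mydb/_design/example", json=design)

    # Query the view grouped by key; CouchDB runs the map/reduce over the JSON docs.
    resp = requests.get(
        "http://localhost:5984/mydb/_design/example/_view/by_type",
        params={"group": "true"},
    )
    print(resp.json())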


I see it as a high-level vs low-level thing.

The relational model is a high-level language for describing the structure of data. And a pretty elegant one if you can look past the awkwardness of dealing with current SQL databases (not a fault of the model itself).

That said if you want to scale and tune things you often need to drop down to a lower-level view in order to get things done, hence NoSQL. But that's not to say the relational model isn't still useful, or that it shouldn't inform any formal reasoning done about data in NoSQL systems.

I have some more thoughts on this here: http://matthew.yumptious.com/2009/07/databases/nosql-and-the...


To the downvoter - curious what exactly you disagreed with?


Up to you, but I'd say let that kind of thing slide. The ideal is that if someone disagrees, they'll explain themselves. Downvoting is supposed to be a mechanism for voting on whether a point adds to the discussion or not.

So either (a) your point did not add to the discussion, in which case further discussion about your point just adds to the noise without contributing signal, or (b) the person or persons downvoting disagree with you but are too lazy to contribute to the discussion by explaining their perspective. In which case, why worry about them?

Responses--whether positive or negative--are discussion gold. Downvotes aren't worth the worry.


The name "relational model" is quite unfortunate, as it leads to confusion about what the model is (it has little to do with 'relationships').

The key concepts are using sets to store data (making you think about your data in a way where duplication of tuples and orderings don't mean anything), and always using values instead of pointers to store your data and references.

I think that these concepts are very useful when it comes to managing data, but they seem to get lost in all the talk about "relationships" and "sql" and "structured tables".


I know Ben Black and I can assure you he is very experienced with NoSQL data stores. He is calling for more attention to the query interface to these various data stores. He's specifically not saying they should be relational or dismissing other merits of NoSQL.


"A relational model of data for large shared data banks" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110...

He criticizes the then-existing databases (hierarchical and network), then proposes the relational model and the concept of a query language (though not an actual language - this predates SQL).

Interestingly, one of the hierarchical databases (IMS) is still going strong: http://en.wikipedia.org/wiki/Information_Management_System#H...


I would disagree - I'd say that NoSQL is an exploration into disassembling the component parts of an RDBMS in order to get performance gains in exchange for losing features that aren't needed for your particular engineering problem.


I don't get the big hoopla about NoSQL.

It's a different way of doing things than traditional RDBMSs (which served us well for over 20 years). Why can't we just choose the best tool for the job, after looking at the pros and cons?


"The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them."

I don't see why the relational model (or some subset thereof) won't work for most NoSQL databases.

Key-Value store? It's essentially just a relational database where tables can have one primary key and one other column.
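
For illustration, a toy version of that mapping in Python's sqlite3 (schema invented):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")

    # put
    con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                ("user:1", b'{"name": "alice"}'))
    # get
    print(con.execute("SELECT v FROM kv WHERE k = ?", ("user:1",)).fetchone()[0])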

Document store? Essentially just a dynamically typed relational database.

Graph database? You probably would have to add to the relational model a bit, but Codd did write a bit about recursive queries.
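
That addition exists today as recursive queries. A sketch in sqlite3 (note SQLite grew WITH RECURSIVE years after this thread, so treat this as illustrative; the edge data is invented):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    con.executemany("INSERT INTO edges VALUES (?, ?)",
                    [("a", "b"), ("b", "c"), ("c", "d")])

    # All nodes reachable from 'a', via a recursive common table expression.
    rows = con.execute("""
        WITH RECURSIVE reach(node) AS (
            SELECT 'a'
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach
    """).fetchall()
    print(rows)  # -> [('a',), ('b',), ('c',), ('d',)]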


I can't talk about all NoSQL, but I can talk about CouchDB, as I have worked with it some. It gives you multiple write nodes, easy replication and syncing, and map/reduce. I love it.


'With NoSQL all relationships have been pushed back onto the poor programmer to implement in code rather than the database managing it. '

OK I'll bite.

Yes. I, the poor programmer, want to implement relationships in my code, rather than adding a secondary language to my application to manage relationships (the database itself never 'managed this' for me; I had to do it manually).


Welcome to MySQL in 1995. You'll be wanting to "manage" transactions next...


Yeah, I do actually, because I want a storage system to be a storage system and an application layer to be an application layer.

I've seen corporate code bases consisting of entirely PL/SQL before, it's a nightmare. No thanks, from both a maintenance perspective and a performance perspective. YMMV.


Why not also get rid of the fault-tolerance parts of a filesystem (remove recovery, remove concurrency control)? Then you're not too far away from interacting directly with the hard drive's firmware.

SQL (and RDBMS) is good for three things:

- The relational model can model the nature of our data well.

- Transaction control, recovery, etc. are notoriously hard to implement. Most people (I bet 99.99% of programmers) would fail at designing these systems. An RDBMS takes care of these for you.

- Physical data independence: the declarative nature of SQL enables you to specify what data you want, rather than how to get it. The optimizer picks a logical and physical execution plan (that doesn't suck) for you. The same query from the 70s that ran on IBM mainframes can still run today on, say, 1000 Amazon EC2 nodes, although the underlying compute and storage architecture is poles apart. (There are political reasons why this is not as true as it should be, but that's another story.)
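
A tiny illustration of that physical data independence, with sqlite3 standing in for any RDBMS (table and index names invented): the SQL never changes, only the plan the optimizer picks.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

    query = "SELECT total FROM orders WHERE customer = ?"

    print(con.execute("EXPLAIN QUERY PLAN " + query, ("acme",)).fetchall())
    # -> roughly: SCAN orders (a full table scan)

    con.execute("CREATE INDEX idx_customer ON orders(customer)")

    print(con.execute("EXPLAIN QUERY PLAN " + query, ("acme",)).fetchall())
    # -> roughly: SEARCH orders USING INDEX idx_customer (same SQL, new physical plan)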

Now of course, these features are double-edged swords. Some engineers dislike RDBMSes because they don't understand what is going on in the system. If you are Google, you have the engineering power and the business need to design systems that are precisely tailored to the nature of your data.


Well, there are two questions here (posting again to give whatever horribly lacking-in-perspective fanboys are out there another chance to downmod a discussion of frickin' storage layers):

1) Are RDBMSs useful at all, ever? Sure. If I don't have an extreme scaling need, I'm the first one to say throw MySQL on that box and call it a day. It's tested, it works, it gives you a bunch of functionality for 5 minutes of installation, and developers are used to it. Easy.

2) Should you be putting application logic in the database? I argue no. I understand that this isn't unanimous, but you're sacrificing long-term flexibility and scalability for some short-term convenience that isn't even that convenient. I mean, PL/SQL sucks. I'm not saying you shouldn't ever do a join in the database, or shouldn't ever normalize your data, but if you're writing a lot of PL/SQL code, it's probably gonna come back to bite you if you ever need to scale or change architectures.

RE: that query from the 70s -- that query from the 70s that will actually run across different implementations contains exactly the subset of SQL that I consider acceptable and worth using (minus limit and a couple other small fry, I guess). If it's more complicated than that, it should be app code IMO.


If you write your code in Java (or Ruby or Python or C++), it will come back and bite you if you ever need to scale or change architectures.

Actually, if you have to change architectures, you are probably bitten anyway.

If you need performance:

1. Do as much as you can in one SQL statement (unless you are using MySQL, where the optimizer sucks).

2. If you can't do it in one SQL statement, use PL/SQL to do data processing as close as possible to the data - this saves on network load, and PL/SQL is highly optimized to reduce the overhead of embedded SQL statements.
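
Roughly the difference, sketched with sqlite3 standing in for Oracle (schema invented; the two versions below are alternatives, not meant to run back to back):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, vip INTEGER)")
    con.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                    [(1, 100.0, 1), (2, 50.0, 0), (3, 75.0, 1)])

    # Chatty version: pull rows out, compute in the app, push each change back.
    rows = con.execute("SELECT id, balance FROM accounts WHERE vip = 1").fetchall()
    for acct_id, balance in rows:
        con.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                    (balance * 1.05, acct_id))

    # Set-based version: one statement, the engine does the work next to the data.
    con.execute("UPDATE accounts SET balance = balance * 1.05 WHERE vip = 1")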

With all the talk about "horizontal scalability", we sometimes forget the benefits of writing fast code in the first place.


That network load canard is a little dated. Gigabit Ethernet gets you 125MB/s; a couple of bonded NICs will get you 250MB/s. That's almost certainly way more bandwidth than your database can push by doing a bunch of random reads (500 IOPS * 4KB = ~2MB/s), unless you've got a really expensive SAN (in which case buy a $500 10GbE card for your DB machine), so reducing network load from the DB is pretty much a non-factor. Reducing round trips still makes sense, but if your DB is CPU- and/or disk-saturated, it doesn't actually save you anything.

Performance really comes down to what work you are doing, where it takes place, and how many disks and CPUs you can get involved to spread the load. For small amounts of load, you could say there's some gain from pushing more work into the database, but it doesn't matter anyway. Under large load, you're making the DB a bottleneck.


With logic in the apps, every time you change the business rules, you have to somehow stop everyone from running any app that has the old logic until you have ported the new logic to whatever language it's in. Logic in the database always works and can't be accidentally (or deliberately) bypassed.
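
For example, a constraint is one piece of database-side logic that holds for every client, whatever language or version its logic is written in (sketched with sqlite3; schema invented):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY,"
                " amount REAL CHECK (amount > 0))")
    try:
        con.execute("INSERT INTO payments (amount) VALUES (-5)")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # an app running stale business logic can't sneak this in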


Yep, nothing says performance and reliability like doing everything in the app.

Here's a neat trick - for every query SELECT * FROM TABLE, then filter and sort it in the application too.


Here's a better trick - treat the database as an indexed storage layer that does the minimum of CPU work and avoids the complicated transactional conditions that create lots of blocking and bottlenecks for threads on your monolithic back end.

And then implement the app logic in some slow-ass web language that doesn't hold a candle to your database's optimized C. What happens when your web server can't keep up? Buy another web server. That's an easily solvable problem. A saturated DB, on the other hand, is very hard to fix, especially if all of your application logic is running in it and would require a rewrite to scale.


There's an awful lot more to computing than websites.

FWIW I work on a "non-scalable" RDBMS that comfortably handles thousands of commits/sec, tens of thousands of selects/sec and tens of terabytes of data. All done with stored procs, one database, no need for "sharding" either. Will it scale indefinitely? No, but neither will anything else. The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.


Excellent response. It's amazing how many people have never seen a SQL server with 200 disks attached to it, and we're not even talking about SANs. A single quad-core chip can easily drive 200 disks. All people see is a $200,000 price tag for a single system, ignoring the fact that it will outperform 100 $2,000 servers. AMD64 can scale so far these days that there is little point to NoSQL for all but the highest end of web scale. I bet if you look into the core of Google there are still MySQL systems doing a lot of grunt work.


Facebook still has MySQL doing a bunch of grunt work, but they're basically treating it like a NoSQL store - sharded key-value storage and that's it. They're moving to HBase now.

And, yeah, damn right I'm hesitant about a single piece of hardware that costs $200k and still leaves you with a SPOF. Effectively you're saying it's a $400k piece of hardware after replication. Oh, and Oracle replication software costs you an extra several hundred thousand. So for about a million dollars, yeah, unless it's incredibly transactional in nature, I'd give other options a very serious look.


It costs $5k for SQL Server with replication; you can use SQL Express for the monitor, so all you're looking at is the $5k license you'd need for SQL Server Standard, and if you need Enterprise it's only $20k. These are all retail prices, which no one actually pays.


OK, and if you have 50GB of data and have made a determination that SQL Server has some features that you're willing to pay $5k for, fine.

I still think embedding all your app logic together with your underlying data in PL/SQL or any equivalent is a bad idea, but hey, that's for anyone to do and find out for themselves :)


> The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.

Agreed. There's a couple of orders of magnitude available just from hardware I/O improvements, using conventional[1] hardware configured intelligently.

There's at least another order of magnitude in optimizations that aren't possible in NoSQL's strawman[2], such as separate tablespaces for data and index, partial indexes, and a galaxy of query tuning from an EXPLAIN that actually provides a query plan.

[1] Meaning commodity-priced, nothing fancier than $400 RAID cards and spinning disks.

[2] MySQL


In discussions about "NoSQL" systems, I've found that some of the developers complaining about RDBMS performance didn't even know what indexes were.

Usually they learned how to use MySQL from thirdhand PHP & MySQL tutorials off somebody's blog or something, and thought it was representative of all RDBMSs.

Not saying everyone using "NoSQL" is poorly informed, just that sometimes people's impressions of performance aren't very accurate. It makes me suspicious when somebody's benchmark only uses MySQL.


I've slowly come to the same opinion over the past few years. I'm just going to leave this here for posterity:

http://www.dbdebunk.com/


There has been nothing approaching even a single order of magnitude of improvement in random I/O with conventional HDDs over the last, say, 20 years. Not even a 2X improvement. You're under 500 IOPS per HDD, period; use them wisely.

Moore's law doesn't apply to RPMs of spinning disks.


Yes, but a battery-backed write cache will turn random IO into sequential IO, eliminating this '500 IOPS' barrier. Put 512MB of BBWC into a system and see how your 'random IO' performs. The whole point is that NO ONE scans their ENTIRE dataset randomly; if you have to scan your entire dataset, just access it sequentially. Plus, nothing in NoSQL solves ANY of the points you are outlining.


Write cache is useful, of course, but it doesn't make random reads any faster. If your dataset is too big to cache, disk latency becomes your limiting factor. The question then becomes how to most effectively deploy as many spindles as possible in your solution - SANs are one way, sharding/distribution across multiple nodes another.


No, you would never waste write cache on random reads; instead you'd buy more RAM for your server. Why would anyone ever buy a whole new chassis, CPU, drives, etc. when all you need is more RAM? As for how to effectively deploy spindles, the answer is generally external drive enclosures. You can generally fill a rack with disks and save 2-4U for the server.


I said "where it's too big to cache" -- that means too big for RAM, aka >50GB or >200GB depending on what kind of server you have.


> SANs are one way,

Agreed, if you include fast interconnects like SAS and exclude the network[1] requirement of SANs.

> sharding/distribution across multiple nodes another.

I disagree, for the same reason that doing it with iSCSI over Ethernet doesn't work: too much added latency.

Infiniband may help, but I have yet to try it empirically.

[1] Switching/routing, multiple initiators, distances longer than a few dozen meters.


> You're under 500 IOPS per HDD, period

This is only significant if one is limited to a trivial number of spinning disks. 20 years ago, with separate disk controllers, this was the case.

If you run some benchmarks, I expect you'll find that, for random I/O, N disks perform better than N times one of those disks.

SCSI provided (arguably) an order of magnitude for number of disks per system.

Now, SAS provides another. $8k will buy 100 disks (and enclosures, expanders, etc). How many IOPS is that?

ETA: The Fujitsu Eagle (my archetype of 20ish years ago disk technology) had, IIRC, an average access time of 28ms. If its sequential transfer rate was one 60-100th of modern disks, what fraction of a modern disk's 4k IOPS could it do?
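
Back-of-envelope, with assumed (rough) per-disk figures:

    # 100 disks at ~150 random IOPS each:
    print(100 * 150)         # -> ~15,000 aggregate random IOPS for that $8k

    # Fujitsu Eagle: ~28 ms average access, plus ~2 ms to move 4k at ~1.8 MB/s:
    print(1000 / (28 + 2))   # -> ~33 random 4k IOPS, within a small factor of a
                             # modern spinning disk's ~75-200: not orders of magnitude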


Yes, I agree that the solution is to throw more spindles at the problem.

PL/SQL, though, with global data reach and advanced locking states for every single transaction, makes it really hard to move off of a single host. So it's more and more work to get more disks attached to that host, and CPU is a hard upper limit.


>All done with stored procs

>The limits of the RDBMS are orders of magnitude greater than the NoSQL crowd think they are.

Wish I had more upvotes. Knowing what logic to put in stored procs, and what to put in the application code, so as to play to the strengths of the db and the strengths of the appserver, seems to be a lost art these days.


Coprocessors are in the upcoming releases of both HBase and Cassandra - they'll be back :)

It's not that there's anything fundamentally wrong with the relational model, it's that, as formulated, it's really hard to split up into multiple machines.


SSDs?

An HDD is limited to well under a thousand reads per second, and there are multiple reads per SQL select, so I'll assume either yes or you're leaving something out. If the tens of thousands of reads are over a 4MB table that stays in memory and the rest of your 10TB are infrequently accessed, congrats, you have a single-node problem. If you actually need to deliver, say, 1,000 queries per second over a 10TB dataset? It's not happening in an RDBMS unless you get a whole bunch of SSDs.

Also, RE: websites, of course, substitute your environment's front end if you'd like.


At Microsoft we were scaling SQL Server up to 10-100k rows per second on $20k OLTP systems with a hot cache and RAIDed HDDs.

In one of the code-word projects I saw a $25k system with an OLTP dataset do 1 billion rows in a second.

Edit: We also had statistics that said that 90% of our customers had less than 100GB of data. 99% had < 1 TB. The vast majority of database users shouldn't even be thinking about looking at non-RDBMS systems.


I agree with all of that. But the original poster was claiming 10k SELECTs/second on a 10TB data set. That just sounds fishy to me.

If your dataset fits in memory on one node, I'm all about using a database (prefer MySQL personally, might look at postgres now that it finally has replication).


Do you really need SSDs to do a thousand reads per second? I'm assuming this is with a hot cache, and that the access pattern on the 10TB of data is heavily skewed towards a much smaller portion of the total dataset. Also, RAIDed 15k drives and the nature of the queries matter. We don't really know enough about the whole setup to know what's possible.


It's the year 2010; we invented RAID a while back. Stripe and mirror everything and you get decent performance without SSDs.


It depends on how much performance you need vs. how much capacity you need. Since SSDs can do more I/O in a single drive, even if that drive costs twice as much it can still make more sense than a 4-drive RAID array, for both price and power.


Ah, I love SSDs. I was just pointing out that they are not the only game in town.


OK, you're up to 3k random reads per second at the absolute peak. Realistically more like 2k for 6 10k-RPM disks in RAID 0, and it's awfully hard to even fit 10TB on a RAID 1+0 setup. What's that, 10x2TB? Good luck with the latencies on those 2TB disks.

It still doesn't add up to 10k select statements per second over a 10TB dataset on a single node. Even without writes, that's not happening. I call BS on the grandparent post.


That shows a lack of understanding of what kind of hardware is being used in the real world to handle this.

With a half-decent SAN with 15k drives and 4Gbit Fibre Channel connections, you can get 1000+ IOPS without the storage system even breaking a sweat. Under load it can easily give 10 times that.

This is something that's everywhere in the business world.

Pair this with a bunch of cores and a few GB of memory, and you can have an RDBMS that chews through impressive amounts of data. Unless, of course, you optimize nothing and swamp it with lame queries that do nothing but table scans. Funnily enough, the same people who are fine with doing everything in code are the ones who can't be bothered to think for more than one second about what kind of queries they are throwing at the database.


No kidding; it's like battery-backed write caches don't even exist in the NoSQL world. I was able to easily drive 200MB/sec of random IO on 25 15k drives.


Btw, this was 200MB/sec of random writes, and I didn't even bother optimizing them. I could have made the writes basically sequential if I had bothered to write a COMB-style UUID generator. I happen to be a fan of UUIDs for surrogate keys, as they make database merging so much easier.
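
For reference, a hedged sketch of the COMB idea (Nilsson's trick; the exact byte layout depends on how your database orders UUIDs - SQL Server compares the trailing 6 bytes of a uniqueidentifier first, which is why the timestamp goes at the end):

    import os
    import time
    import uuid

    def comb_uuid() -> uuid.UUID:
        # Random bytes up front, a 48-bit millisecond timestamp at the end, so
        # consecutively generated keys land near-sequentially in a clustered index.
        ts = int(time.time() * 1000).to_bytes(6, "big")
        return uuid.UUID(bytes=os.urandom(10) + ts)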


Well, yeah, you can get some decent IOPS for $100k, but 20 servers with 4 spindles apiece are still more.


You will also consume more power, have 20 servers to manage instead of one, and have to customize your data to be distributed between those 20 servers, hampering your ability to query that data for patterns that let you optimize your business.


6 disks? An EMC array can throw 20x as many physical disks at this sort of problem. An Exadata can compile SQL down to microcode and execute it on the storage, like a graphics card doing matrix operations on dedicated hardware.

Again, as I say, the NoSQL crowd have no idea about what the state of the art is in the RDBMS world.

> it's awfully hard to even fit 10TB on a RAID1+0 setup

It would actually be hard for me to buy an array that small...


What's the cost on that EMC array?

How many commodity servers could I buy for that?

I have a pretty solid idea of what the state of the art is in the RDBMS world - it's diminishing returns, where a machine that's twice as powerful costs 10X as much, all the way up the enterprise ladder. It's spending $100k on your software licenses, $100k on your storage, and $500 on a CPU.

Not that there's anything wrong with that. It's ok. If your domain is highly transactional, it's probably a better move than implementing your own transactions over something else. Just don't pretend that your limitations are actually strengths -- you have your own strengths.


It doesn't matter. You see, in business, there is no "cheap" or "expensive". There's worth the money, or not. It doesn't matter how many commodity servers I could buy for the cost; no matter how cheap they are, the money would be wasted if that's simply the wrong technical approach.

Because you can't compete at this level by chucking increasing amounts of anything at the problem - people, dollars, spindles, nodes, you name it.


You see, in business, everything is about cheap or expensive. It's just a broader definition that includes developer time and ROI.

If your problem is extremely transactional and legitimately unshardable, feel free to drop $6 million on Exadata, or half a mil on a database server and backup. But frankly, your objections are starting to have a religious feel to them. All I was saying is that PL/SQL is a pile of crap to code in and fundamentally unscalable without spending a boatload of money. A little better design can get you the same thing with a lot less cash.

EDIT: No, those are facts: PL/SQL looks like it was designed in 1965, and, yes, putting all of your CPU processing into a single node is fundamentally unscalable. I've seen it. It was fundamentally unscalable.

I'm not making a religious point about RDBMS - it can be the best model in many situations. I'm making a point about single bottlenecks for your architecture.


"pile of crap" and "fundamentally unscalable" and I'm the religious one o_0


BTW, all architectures have a single bottleneck. That's pretty much by definition.

Oracle tried to market their Exalogic as having "no bottlenecks", which is nearly as funny as "unbreakable Linux" and "zero latency".


You buy the EMC unit because you want the EMC tech to call you and say, "We see you have a failing drive. I'm en route, and I'll be there in 15 minutes with a replacement." Even if you are not paying attention, the unit has already called the control center and told it that it needed attention.


I don't get why you were downvoted, as this is so true. I'd rather throw hardware at a problem than have to get developers to work on a rewrite at the database level. It's cheaper, easier, and nowhere near as risky in comparison.

I've always seen databases as an indexed storage layer, and thought that "outsourcing" the app's job to the database, while convenient in a lot of cases, is hell to maintain at scale.


You are paying developers to rewrite features that already exist in the database and you think this is "throwing hardware at the problem"?


No, you got what I said wrong. I'd rather pay developers to keep the logic in the application and throw more hardware at it when I need more webservers, than pay developers to develop a 'database logic' solution and then have to get them to rewrite it for efficiency when the database starts getting blocked and saturated. It's always a good idea to plan for growth, as long as it doesn't delay the app too much.

So yes, I'd rather keep my databases light and have the app deal with the processing as much as possible, throwing a new webserver at it when growth requires it.


Snurf blerg schlep albargady!

^^

That's me deliberately misinterpreting your comment, beating the straw man with a killer counter-argument and declaring victory.


Hey, you got downvoted too. Gave you a bump.

Dunno why everyone's taking this so personally. It's like they're personally threatened and their approach is to get angry rather than try and understand.


If I had to guess, I'd say that people are downvoting you because even the MySQL guys who spent the 90s arguing against it have now come around to thinking that foreign keys and transactions and stored procs are good things, and they can't be bothered to engage in the discussion again.


My beef was with PL/SQL and overdoing the transactions/stored procs thing. Foreign keys can be a good thing.


^^ I don't get it either. If someone doesn't agree they might as well say so and give reasons to get a discussion going. Instead we get this "Ohh the article said X so you're wrong, I'll downvote and move along" attitude.

I guess people don't appreciate debating anymore.


Re: performance and reliability.

Mongo has some pretty good performance characteristics, according to people who have very, very large datasets being updated at a massive rate. As for reliability, I personally don't really see too much difference in the reliability of Python logic vs. SQL logic. It's like you're comparing the reliability of English vs. Spanish.

"> Here's a neat trick - for every query SELECT * FROM TABLE, then filter and sort it in the application too."

Huh? Mongo (my personal database of choice) doesn't do that; you specify what you want when you run find(). No other database does that either.
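
For anyone unfamiliar, this is what specifying what you want looks like in pymongo (collection and field names invented; the filtering, projection, and sorting all run server-side):

    from pymongo import MongoClient, DESCENDING

    db = MongoClient()["testdb"]
    cursor = (db.users
              .find({"age": {"$gt": 21}},     # filter: only matching documents
                    {"name": 1, "age": 1})    # projection: only these fields
              .sort("age", DESCENDING)        # server-side sort
              .limit(10))
    for doc in cursor:
        print(doc)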

Again, if you have a point, make it. The last three posts you've written have been putting words into other people's mouths, sarcasm, and statements without any actual backup or citations.


But isn't the nightmare a human collaboration problem rather than a technical one? Assuming that the purpose of these stored procedures is to ensure data integrity at a lower level of abstraction, and that you have appropriate staff to maintain that layer, I don't see where the nightmare part comes in.

In my experience, and perhaps with my bias as someone who's very comfortable with (relational) databases, I have seen plenty of nightmarish scenarios involving, say, Hibernate.


PL/SQL is an awful language to work with. What's up with having to declare variables before using them? Does it not even have multi-pass compilation?! And it works only with Oracle, which causes lock-in I'd rather avoid.

I'd rather create a service layer on top of my DB and access it through that. Then I have more control over logging, managing access, pubsub, etc.


It makes me laugh: people who won't write a single keyword to specify the type of a variable, but will write a hundred lines of unit tests to make sure it's always what they want.


Dude. Stop it. He didn't mention unit tests. If you have a point, stop making stuff up about the posters you're responding to.


Oh, I'm sorry, I didn't know we were playing word games. The poster above complained about having to declare the types of his variables as if that were a bad thing. I simply pointed out why it is in fact a good thing - you no longer need tests to ensure that your function that expects an INT always gets an INT; the type system takes care of it for you.

Discussions at this level presuppose a certain amount of background knowledge. Do try to keep up.


You mean as opposed to creating tests for storage failures when my strongly typed DB complains?

Things need to be tested, strongly typed DB or not.

Please quit it with the rude manner. It's against the HN guidelines and makes your arguments weaker to boot. Or leave and go somewhere else.


Declaring variables isn't hard. Moreover, in PostgreSQL one can use many languages to write stored procs. I prefer PL/pgSQL, but Tcl, Python, and Perl are also available in the stock distribution.


"isn't hard"? Yeah, writing it in Algol isn't "hard" either, but why? If something's harder to use and less performant, why would it even be on my list of things to maybe consider?


C coders manage easily enough.


The maintenance issue is tractable; I'm just saying that PL/SQL doesn't make it easier, and probably makes it worse by being hard to unit test, refactor, etc. (although this is very subjective and depends on the strengths of your team).

The scalability problem? Good luck getting 90%+ of your processing out of the database. Rewrite from scratch?


I never said I required, or tried to implement, transactions. In fact, I can tell you now I'm pretty sure a CMS and a push-based communications app are running quite fine without transactions. What I do have is:

* One language which handles both app logic and relationships, as mentioned above.

* A fidelity of types between storage and application (I'm using Mongo, so documents are stored as documents, not fields and rows)

* Apparently, some speed benefits should I need them in the future, at the expense of losing the aforementioned features I don't need.

Please don't put words into my mouth:

* It's not nice.

* It's against the HN guidelines

* If your argument was of substance you wouldn't need to do it.

It saddens me this kind of nastiness gets upvotes.


Taking away the relational model was what they gave us.



