Michael Stonebraker: Why Enterprises Are Uninterested in NoSQL?

edw519 · on Sept 30, 2010

Trivial example of what people need to know in enterprises:

“Tell me whether pet rocks are selling better than Barbie dolls in the south?”

What people really need to know:

I already know from existing reporting that 984 orders (18% of our backlog) are already past due. For those 984 orders:

  - How many are for one item and how many are for multiples?
  - Do we own what we owe those customers?
  - If we do own it, is it in the proper warehouse?
  - If it is in the proper warehouse, can we find it?
  - If we can find it, is it undamaged and certified?
  - If it's shippable, do we have enough labor to ship it?
  - If it isn't certified, how soon can QA certify it?
  - If it isn't in the right warehouse, can we move it?
  - If we don't own any, where can we get some?
  - Which vendors have it on the shelf?
  - Which vendors do we have blanket purchase orders with?
  - Which vendors do we have contracts with?
  - Which orders can be split to satisfy a partial?
  - Which orders are for customers already on credit hold?
  - Which customers are threatening not to renew with us?
  
  and (ironically) the most asked question of all:
  
  - Which orders must be shipped to hit our quarterly numbers?

I can go on and on; this is just off the top of my head. We like to pick on enterprises, but this is the shit that really happens all the time. So whenever you get gas in your car, bread on your table, new shoes at the mall, steamed milk in your latte, etc., etc., etc., rest assured that someone, somewhere has asked these questions. Questions that were probably answered using some form of RDBMS, SQL, ACID technology (with really good application software on top of it).

thibaut_barrere · on Sept 30, 2010

SQL is actually quite often a bad way to try to answer those questions, too! See http://philip.greenspun.com/wtr/data-warehousing.html for an entertaining explanation.

I believe MongoDB in particular can be a fairly good solution to build datawarehouses (I'm starting to use it for reporting systems).

One great point about MongoDB is that it makes the ETL process a lot easier (you don't have to prepare tables with the right schema and it supports large amounts of data).

I wouldn't be surprised to see some NoSQL solutions get wider adoption in the enterprise, either alone or with tools that build upon them.

As for the article: it's pure linkbait in my opinion!

jaxn · on Sept 30, 2010

I used to be a Business Intelligence consultant for enterprises. We built reports, data warehouses, dashboards, etc. From my experience, the article is spot on, not linkbait.

Maybe MongoDB is better once you have a well defined query that you need, but I think the point of the parent comment is that those examples of queries are ad-hoc. NoSQL is not as good as SQL when it comes to report specs that are constantly in a state of flux.

I need my data available to answer questions. When building a product you have a well defined set of operations based on the features of your product. When the requirements shift on a regular basis, NoSQL is too limiting.

When the article talks about a low-level query language being too limiting, it is talking about missing things like CONNECT BY PRIOR or SUM(CASE WHEN col IN ('a','b','c') THEN 1 ELSE 0 END). These are the same kinds of things that are difficult to do with an ORM.
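For readers who have not met these two constructs, here is a minimal sketch of both, run against SQLite from Python only because it ships with the standard library. The `orders` and `staff` tables and their columns are made up for illustration. CONNECT BY PRIOR is Oracle syntax; the recursive common table expression below is the closest equivalent that SQLite accepts, and the conditional sum is spelled `CASE WHEN ... END`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, invented for this sketch.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders (status) VALUES ('a'), ('b'), ('x'), ('c'), ('y');

    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'ceo', NULL), (2, 'vp', 1), (3, 'dev', 2), (4, 'ops', 1);
""")

# Conditional aggregation: count only the rows whose status is in the set.
matching, total = conn.execute("""
    SELECT SUM(CASE WHEN status IN ('a', 'b', 'c') THEN 1 ELSE 0 END),
           COUNT(*)
    FROM orders
""").fetchone()

# Hierarchical query: walk the manager chain from the root downwards.
reports = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        SELECT s.id, s.name, c.depth + 1
        FROM staff s JOIN chain c ON s.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(matching, total)  # 3 5
print(reports)          # [('ceo', 0), ('ops', 1), ('vp', 1), ('dev', 2)]
```

Neither query needed the schema to be designed around it, which is the ad-hoc property being argued about here.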

thibaut_barrere · on Oct 1, 2010

I am currently doing reports/data warehouses/dashboards. When something more complicated than simple questions is needed (see Data Warehousing for Cavemen), ad-hoc queries are quite often not the answer anymore, either with NoSQL or with SQL.

I don't want my clients to be dependent on me (or someone else) to build complicated SQL queries when they have questions, so I focus on getting an easy to maintain facts/dimensions model (as advocated by Ralph Kimball http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimens...) which can evolve if needed.

The nice point about MongoDB when doing this is that it makes it a lot easier to add attributes to dimensions, or load the data, or evolve the reporting system in general (and I like that).

You can apply the same principles to build dimensions/facts based data structure and answer questions that SQL alone wouldn't be able to answer easily.

Example of such a question: how many calls did we receive during French legal week #9 that were handled by team X outside normal working hours or while we were on vacation? Of those calls, how many were made by a woman (as it has a financial impact in this case)?
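Once a facts/dimensions model exists, a question like this reduces to filters on dimension attributes. The sketch below uses SQLite from Python purely for illustration (the setup described in this comment loads into MongoDB instead); every table name, column name and flag is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, three dimensions.
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, legal_week INTEGER,
                           in_working_hours INTEGER, on_vacation INTEGER);
    CREATE TABLE dim_team (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_caller (caller_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE fact_call (time_id INTEGER, team_id INTEGER, caller_id INTEGER);

    INSERT INTO dim_time VALUES (1, 9, 1, 0), (2, 9, 0, 0), (3, 9, 1, 1), (4, 10, 0, 0);
    INSERT INTO dim_team VALUES (1, 'X'), (2, 'Y');
    INSERT INTO dim_caller VALUES (1, 'F'), (2, 'M');
    INSERT INTO fact_call VALUES
        (1, 1, 1),  -- week 9, normal hours: excluded
        (2, 1, 1),  -- week 9, after hours, woman
        (2, 1, 2),  -- week 9, after hours, man
        (3, 1, 1),  -- week 9, vacation period, woman
        (4, 1, 1),  -- week 10: excluded
        (2, 2, 1);  -- team Y: excluded
""")

# The "hard" parts of the question (legal week, working hours, vacation)
# were precomputed as dimension attributes at load time.
total_calls, from_women = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN c.gender = 'F' THEN 1 ELSE 0 END)
    FROM fact_call f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_team tm ON tm.team_id = f.team_id
    JOIN dim_caller c ON c.caller_id = f.caller_id
    WHERE t.legal_week = 9
      AND tm.name = 'X'
      AND (t.in_working_hours = 0 OR t.on_vacation = 1)
""").fetchone()

print(total_calls, from_women)  # 3 2
```

The work moves into ETL: deciding what "legal week" or "normal working hours" means happens once, when the time dimension is populated, rather than in every query.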

aloneinkyoto · on Sept 30, 2010

Both of those queries are very easy to perform in MongoDB.

For examples of how to easily model and query trees (CONNECT BY PRIOR in SQL) see: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
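As a rough idea of what tree modelling in a document store can look like, here is a sketch that assumes an "array of ancestors" layout. That layout is an assumption chosen for illustration, not a quote from the linked page, and plain Python dicts stand in for documents so the example runs without a database. In MongoDB the membership test would be a query on the array field rather than a list comprehension.

```python
# Each hypothetical document stores its parent and its full ancestor list.
categories = [
    {"_id": "root", "parent": None, "ancestors": []},
    {"_id": "toys", "parent": "root", "ancestors": ["root"]},
    {"_id": "dolls", "parent": "toys", "ancestors": ["root", "toys"]},
    {"_id": "rocks", "parent": "toys", "ancestors": ["root", "toys"]},
    {"_id": "food", "parent": "root", "ancestors": ["root"]},
]


def descendants(docs, node_id):
    """All nodes below node_id, found with one pass and no recursion."""
    return sorted(d["_id"] for d in docs if node_id in d["ancestors"])


def children(docs, node_id):
    """Direct children only."""
    return sorted(d["_id"] for d in docs if d["parent"] == node_id)


print(descendants(categories, "toys"))  # ['dolls', 'rocks']
print(children(categories, "root"))     # ['food', 'toys']
```

The trade-off is that the ancestor list has to be written when the document is stored, so the tree shape must be known at load time.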

SUM(CASE WHEN col IN ('a','b','c') THEN 1 ELSE 0 END) can be implemented as a group or mapreduce query.
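To make the map/reduce shape of that conditional sum concrete, here is a sketch in plain Python. It imitates the emit-then-reduce pattern in memory and does not call MongoDB; the documents, the `region` grouping key and the function names are all invented for the example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical documents.
docs = [
    {"region": "south", "col": "a"},
    {"region": "south", "col": "x"},
    {"region": "north", "col": "b"},
    {"region": "south", "col": "c"},
]


def map_doc(doc):
    """Emit (key, value): 1 when col is in the set, 0 otherwise."""
    yield doc["region"], 1 if doc["col"] in ("a", "b", "c") else 0


def reduce_values(a, b):
    """Combine two emitted values for the same key."""
    return a + b


def map_reduce(documents):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce(reduce_values, values) for key, values in grouped.items()}


result = map_reduce(docs)
print(result)  # {'south': 2, 'north': 1}
```

The logic is the same as the SQL expression; what differs is that the analyst writes two functions instead of one expression.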

I would say that you have a lot more power and flexibility in MongoDB compared to an average SQL database when it comes to ad-hoc querying.

jaxn · on Sept 30, 2010

I am not a MongoDB expert by any stretch, so please correct me where I am wrong.

The way I read your link it sounds like I need to store the data in a particular way in order to run a parent/child query. That is great if I know that I need that query at design time. What happens if I have tens of millions of records and need to run that report on an ad-hoc basis? Where the relationship may or may not be important?

What if I want to sum the "cost of goods sold" one day and the "items per transaction" the next? Does that not require someone to write code more complex than SQL? Because on Oracle a business analyst can open up Toad and run that query.

If I am wrong then it very well may be that the problem is one of the enterprise not being aware.

aloneinkyoto · on Oct 1, 2010

Ease of use is subjective. I don't think writing a mapreduce job needs to be more complex than writing an equivalent SQL query. What really matters is elegance, flexibility and power.

My personal experience is that the MongoDB model seems to win in most cases, especially when it comes to flexibility and ad-hoc querying. Having a real language (JavaScript) and a flexible schema tends to make most business problems easier to express.

Locke1689 · on Oct 1, 2010

> Ease of use is subjective. I don't think writing a mapreduce job needs to be more complex than writing an equivalent SQL query. What really matters is elegance, flexibility and power.

Unfortunately it seems you have completely misunderstood the nature of both SQL and MapReduce. MapReduce is a distributed computation engine. While it can be used in that way it was never meant to be a database system. BigTable is proof enough of that.

In general, SQL is the syntactical representation of relational algebra with some hacky additions for programmer convenience. Comparing just "SQL" to the MongoDB language model is misguided since you then break down to a question of algebraic expressivity and relational power.

I'm not going to try to build a proof here, but we do know that a formal relational algebra system is equivalent to first-order logic. As far as MongoDB's relational language goes, one would probably have to make an argument that it is equivalent to either tuple or domain relational calculus, but I know of no theoretical work that has attempted this. If anyone has any more information on the theoretical expressiveness of the MongoDB relational system, I would love to read it.

aloneinkyoto · on Oct 1, 2010

I was not arguing about relational algebra or theoretical expressivity or logical equivalence or anything like that. I was simply stating that in practice most business problems are easier to model and more flexible to query in the MongoDB model.

Of course you need some time to get used to thinking in terms of documents rather than tables and rows. But once you get used to the idea you can easily model most domains that occur in practice.

> MapReduce is a distributed computation engine. While it can be used in that way it was never meant to be a database system. BigTable is proof enough of that.

Yes, MapReduce in the Google and Hadoop sense is designed for massive batch processing. That's why BigTable and HBase exist. MapReduce in the CouchDB and MongoDB sense is a Turing-complete query and processing layer built on top of a document store. In the CouchDB case MapReduce is the only way you can query the database.

http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views http://www.mongodb.org/display/DOCS/MapReduce http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-...

ergo98 · on Oct 1, 2010

>SQL is actually quite often a bad way to try to answer those questions, too! See http://philip.greenspun.com/wtr/data-warehousing.html for an entertaining explanation.

That's why data warehouses rely upon cubes/OLAP for analysis. It is a specialized solution that serves the need very well.

>One great point about MongoDB is that it makes the ETL process a lot easier (you don't have to prepare tables with the right schema and it supports large amounts of data).

So does a CSV. In fact, so does the last silver bullet, which is XML. XML is as loose or as strict as you want it to be.

thibaut_barrere · on Oct 1, 2010

OLAP on SQL comes at a cost, too (which is why some people are diving into analytics and reporting with NoSQL tools, where you can add one server without large expenses).

On CSV/XML, my point wasn't clear enough: I wanted to underline the fact that it's a lot easier to load dimension data, then load fact data and do the foreign key lookups, when working with MongoDB (it's not about the file format, it's about the loading/lookup part, which is a large part of ETL in my cases).
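The loading/lookup step can be sketched independently of any database. The following is a minimal in-memory version, assuming flat source rows and integer surrogate keys; the function name, the field names and the `_id` suffix convention are hypothetical.

```python
def load_facts(rows, dimension_fields):
    """Split flat rows into dimension lookup tables and fact records.

    Each distinct dimension value gets a surrogate key the first time it
    is seen; fact records keep only the key, not the value.
    """
    dimensions = {field: {} for field in dimension_fields}
    facts = []
    for row in rows:
        fact = {}
        for field, value in row.items():
            if field in dimensions:
                lookup = dimensions[field]
                # Reuse the existing key, or assign the next one.
                fact[field + "_id"] = lookup.setdefault(value, len(lookup) + 1)
            else:
                fact[field] = value  # a measure: copy through unchanged
        facts.append(fact)
    return dimensions, facts


rows = [
    {"team": "X", "caller": "alice", "duration": 30},
    {"team": "Y", "caller": "alice", "duration": 12},
    {"team": "X", "caller": "bob", "duration": 45},
]
dimensions, facts = load_facts(rows, ["team", "caller"])
print(dimensions)  # {'team': {'X': 1, 'Y': 2}, 'caller': {'alice': 1, 'bob': 2}}
print(facts[2])    # {'team_id': 1, 'caller_id': 2, 'duration': 45}
```

With a schemaless store the dimension records can pick up new attributes later without a migration, which is the convenience being described here.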

senya72 · on Oct 2, 2010

how about high-perf joins at run-time?

ergo98 · on Oct 1, 2010

For the people who need OLAP analysis, the relevant expense range is seldom that much of a consideration. I'm looking at a storage system right now that costs $800,000. It's considered mid-range and is merely the starter system.

Note that OLAP is, in many regards, NoSQL. It is really the most successful variant of NoSQL.

However I'm very curious what sort of analytics people are doing with NoSQL. I have seen people essentially generating reports to MongoDB, for instance, but I have never seen anything remotely approaching flexible analytics on such a system.

gaius · on Oct 1, 2010

That article is very peculiar:

> A data warehouse is a separate RDBMS installation that contains copies of data from on-line systems. A data warehouse would not be necessary if RDBMS software worked as advertised. It is merely a $10 million bandaid applied to the limitations of modern computers and RDBMS software.

If you want to do more computation, you require more computers, and this is "insight"?

You might as well say "A level 2 cache would not be necessary if RAM worked as advertised". Duh!

ratsbane · on Oct 1, 2010

Having spent an unpleasantly large part of my life enmeshed in it, I strongly disagree with the last statement "with really good application software on top of it."

Otherwise, spot-on.

lubos · on Sept 30, 2010

Many years back, before this NoSQL madness took off, I was building my own "NoSQL" database. It was fast, refreshing and it seemed like common sense. Then, as I added more and more features, I realized my neat "NoSQL" database (I didn't call it that, though) was turning into a relational one. I had come full circle back to relational stores. Only then did I fully understand why the relational model is mathematically superior to any other paradigm out there, and why it's only SQL that really sucks. Perhaps this will happen again to the mainstream NoSQL movement. It certainly wouldn't surprise me.

So instead of running surveys on why enterprises don't jump on the NoSQL bandwagon, I would be more interested to know how many NoSQL adopters have ever heard of Codd's theorem, or how many of them can differentiate between SQL and the relational model. The relational model is a good thing; throwing it out and starting from scratch feels like throwing out the baby with the bathwater. That's how I feel about the NoSQL movement.

limist · on Sept 30, 2010

In short: 1) No ACID Equals No Interest, 2) A Low-Level Query Language is Death, and 3) NoSQL Means No Standards. ACID, complex querying, and standards matter when dealing with something more than a string describing what someone had for lunch.

But while Stonebraker is convincing and very credible, can his message really compare with these talking heads? http://www.xtranormal.com/watch/6995033/

dasht · on Sept 30, 2010

I think a lot of the comments here (not all) display a relative naivete about what is meant by ACID and why the physical / logical separation of SQL is vital to the point about ad hoc queries.

It's ironic because some of those comments complain about enterprise types not knowing about NoSQL while, at the same time, the comment makers seem to not (really) know about SQL RDBMS systems.

mmt · on Oct 1, 2010

I, too, have found this to be the general case.

I speculate that this is because they consider MySQL to be the archetype RDBMS. I have yet to run into such depth of misconception from those who work with Postgres.

8ren · on Sept 30, 2010

Before relational databases and SQL, there was a successful industry of hierarchical databases (key-value stores), which was so utterly annihilated that not one vendor remained to give testament.

RDB has since become standard. One aspect of a "standard" is uniform interfaces or one-size-fits-all, an essential aspect of which is that it doesn't suit all needs perfectly. It's a modular approach that involves adapting not-quite-circular circles and not-quite-triangular triangles to perfectly square slots. You see it in mathematics and (e.g.) OO polymorphism all the time. It's an unpleasant but often worthwhile tradeoff.

But at the edges, that tradeoff becomes questionable. And the web's REST is a new giant, whose clothing needs a different shape altogether. Applying RDB to it is so ill-fitting and irritating and obviously wrong that hierarchical databases have found a new home and a new life and a new name, NoSQL.

But when NoSQL attempts to spread to others, it is surprised and hurt to find them already tailored for.

---

The above fable is light on detail. There are many complex issues in the NoSQL/RDB comparison, since both have adapted to the detailed problems of their respective domains. However, I think the key aspect is that RDB is meticulously thought out in terms of information, and only then adapted to specific engineering needs. Specifically, it has normal forms and schema (needed for normal forms.)

gamble · on Oct 1, 2010

NoSQL has technical merits, but it seems to me like a good chunk of the interest in it is driven by distaste for SQL and the complexity of RDB systems in general. There are a lot of programmers out there who shudder at the thought of delving below the ORM, much less writing a stored procedure.

Big companies may be many things, but one thing they are not is afraid of RDBMSs. Enterprises are filled with the kind of people who dedicate their lives to learning every in and out of Oracle or DB2. Not to mention that these are the people traditional RDBMS products are designed for. They just don't have the same pain points as startups.

davidw · on Sept 30, 2010

I'm not an enterprise guy at all, but here's another thing worth considering: if even I, web 2.0 dude, am a bit confused by the array of things out there, and think that waiting until they battle it out and only a few are left at the top might not be a bad strategy, what is some corporate IT guy going to think?

ahoff · on Sept 30, 2010

The disclosure at the bottom is laughable. It should read, "Michael Stonebraker has a strong economic incentive to steer enterprises away from NoSQL. Whenever a company chooses NoSQL, he will lose money. Hence, his opinions should be considered in this light." Oh, and full disclosure, I have a strong economic interest in promoting NoSQL.

jhugg · on Sept 30, 2010

As somebody who knows Mike Stonebraker and works for one of his RDBMS companies, I can honestly say he says this stuff because he really believes that NoSQL is history repeating itself.

Whether it's hierarchical databases, XML databases, object databases, proprietary query languages, etc., people keep trying new things and then coming back to the relational model and SQL. When people say "This time it's different", he gets annoyed. Maybe this time he's wrong, but with an understanding of the history of databases, it's easy for me to see his point of view.

Honestly, I'm not sure if antagonizing the NoSQL community helps his companies or not. The NoSQL crowd is pretty passionate.

jbellis · on Sept 30, 2010

Not sure why this is being downmodded. Stonebraker has a history of using his reputation to promote his businesses. Which is fine, but I don't think most people realize how much of a financial interest he has in downplaying nosql. Something his competitors like ahoff and I are very much aware of. :)

davidw · on Sept 30, 2010

It's a bit over the top: "Whenever a company chooses NoSQL, he will lose money." It's not like Stonebraker gets money every time someone sets up MySQL, or even PostgreSQL, which, IIRC, he was involved with at some early point in its history.

ahoff · on Sept 30, 2010

I'll admit it was a bit snarky, and written in haste. You said it much better. He wants to play the role of an objective observer here, almost like an expert journalist. But he definitely has skin in the game.

ahi · on Oct 1, 2010

I don't mind that Stonebraker has a conflict of interest. He's marketing, I get it. I do mind that ACM has given him the platform to do it. Any other ACM publication would consider it incredibly unethical to publish articles with such clear conflicts.

protomyth · on Sept 30, 2010

Is there a company with a NoSQL product that is selling support contracts and is listed with Dun & Bradstreet? What do the reports from Gartner and Forrester Research say? Can I sue someone if something goes wrong? Did the options get evaluated with the corporate IT database committee?

The sad part is that is how it works in a lot of companies and not just a bunch of snark. Since SQL is so entrenched and most IT shops have contractual relations with SQL vendors with lots of money (Microsoft and Oracle), I don't see a lot of ability to get NoSQL into the enterprise environment. Not to mention the legacy software that means SQL Databases are needed, so why add a whole new type?

jchrisa · on Sept 30, 2010

My company CouchOne offers commercial support for CouchDB (which consequently has quite a bit of enterprise uptake, contrary to Stonebraker's assertions).

I think there is something to the argument that without commercial support, companies won't be adopting these technologies.

mathgladiator · on Sept 30, 2010

I try to look at the bigger picture and think about how companies like yours can help companies like mine that are starting up, so we can tear down the entrenched SQL forces.

davidw · on Sept 30, 2010

> tear down the entrenched SQL forces.

Sounds like a terrible idea to me. What you should be doing is building new things where SQL is not such a good solution, rather than "tearing things down".

LiveTheDream · on Sept 30, 2010

> why add a whole new type?

Because using the right tool for the right job makes sense. SQL is great for lots of things. NoSQL hits the sweet spot for other applications.

protomyth · on Oct 1, 2010

"right" is often defined by management as something quite different than what a technical person would desire.

sriramk · on Sept 30, 2010

I don't think it is about the technology at all. It is more about social factors. Enterprises often shy away from being on the leading edge of anything. There is very limited upside and big downside. I'm willing to bet that once the NoSQL space shakes down with some big winners and some de facto standards, enterprises will be adopting it just the same.

Enterprises also need to be sold to. I don't see any startup having a sales team pushing NoSQL solutions inside the enterprise (Amazon's SimpleDB and my own team's Windows Azure Tables might be exceptions).

jfager · on Sept 30, 2010

http://cloudera.com http://riptano.com http://couchone.com http://10gen.com http://neotechnology.com http://basho.com ...

sriramk · on Oct 5, 2010

None of them are 'large' enterprise companies - think Oracle, MSFT, IBM, etc.

kenjackson · on Sept 30, 2010

The disappointing aspect isn't that they aren't interested in it, it's that they don't know about it.

At the end of the day, there are a lot of trendy technologies that I have no use for. But it's good to know about them. I probably have about equal use for NoSql as I do SQL (the first professional code I ever wrote was a NoSql database, although we didn't call it that back then).

You choose what's best for the job, but in order to know what's best you have to know what's out there.

davidw · on Sept 30, 2010

To play devil's advocate, aren't a lot of these places where it's best not to rock the boat and risk screwups? In other words, they'll learn about new things in good time once they have big companies advertising them and have been around the block once or twice.

Whether that's sensible or not, I'm not in a position to judge, just that in that sort of world, perhaps learning about things a bit late is not a problem.

kenjackson · on Sept 30, 2010

True enough. And in hindsight, I think asking people if they've heard of "NoSql" is a somewhat unfair question. The term itself is a trendy one. I'm sure there are probably people who have fixed more lines of code in NoSql databases than most people here have written, and who don't know that specific term.

ahi · on Oct 1, 2010

Disagree. It's everywhere in tech news sources. If you read any tech news on a regular basis you can't possibly be unaware. Part of being a professional is keeping on top of new products/solutions in the field, even if they aren't immediately useful.

arethuza · on Sept 30, 2010

Hierarchical OLAP database applications are pretty widely used in enterprises, and they often look nothing like an SQL database to users (sure, something like Oracle Hyperion may have a relational database buried under there somewhere, but the exposed data model sure isn't relational, and things like Essbase are pretty similar but don't have any underlying SQL).

rmorrison · on Sept 30, 2010

Another big reason: money is not an issue. You want to create ten interconnected tables w/ a bajillion records each, then query across them needlessly every microsecond? No problem, with enough money and hardware there are ways to make almost anything happen using commercial SQL databases.

muloka · on Oct 1, 2010

Your statement reminds me of this quote "A boat is a hole in the water which you throw money into."

This is the general point of view executive committees have of IT. A necessary cost to keep them afloat in the sea of business.

So as much as money is not an issue, the bottom line at most enterprises is money. If you figure out how to greatly reduce the overall costs of IT, or in this case data storage, the enterprise you work for will probably appreciate it.

terra_t · on Sept 30, 2010

It's a funny thing.

Very powerful and scalable parallel SQL-based RDBMS systems have been commercially available for a long time. Also, I think that scalability concerns about ACID are exaggerated: ACID doesn't reach Facebook or Amazon scale, but there are only 5,000 or so sites that are that big.

A converse question to the one they ask is, "Why aren't web developers interested in Commercial Parallel RDBMS?" and I think the answer is that there is a generation of us who grew up using open source databases, who find the thought of using a commercial database like sticking their hand in a toilet... Even if it's a reasonably priced product like SQL Server's Web Edition.

A lot of people see mysql and pgsql as going nowhere, so there's a lot of interest in something like mongodb which has a future.

houseabsolute · on Sept 30, 2010

On the contrary, almost all of Facebook's data is stored in SQL systems. Cassandra and such were developed as ephemeral stores for certain types of data, like lists of people who "Like" certain things, but ultimately even that data is backed by SQL. (As far as I understand from my friends who work there.)

davidw · on Sept 30, 2010

> pgsql as going nowhere

Eh?! They just released a major update with long-awaited functionality!

jbooth · on Sept 30, 2010

Did the new functionality include reliable replication?

jfager · on Sept 30, 2010

Actually, yes: http://developer.postgresql.org/pgdocs/postgres/release-9-0....

jbooth · on Sept 30, 2010

Awesome. Was an honest question, don't do a lot of mysql or postgres these days.

jbooth · on Oct 1, 2010

Geez, touchy subject I guess.

terra_t · on Sept 30, 2010

pgsql has a badly nonstandard type system (lots of SQL queries that work in Oracle & mysql don't work in pgsql.) I've still got a long list of queries that run 100x faster in mysql than pgsql

pbh · on Sept 30, 2010

I'm very confused by this comment. Did you switch mysql and pgsql in this comment, or was this what you meant as written?

MySQL has always struck me as having a non-standard and quite strange type system (broken time/date types without microsecond accuracy, explicitly sized text types, ...). I've always seen PostgreSQL being marketed as a drop-in replacement for Oracle due to superior standards support. If anything, PostgreSQL seems much more similar to Oracle than MySQL does.

I'm also very curious what sorts of things I should avoid if you really do have a very long list of queries that PostgreSQL handles really poorly.

terra_t · on Oct 1, 2010

you can write

select sum(x=1) from y;

in standard SQL, Oracle, and MySQL. True == 1 in standard SQL, False == 0. Both pgsql and Microsoft SQL Server define a nonstandard boolean type that requires you to add a cast or an if statement, bulking up the query.
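The two spellings can be compared side by side. This is a minimal sketch using SQLite through Python's `sqlite3` module, chosen only because it ships with Python; SQLite is not one of the engines discussed in this comment, and it happens to evaluate a comparison to 0 or 1 so that the short form works. The `CASE` form is the spelling that does not depend on booleans being summable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE y (x INTEGER);
    INSERT INTO y VALUES (1), (2), (1), (NULL), (1);
""")

# Short form: relies on the engine treating (x = 1) as the integer 0 or 1.
shorthand = conn.execute("SELECT SUM(x = 1) FROM y").fetchone()[0]

# Longer form: makes the 1/0 mapping explicit.
portable = conn.execute(
    "SELECT SUM(CASE WHEN x = 1 THEN 1 ELSE 0 END) FROM y"
).fetchone()[0]

print(shorthand, portable)  # 3 3
```

Whether a given engine accepts the short form is something to check against that engine's documentation rather than assume.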

I did a shootout of mongodb and the three RDBMS systems (!Oracle) I mention for building a system to represent data from Freebase. It was possible to make a VARCHAR(4096) in mysql and only index the first 64 characters which meant I could map freebase types to mysql tables without running into index limitations -- I wanted the better GIS capabilities in pgsql, but I didn't want to double the size of my tables and queries to be able to handle strings losslessly.

chunkbot · on Oct 1, 2010

The title isn't a question, it's a statement.

I'm pretty sure Mr. Stonebraker's English is better than that.

shawndumas · on Sept 30, 2010

We are looking into using Node.js/Redis/Socket.io to track changes to useful profile data across all of our intranet applications in real time. No change to the current ACID setup; just hooks at the client end. No need to persist; speed and efficiency are the top priority. No extra hardware; just an extra load on our content servers.

And though we are probably in the minority we are Enterprise nonetheless.

j_baker · on Sept 30, 2010

Say what you will about SQL databases and I'll probably agree with you. However, larger organizations have to think at a much larger scale than the typical startup does. Migrating from a SQL database to a NoSQL database is a much more difficult decision to make once you reach a few hundred thousand lines of code, much less millions.

And I haven't even gotten into the logistical issues yet. Do we have server capacity for the new database? Do we have people who know how to administer the new database? Can I install it in under a month on our server running <insert outdated version of RHEL here>? And these questions become doubly important if your clients are running your software on their own servers. Now not only are these logistical issues, they're excuses for your clients to drop you and go with someone else.

To make a long story short, if NoSQL databases are worth it, enterprises will get there. But it's going to take a while. Much as us engineers hate to admit it, that might even be a good thing.

muloka · on Oct 1, 2010

In terms of the adoption of NoSQL databases (or any dev technology) within your enterprise, write a small and useful app as a side project in your spare time. Once you're done, present it to your manager (or team lead) and I'm sure it'll increase its chances of becoming more widely used within said company.

muloka · on Oct 1, 2010

Earlier CitizenKane made a point about the lack of standards across SQL databases in general. As such, migrating from one SQL db to another would probably present just as much of a challenge.

Let me tell you migrating from Informix 9.4 to SQL Server 2005 was no walk in the park.

batasrki · on Sept 30, 2010

The most disappointing thing about this post is that we're supposed to be preaching "use the best tool for the job" to the young ones. And here is someone whom anyone interested in database technologies has heard of, and who is regarded as an expert, denouncing a whole slew of potential database solutions based on the opinion of one person.

And ACM publishes it!

baconner · on Sept 30, 2010

On the contrary the post explicitly goes into the reasons why current nosql solutions may not be appropriate for enterprise oltp systems. Seems to me that the post didn't say nosql solutions are bad for all enterprise apps just that rdbmses may be the better solution for the bulk of them. So that's not an argument to choose the right tool for the right job? I think it is...

CitizenKane · on Oct 1, 2010

Stonebraker makes a point of saying that the lack of standards in NoSQL systems prevents their adoption. While there are standards surrounding SQL, most SQL databases do not adhere to them. They have connection semantics and syntax that are at times mutually exclusive. Drupal 7 has a dynamic query builder specifically to take care of this (see this podcast for more info http://www.lullabot.com/podcasts/drupal-voices-24-larry-garf...). But for most organizations, it's not like they can simply switch from one database system to another without a fair amount of work.

kragen · on Oct 1, 2010

It seems to me that there are two or three things being conflated here under the "NoSQL" moniker, and Stonebraker isn't helping things.

First, there's the standard relational storage model. We've known for almost 40 years that some kinds of queries are a lot faster if you denormalize so you're not even in 1NF. Occasionally this matters. "Document stores" are a lot better suited to things that aren't in 1NF than relational databases. In theory you could maybe make this problem go away by throwing more hardware at the problem, so that the extra factor of lg N (which might be around 20 or so) goes away.
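The "around 20" figure is easy to sanity-check. The sketch below assumes the simplest cost model, one comparison per level of a balanced binary search structure; that model and the function name are assumptions made here, and real B-tree indexes have a much higher fan-out, so they touch far fewer pages than this suggests.

```python
import math


def index_levels(n_rows: int) -> float:
    """Comparisons for one lookup in a balanced binary structure over n_rows keys."""
    return math.log2(n_rows)


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} rows -> about {index_levels(n):.1f} comparisons")
```

Under that model lg N is about 20 at roughly a million rows and about 30 at a billion, so the factor grows slowly with table size.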

Second, there's SQL and all of the hassles it comes with, which can be largely papered over by things like SQLAlchemy and Django and the like, but there are still tricky issues like schema migration and incremental rollout.

Third, there's the CAP theorem. ACID requires consistency (the C in ACID and the C in CAP are the same), so you give up either availability or partitionability, and when you give up partitionability you're usually giving up parallelizability as well, which can put a crimp in the "throw hardware at the problem" approach.

It's true that the costs of abandoning standard SQL databases are high. But there are things you can't get any other way. If you need a million queries per second per database server (serving up variable-length lists of structured data), if you need to be able to incrementally roll out a new database schema across your site, or if you need to tolerate network partitions (and latency; a partitioned network is just the limiting case of high latency) without your database becoming unavailable, standard SQL databases aren't going to give you what you need, as far as I know.

As far as I can tell, these are the three sources of "NoSQL": rejecting the relational model as a way to organize on-disk storage, rejecting the SQL DDL as a way to manage change, and rejecting ACID transactions. Interestingly, none of these necessarily implies rejecting SQL as a user-interface language, and indeed FQL is an SQL user interface on top of some very non-SQL-ish systems.

(SQL itself is a terrible language, as Stonebraker will no doubt tell you if you ask, but it's not so terrible as to justify the switching costs.)

ahi · on Oct 1, 2010

The entire premise is wrong because there is nothing to compare the survey data to.

"44% of enterprise users questioned had never heard of NoSQL and an additional 17% had no interest. So why are 61% of enterprise users either ignorant about or uninterested in NoSQL?"

Is 61% high or low compared to non-enterprise users? Maybe 61% of all users are either ignorant or uninterested.

Unrelated, if you're in IT and are unaware of NoSQL databases you kind of suck at your job. Not saying you have to use them, but you should at least know they exist when the tech news is saturated with them.

ludwigvan · on Sept 30, 2010

About the proliferation of NoSQL, check out this Changelog episode where "things got a bit rowdy when the panel debated features of Cassandra, CouchDB, MongoDB and Amazon SimpleDB and started throwing dirt at everybody else’s favorite NoSQL databases." http://thechangelog.com/post/457259567/episode-0-1-8-nosql-s...

This is 6 months old btw.

hello_moto · on Sept 30, 2010

Correct me if I'm wrong. I thought some of the "enterprises" are already using NoSQL as part of their data warehousing strategy?

To my knowledge (which might be wrong), there are companies and products out there, such as Greenplum, Teradata, Netezza, and Oracle Exadata, that specialize in data-warehousing solutions and have built their own DBMSs on column-oriented storage.

rxin · on Sept 30, 2010

Column- or row-oriented really has nothing to do with NoSQL or not. I'd argue NoSQL is a bad name, and as pointed out by previous commenters, many who are involved in the argument don't even understand ACID/SQL/relational algebra.

etm117 · on Sept 30, 2010

I am not positive myself, but I believe that Greenplum and/or Netezza are built on top of (highly customized) Postgres DB engines to ensure their transactional integrity across their clustered servers.

djhworld · on Oct 1, 2010

The simple reason for all this mess is the fact that most enterprise companies have spent a lot of money on their Oracle/Sybase/Whatever licenses and hardware and want to see a long term return on their capital.

0xbadcafebee · on Oct 1, 2010

Reason #3 is a good one. In Perl I'd make DBDs for all the NoSQL solutions I had to use and tell my devs to use those, but what else could we do that'd be more language-independent?

SilianRail · on Sept 30, 2010

Not true, they just aren't doing "NoSQL" specifically: http://www.sap.com/press.epx?pressid=13293

smokeyj · on Oct 1, 2010

Didn't NoSQL start in enterprise? Facebook used Cassandra, Google used BigTable, Amazon had Dynamo, etc?

_3u10 · on Sept 30, 2010

Because NoSQL systems are a tool, and enterprises have invested heavily in workforces and capital that are centered around SQL. It's relatively easy for an enterprise dev to get a SQL database that is going to be backed up and restored by their admins. It's OK to lose the odd 'Like'; it's not OK to lose the odd deposit.

Since enterprise businesses are generally not built on a freemium/advertising model, they have the resources to buy systems that are accurate first, and then furnish hardware to provide the speed.

Another problem with NoSQL is that while the core database functionality is there, the rest of the tool chain is missing. Show me a NoSQL database that has tools on par with SQL Server Management Studio, or tools that integrate well with Visual Studio.

Simply put, the use-case for NoSQL is not what enterprises need. SQL fits the use case far more than NoSQL. Keep in mind that a lot of large enterprise applications still run on mainframes. If you actually think about it, most mainframe applications are NoSQL because they are still using datastores so ancient and basic that they are essentially NoSQL.

jshen · on Sept 30, 2010

I'm not sure I agree with the definitions of enterprise being implied in the comments here.

I'm at a giant company all of you have heard of and we have the SQL Server / Visual Studio enterprisey people that the article mentions, and they fit the article's description to a T. I also work directly on analytics for our company, and a good friend of mine does analytics for another giant corporation nearby that all of you have also heard of.

I believe our companies would be considered "enterprise", yet neither of us works at a bank, neither of us works with ecommerce or supply chains for physical commodities, and neither of us is at a tech company. We both have groups heavily invested in SQL, yet it is the wrong solution for both of us. Our enterprisey people had never heard of NoSQL or things like Hadoop. Fortunately we're coming around and are starting to move things to Hadoop, but the article matches my personal experience, and ACID is not a requirement for the analytics we do. Our analytics are fuzzy by nature, yet we've been stuck in the SQL enterprise mindset for years.

c00p3r · on Oct 1, 2010

0. Because mediocre enterprise programmers can't manage complexity.

0.5. Because you can point fingers at Oracle when it crashes.

0.9. Because "No one was fired for choosing IBM".

ajsharp · on Sept 30, 2010

Without reading the article, I can sum this up in two words: perceived efficiency.

sudonim · on Sept 30, 2010

Please don't use double negatives... shouldn't this read "Why Enterprises are Interested in SQL". (And it's a statement not a question).</snark>

kujawa · on Sept 30, 2010

There's a positive review if I've ever heard it: "Enterprise thinks it sucks".

If I live a dozen lifetimes, I hope to never write another line of "enterprise" code again.