Ask HN: What's the current state of NoSQL?

simonw · on Feb 1, 2010

NOSQL = Not Only SQL - I don't think you should ever expect it to completely replace SQL since they're good at different things. I've played with a bunch of NOSQL engines (including CouchDB, MongoDB, App Engine, Solr, Xapian and Tokyo) and while I really like them and would use them for a bunch of problems, for most of my projects the ability to create arbitrarily complex queries using joins is essential for rapidly iterating and trying out new ideas.

Instead, I'd suggest using NOSQL stuff to complement SQL. Songkick use MongoDB as a fast caching layer for example: http://effectif.com/ruby/manor/denormalising-your-rails-appl... - and I've found Redis incredibly useful as a way of handling write-heavy parts of my applications and dealing with requirements to return random elements.

One of the most interesting aspects of document stores such as CouchDB is that they are schemaless, which for some problem sets is incredibly powerful - anything where you might be tempted to use key/value pairs in SQL for example.
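A rough sketch of that contrast, using only the Python standard library. The table name, field names and the in-memory `documents` dict are made up for illustration; a real document store such as CouchDB would take the place of the dict.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Key/value pairs in SQL: one row per (item, key, value), all values stored as text.
conn.execute("CREATE TABLE item_attrs (item_id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO item_attrs VALUES (?, ?, ?)",
    [(1, "title", "Gig poster"), (1, "colour", "red"), (1, "width_cm", "40")],
)

def load_item_eav(item_id):
    """Reassemble one item from its key/value rows; types are lost along the way."""
    rows = conn.execute(
        "SELECT key, value FROM item_attrs WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())

# Schemaless document: the whole item is stored and fetched as one JSON value.
documents = {}  # hypothetical stand-in for a document store

def save_document(doc_id, doc):
    documents[doc_id] = json.dumps(doc)

def load_document(doc_id):
    return json.loads(documents[doc_id])

save_document(1, {"title": "Gig poster", "colour": "red", "width_cm": 40})
```

In the key/value table `width_cm` comes back as the string "40", while the document keeps it as a number and can also nest lists or sub-objects without extra tables.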

gridspy · on Feb 1, 2010

I'd only consider NoSQL if you expect a very heavy load and one of these:

   - You have live data that is changing very regularly
   - You have a large quantity of flat data (think column-oriented databases)
   - You don't need to index / find data and it is very flat 
     (perhaps simple files on disk will allow you to store more)

Plus:

- You cannot possibly put off going NoSQL until you have further established yourself in the marketplace

For Gridspy, we have live data and I expect large quantities of pretty flat data. It makes sense to stream the data directly to the user via messaging rather than polling through a database. Plus, I plan to store large quantities of high resolution data in a specialised database or dumped to disk - it will be much smaller and simpler without the indexing information since I don't need to search it, only slice it.

See http://blog.gridspy.co.nz/2009/09/database-meet-realtime-dat...

m0th87 · on Feb 1, 2010

To be fair, I think there are a lot more use cases than that. My last project and my current one both used NoSQL solutions. The former because graph walking is painful in SQL and the latter because schemaless documents enable so many new scenarios (in my case, the automatic mapping of user-created forms to the backend). For me, the performance ramifications of NoSQL are way overplayed compared to its usability improvements. It is certainly not a successor or replacement to SQL, but for applications that handle a lot of unstructured or semi-structured data, NoSQL makes code tighter, simpler and easier to reason about. Which means fewer bugs and faster deployments.

EDIT: I should qualify this argument by stating that I'm talking about datastores that support custom queries (like MongoDB and CouchDB) rather than ginormous k/v stores (like Tokyo Cabinet). I've found limited usefulness for the latter.

gridspy · on Feb 1, 2010

You know when you have a special case. If you don't - just use the most convenient off the shelf database and get back to adding core functionality.

It sounds like you do, however I think that there are a lot of people who have no real reason to move away from SQL other than fashion. The database choice should be just like any other engineering decision - often your familiarity with the new tool is very important.

JulianMorrison · on Feb 1, 2010

MongoDB is good for non-flat data if you can live with "transactions" that operate only within one "document". (You can do a surprising amount with this, using complicated select-and-alter primitives.) And it has indexes.
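To illustrate what a single-document "transaction" can look like, here is a toy in-memory sketch. `TinyDocStore` and its `find_and_modify` method are hypothetical names written for this example, not MongoDB's API; the lock stands in for whatever the server does to make one document's update atomic.

```python
import threading

class TinyDocStore:
    """Toy store: each update touches exactly one document, all or nothing."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc_id, doc):
        with self._lock:
            self._docs[doc_id] = dict(doc)

    def find_and_modify(self, doc_id, condition, changes):
        """Apply `changes` only if every field in `condition` still matches.

        Returns a copy of the updated document, or None if nothing matched.
        """
        with self._lock:
            doc = self._docs.get(doc_id)
            if doc is None or any(doc.get(k) != v for k, v in condition.items()):
                return None
            doc.update(changes)
            return dict(doc)

store = TinyDocStore()
store.insert("job1", {"state": "queued", "owner": None})

# Two workers race to claim the same job; only the first select-and-alter wins.
first = store.find_and_modify(
    "job1", {"state": "queued"}, {"state": "running", "owner": "worker-a"}
)
second = store.find_and_modify(
    "job1", {"state": "queued"}, {"state": "running", "owner": "worker-b"}
)
```

The check and the write happen as one step, which is enough for claim-a-job or decrement-if-positive patterns, but nothing here spans two documents.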

The "mmap and update in place" design has upsides and down. Up: it's very fast and it doesn't allocate any new space for simple updates. Down: data will be trashed if the DB is shut down rudely, so you need a replica of each node just to keep ACID.

justinsb · on Feb 1, 2010

Is your app pushing the limits of SQL databases? Is there any reason to look at NoSQL other than the fact that it's 'cool'? Currently all the NoSQL databases are very early adopter products, and each have their own strengths and weaknesses, so you'll have to choose a NoSQL database whose strengths match the area where a SQL database is failing you, and where the weaknesses aren't deal breakers.

Of course I'm biased, and tend to lean towards using SQL/relational databases... FathomDB is all about trying to eliminate the pain points of running a (My)SQL database. I feel a lot of the NoSQL marketing hype is picking on weaknesses of MySQL (rather than relational databases per-se), and so we're thinking about how to make MySQL better, and we don't think it's a good idea to abandon the relational model entirely. After all, our industry started with NoSQL back in the 60s, and there were good reasons for adopting the relational model 30 years ago!

olalonde · on Feb 1, 2010

My ultimate goal is to increase productivity - I'm getting tired of writing and maintaining all those SQL queries. I felt the driving force behind NoSQL databases was cutting through the pain of SQL, but I can see from other comments that it is not really the case.

justinsb · on Feb 1, 2010

I believe that simplifying querying is a non-goal for NoSQL, in fact most NoSQL databases actually push more of the burden of querying onto you. CouchDB is arguably a bit better here with its concept of views. SQL is incredibly powerful for expressing very complex queries succinctly, and it's pretty difficult to beat. CRUD queries are tedious in SQL or NoSQL, and an ORM or similar abstraction layer definitely helps productivity when programming those bread-and-butter operations.

olalonde · on Feb 1, 2010

Thanks, that's exactly the answer I was looking for. For some reason, I was under the impression that simplifying queries was actually a design goal of NoSQL whereas it really is about scalability and performance... right?

cperciva · on Feb 1, 2010

NoSQL simplifies queries to the extent that it makes complex queries impossible and thereby forces people to design their data structures in such a way as to allow them to do everything they need using only simple queries. :-)
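A small sketch of what that looks like in practice, assuming a plain key/value store (here just a dict) and a made-up "posts by author" requirement. The lookup that SQL would answer with a WHERE clause has to be planned for and maintained at write time.

```python
from collections import defaultdict

kv = {}                              # stand-in for a plain key/value store
posts_by_author = defaultdict(list)  # hand-maintained lookup, updated on every write

def save_post(post_id, author, title):
    kv[f"post:{post_id}"] = {"author": author, "title": title}
    # The "query" is precomputed here; forget this line and the data is unfindable.
    posts_by_author[author].append(post_id)

def titles_for(author):
    """Only simple lookups by key are needed at read time."""
    return [kv[f"post:{pid}"]["title"] for pid in posts_by_author[author]]

save_post(1, "ann", "Hello")
save_post(2, "bob", "Benchmarks")
save_post(3, "ann", "Follow-up")
```

A question nobody planned for, say "posts by title prefix", would mean scanning every key or building and backfilling another lookup structure.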

justinsb · on Feb 1, 2010

That's a great way to look at it. The problem here comes when you want to add a new feature, and you find that your design doesn't allow for the new queries needed.

The relational model and data normalization are reasonable at stopping you painting yourself into a corner, and SQL's bulk data manipulation operations can get you out of trouble.

With NoSQL, you're on your own, which is great for those that are infallible and omniscient. Ironically enough, I think that means that only Larry Ellison should be using NoSQL, to twist the well known joke.

richcollins · on Feb 1, 2010

How are complex queries impossible? You could easily write a declarative language for querying graphs.

wlievens · on Feb 1, 2010

Sure, but the performance would be horrible. That's his point. You can't do some of the things rdbms's do.

kunley · on Feb 1, 2010

I guess he meant: complex queries are impossible at the server side, so you have to run it in the client.

justinsb · on Feb 1, 2010

Yes - that's my understanding of the goal of NoSQL. As for whether they've achieved it, that's a more complex question :-)

simonw · on Feb 1, 2010

"I'm getting tired of writing and maintaining all those SQL queries." - learn a good ORM. The Django ORM is a heck of a lot better at writing SQL than I am, especially the kind of queries that make up 95% of a web application.

olalonde · on Feb 1, 2010

That sentence was for illustration purposes: I was wondering if NoSQL databases acted on the same level as ORMs (and I was apparently wrong). I am very well aware of ORMs (actually wrote my own in PHP: http://www.getdorm.com).

alexpopescu · on Feb 1, 2010

I think it would be a mistake to think of NoSQL as a replacement of RDBMS. Its main goal is rather to make our lives easier for a set of scenarios that were created with the read-write web. I'd encourage you to take a look at these NoSQL usecases: http://nosql.mypopescu.com/tagged/usecase. Hopefully that would give you an idea where some of these systems are fitting in. If you check other presentations on MyNoSQL you'll notice that many live systems are using a mixture of RDBMS and NoSQL.

:- alex

Jim_Neath · on Feb 1, 2010

I've been using MongoDB to rewrite the activity feed on one of my apps. I wouldn't use it for everything though (not just yet anyway).

Having said that, the guys behind Harmony (http://get.harmonyapp.com/) use MongoDB for everything, as far as I know: http://railstips.org/blog/archives/2009/12/18/why-i-think-mo...

simonw · on Feb 1, 2010

I'm curious: how do you structure an activity feed in MongoDB? Are you doing anything special to support the case of "show me activity from all of the people I am following", or do you just use it as a fast append-only log?
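For concreteness, the two designs in that question could be sketched like this. Everything here (the follow graph, the names, the in-memory lists) is hypothetical and says nothing about how the app in question actually does it.

```python
from collections import defaultdict

following = {"alice": {"bob", "carol"}}  # hypothetical follow graph: reader -> people followed
log = []                                 # design 1: a single append-only log
inboxes = defaultdict(list)              # design 2: one precomputed feed per reader

def record(actor, action):
    event = (actor, action)
    log.append(event)
    # Fan-out on write: copy the event into the feed of everyone following the actor.
    for reader, followed in following.items():
        if actor in followed:
            inboxes[reader].append(event)

def feed_from_log(reader):
    """Design 1: filter the whole log at read time."""
    followed = following.get(reader, set())
    return [event for event in log if event[0] in followed]

def feed_from_inbox(reader):
    """Design 2: reading is a single lookup."""
    return list(inboxes[reader])

record("bob", "posted a photo")
record("dave", "liked a gig")
record("carol", "joined")
```

Both functions return the same feed for "alice"; the trade-off is cheap writes with a filtered read versus more writes with a trivial read.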

siculars · on Feb 1, 2010

from your comments here, it seems that you are a bit confused by the term nosql. it is kind of a misnomer, imho, and should rather be called nordbms. what this movement is really replacing is the traditional rdbms approach to data storage, retrieval and searching. as @emileifrem points out in his talk at nosqleast.com, nosql should be referred to as "not only sql". further, he likens the explosion of new systems under the nosql banner to the explosion of rdbms's in the 1980s and 1990s. i tend to agree. there are a number of solutions out there right now, each approaching nosql from a different angle.

watch some of the videos from nosqleast 2009, https://nosqleast.com/ to get a better picture of some of the different options and major players in this area before making a decision as to what nosql solution to base any of your future projects on.

aita · on Feb 1, 2010

Yahoo's benchmarks: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

randliu · on Feb 1, 2010

No, it's not ready to replace SQL, and I don't think it ever will. What are your requirements? If it's horizontal scalability (and you're actually hitting a performance wall) you should begin to think about it. Maybe also if you never do any joins.

Relational database systems (+ normalization) compromise everything to ensure the ACID properties, which for the majority of cases, is the most important part.

richcollins · on Feb 1, 2010

How is the relational model inherently better for ACID than non-relational models (graph dbs for instance)?

randliu · on Feb 1, 2010

>database systems

kunley · on Feb 1, 2010

There are different things to consider depending on whether you'd create new app alone / with trusty co-founders, or you'd want to introduce it to a team using some form of agile development, or you'd want to expose NoSQL to people using ol' rusty waterfall model.

The last case is hardest and I'll share some thoughts on it. I know most of you don't live in such environment, but still you can infer the "agile" scenarios from the waterfall one. In other words, the following waterfall issues can be areas of potential fkups using whatever development model.

So, the impact of switching to NoSQL for different waterfall'ish teams:

- it changes the way how your data is organized -- mostly it's denormalization and some strategies tied to the specifics of queries you'd have to use most (read: ad hoc strategies). So, it influences the analysis, architects, development & release management.

- it changes the way how the db "schema" changes can be introduced. You'd say "there's no schema". Well, it's partly true, but in real life you have to add some metadata information to the underlying db, otherwise your db queries won't run. For example, Cassandra has ColumnFamilies definitions, CouchDB has its view definitions. Somebody has to agree what needs to be changed and then write these changes and maintain it in sync with the codebase. You'd probably need mechanism like Rails migrations to maintain it - you won't get rid of it with the promise "there's no schema". Somebody has to apply such changes to production as well. So, back to the waterfall: it influences analysis, development, release management & operations.

- it changes the way how your app scales. The goal of many NoSQL engines is to easily scale horizontally -- this is a big win to operations! But we're not there yet (Cassandra? Maybe MongoDB?), see eg. http://bjclark.me/2009/08/04/nosql-if-only-it-was-that-easy/. Also, if something you need crucially doesn't scale, you have to redesign your app. So the influence is: operations have less work, release management has more work, but in the worst case all the teams have to rework the app.

- it allows for some non-standard app behaviours. Eg. CouchDB is excellent at disconnected operations, meaning: occasionally synchronizing data between nodes which are mostly offline. It's also called "no master" as opposed to "multi master". No wonder IBM research funded CouchDB development (trying to rewrite Lotus Notes? ;) and also Ubuntu chose it for their Ubuntu One sharing platform. Feature like this is a relief for release mgmt & operations, but can need a lot of work from the architects, analysis & devs.
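On the schema-change point above: one way a migrations-like mechanism could look for schemaless documents is to stamp each document with a version and upgrade it step by step. This is only a sketch under that assumption; the `schema_version` field, the `name` to `first`/`last` change and the function names are invented for illustration.

```python
def v1_to_v2(doc):
    """Hypothetical change: v2 splits the single 'name' field into 'first' and 'last'."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update({"first": first, "last": last, "schema_version": 2})
    return upgraded

# One upgrade step per version, kept in the codebase next to the code that needs it.
MIGRATIONS = {1: v1_to_v2}
CURRENT_VERSION = 2

def upgrade(doc):
    """Bring a document up to CURRENT_VERSION; unversioned documents count as v1."""
    doc = dict(doc)
    while doc.get("schema_version", 1) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("schema_version", 1)](doc)
    return doc

old_doc = {"name": "Ada Lovelace"}
new_doc = upgrade(old_doc)
```

Somebody still has to write, review and ship `v1_to_v2`, which is the point: dropping the schema does not drop the work of changing it.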

Hope this is useful. I'm considering convincing some BigCorp to use NoSQL in some project and these are the issues I thought of.

kunley · on Feb 1, 2010

I didn't write anything about the lack of transactions. Well, one should examine the atomicity level provided by the engine and then there's some work needed to be done by architects & devs to ensure data consistency where it's needed.

w3matter · on Feb 1, 2010

For us the big thing, during the current refactoring of http://www.funadvice.com, is eliminating joins.

In testing of the new up-coming platform, that was a huge, huge win for speed. And we're a Postgresql shop too.

MongoDB allowed us to:

- Have embedded documents (very large performance improvements)
- Have arrays and hashes as "columns"
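A rough illustration of the difference, with made-up question/answer data and sqlite3 standing in for the relational side. The embedded version is a plain Python dict of the shape a document store would hold; none of this is taken from the actual site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE answers (question_id INTEGER, body TEXT);
    INSERT INTO questions VALUES (1, 'How do I tune a guitar?');
    INSERT INTO answers VALUES (1, 'Use a tuner.');
    INSERT INTO answers VALUES (1, 'By ear, from the low E.');
""")

def question_with_join(question_id):
    """Relational layout: the page needs a join across two tables."""
    rows = conn.execute(
        "SELECT q.title, a.body FROM questions q "
        "JOIN answers a ON a.question_id = q.id "
        "WHERE q.id = ? ORDER BY a.rowid",
        (question_id,),
    ).fetchall()
    return {"title": rows[0][0], "answers": [body for _, body in rows]}

# Document layout: answers are embedded, and tags are an array "column".
question_doc = {
    "_id": 1,
    "title": "How do I tune a guitar?",
    "answers": ["Use a tuner.", "By ear, from the low E."],
    "tags": ["music", "guitar"],
}
```

With the embedded layout the whole page comes back from one lookup by `_id`, at the cost of answers no longer being easy to query on their own.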

We also use Redis in a few crucial places, because of its really good support for lists (queues) and sets, on top of its blazing raw speed.

Downsides? Yes. Many rails-style plugins don't work well. But an upside is that we're forced to write leaner code and not depend too much on those.

Another downside, MongoDB is super-fast, but is still a work in progress in some places, and the ORM we're using (mongo_mapper) is somewhat of a moving target right now.

But hey, that's what happens when you're on the bleeding edge.

MongoDB:

- built-in replication
- basic sharding
- embedded documents
- very very fast