MongoDB vs Redis, a different interpretation of what's wrong with Relational DBs (antirez.com)
30 points by davidw on June 3, 2009 | 24 comments



I expect that in a few years what was the real problem with RDBMS is going to be very clear, even if now it can look confusing enough and there are different alternatives and it is very hard to meter the relative value of the different solutions proposed.

I expect that in a few years antirez will study relational theory and conclude that he's been incrementally, unknowingly, implementing an RDBMS this whole time.


Redis may be the most useless piece of code in the world, but in my humble opinion it is very hard to look at the command reference and conclude that Redis is going to be an RDBMS.


I'm not 100% sure, but it looks like redis might already have the low level building blocks of a relational dbms. The relational model is: relations + relational operators + boolean logic.

A relation is just a special kind of set construction and redis has sets, so relations can be built. All of logic and operations can be built from either (not + and) or (not + or). You have SISMEMBER (not), SUNION (and), and SINTER (or).

That might be enough.


You can't get select, project, or cartesian product out of ismember, union, and intersection. Also, union corresponds to and and intersection corresponds to or, not the other way around as your comment has it.


You can't get select, project, or cartesian product out of ismember, union, and intersection.

Sorry, I didn't mention that redis has SADD and SREM for implementing projection. I'm not sure what you mean by select. You're right about product. I think that's what's missing.

Also, union corresponds to and and intersection corresponds to or

That's exactly what I said. We both got it backwards.


Haha! You're right.

"Select" is the SPC-algebra operation which gives you the subset of a relation whose tuples satisfy some Boolean condition, e.g. firstname='bob'.
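For readers following the exchange above, here is a minimal Python sketch (not Redis code) of the three operations under discussion: select, project, and cartesian product. Relations are modeled as plain sets of rows. The helper names `row`, `select`, `project`, and `product`, and the sample data, are hypothetical illustrations. The last lines check the correspondence the thread settles on: union behaves like OR on membership, intersection like AND.

```python
# Illustrative sketch only: a relation is a Python set of rows, and each row is
# a frozenset of (attribute, value) pairs. All names and data are hypothetical.

def row(**fields):
    """Build one hashable row from keyword arguments."""
    return frozenset(fields.items())

def select(relation, predicate):
    """Selection: keep the rows whose fields satisfy a Boolean condition."""
    return {r for r in relation if predicate(dict(r))}

def project(relation, *attrs):
    """Projection: keep only the named attributes; duplicate rows collapse."""
    return {frozenset((k, v) for k, v in r if k in attrs) for r in relation}

def product(left, right):
    """Cartesian product: combine every left row with every right row.
    Assumes the two relations have disjoint attribute names."""
    return {l | r for l in left for r in right}

people = {
    row(firstname="bob", city="Rome"),
    row(firstname="bob", city="Oslo"),
    row(firstname="alice", city="Rome"),
}
colors = {row(color="red"), row(color="blue")}

bobs = select(people, lambda t: t["firstname"] == "bob")
names = project(people, "firstname")
pairs = product(people, colors)

# Union behaves like OR on membership, intersection like AND.
a, b = {1, 2, 3}, {3, 4}
union_is_or = all((x in (a | b)) == (x in a or x in b) for x in range(6))
inter_is_and = all((x in (a & b)) == (x in a and x in b) for x in range(6))
```

Note that `select` and `product` both need to look inside rows or build new ones, which is why membership tests, union, and intersection on opaque elements are not enough on their own.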


I think that's totally unwarranted. He essentially pointed out that MongoDB is actually an RDBMS, and clearly explained how Redis is different, and why.


There's nothing relational about MongoDB. At all. Relations are sets. MongoDB is setless.


David, mine is not a critique of MongoDB at all. It's just a matter of different models. In MongoDB objects are composed of fields with values, which is very much like tables, and you can perform queries that, while not written in SQL, are actually very similar to SELECTs. You can also set indexes on fields, which means there is some kind of implicit structure for accessing the data.

If you don't like it, I won't use the word "relational", since it is not very accurate in this context, but this model is mathematically and algorithmically similar to what is inside the implementation of MySQL, for instance.

To state it more clearly: in this model, the access pattern is not already implicit in the way the user stores the data. With Redis lists it's pretty obvious (and you can see the time complexity of every operation in the docs) that if you use LPUSH against a list, LRANGE can then get the top 10 items in constant time regardless of the length of the list. This follows from the data model: Redis lists are already ordered because they are just linked lists, and so on.

This is not to say Redis is better or MongoDB is better, it's just a different point of view. MongoDB implements a smart subset of the features provided by an RDBMS, while Redis takes a different path where you select the data structure that best fits the way you'll later retrieve the data.
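The LPUSH/LRANGE point can be sketched in a few lines of Python. This is only an analogy: `collections.deque` stands in for a Redis list, and the helpers `lpush` and `lrange_head` are hypothetical names, not the Redis client API. The idea is that pushing at the head keeps the newest items first, so reading the top 10 touches only 10 nodes however long the list grows.

```python
from collections import deque
from itertools import islice

# Stand-in for a Redis list: a linked structure with O(1) pushes at the head.
timeline = deque()

def lpush(lst, value):
    """Insert at the head, so the newest item always comes first."""
    lst.appendleft(value)

def lrange_head(lst, count):
    """Return the first `count` items; walks only `count` items from the head."""
    return list(islice(lst, count))

# Simulate 100,000 posts arriving over time.
for post_id in range(100_000):
    lpush(timeline, post_id)

# The ten most recent posts, newest first, without scanning the whole list.
top10 = lrange_head(timeline, 10)
```

The access pattern (newest first) is baked into how the data was written, which is the contrast being drawn with a query-plus-index model.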


This is not to say Redis is better or MongoDB is better, it's just a different point of view

Sure. I didn't mean to imply any criticism, or imply that you were criticizing. I, personally, haven't figured out when/where to use a document database like Mongo or Couch, but I figure they must be a good fit for someone's use case or people wouldn't be making them.

The thing that I have trouble with is that in every presentation or discussion I've seen of these alternative databases the people presenting them never seem to have an understanding of what a relational database is. And because the language in these presentations is always so obfuscated (to me) I'm never able to get a clear idea of what exactly they do and where they stand in relation to a relational DBMS (by which I mean the platonic ideal of a relational DBMS, not SQL).

Redis is interesting to me because it's unlike the others. And because of the way it's unlike the others. It has data structures with operations. In other words it's a DBMS, or "structure server" in your terminology.

At its core a relational DBMS is just a "structure server" with one kind of data structure, called a relation, and operations on that structure. And that's all it is really. Others probably disagree but I think even a query language is optional. The query language is just the UI.

SQL confuses everything. A SQL DBMS is a relational DBMS (almost) plus a boatload of additional functionality. I see your point about how Mongo is like SQL because it has named values and queries, but it doesn't have relations or relational operators.

In summary: Mongo may have some SQL like features, but they aren't relational features. The set features of Redis on the other hand are getting close to the building blocks of the core of an actual relational DBMS.


OK, so there is a terminology mismatch here. Sorry, I think this is my fault: when I speak of RDBMS in the article, I am thinking of Oracle, MySQL, ..., relational SQL databases.

Btw in Wikipedia I read: "A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables."

Basically once you have the idea of tables, a lot of things about SQL databases are automagically needed: queries and indexes for instance.

Also, MongoDB specifically supports _id fields, and tables, so you can build tables whose only purpose is to describe relations among other tables. So MongoDB is an RDBMS according to Wikipedia's definition, which may be wrong of course (and if so should be fixed).

With Redis Sets it is probably possible to build the storage system of an RDBMS purely with software on the client library side, but AFAIK Redis can't be considered an RDBMS because it lacks tables, and Sets can't be considered tables since they have no notion of field:value at all; they are just collections of elements.

Btw in the future I'll try to use the term RDBMS better, and will refer to Oracle, MySQL, Firebird, DB2, and so forth, as SQL databases.


Which might not be a bad thing. I'd like to see something implement the relational model well but lose the cruft that got into the SQL standard. (Duplicated tuples? WTF?)


The problem with the RDBMS in the web world is that it just doesn't fit the scenario. When the RDBMS was born, it dealt with large, relational, but relatively static data. You could use whatever query you liked to get an organized result. But today's web world is very different. We have relatively static queries (get the top 10 stories, get account XXX's most recent posts, etc.), but the data is changing all the time. Consistent queries may be a good starting point for solving the problem.


This is an insane statement. High transactional capability is exactly what you'd use an RDBMS for. Or are there not that many credit card transactions per hour?


In my experience there are two sources of problems with Relational databases. They are either designed by Programmers or they are designed by DBAs. They both make a complete and utter mess of the task almost every time.

Programmers design databases to make their programs simpler and DBAs seem to design databases to make programming near impossible.

The most amazing thing is that Relational databases are simple things to understand and design - when you understand the data. That is probably where the trouble lies - neither of my two design groups understands the data - they both have entirely different agendas.

To get to the point - Relational Databases are fantastically flexible, high-performance tools - it is just that so many examples of poorly designed databases abound that people get the idea that they are a less than ideal tool.


> DBAs seem to design databases to make programming near impossible.

Have you ever been in a situation where a C buffer overrun caused a data structure to get scribbled over, and instead of crashing, the program just started passing around corrupt data? In a long-running application, that eventually died for a reason that made no sense whatsoever? To the point where you know that noise got in the system, but you don't know where or when, and everything is now suspect?

Imagine that happening to a hospital's medical records. That's a DBA's nightmare scenario. They're not trying to "make programming near impossible", they're trying to systematically prevent (potentially incompetent) programmers from corrupting the data. Databases are good at including internal consistency checks into the schema design itself.

If a programmer new to your project got mad because they kept making asserts or unit tests all over the place fail, and decided to delete them, how would you react?


Far too many people are confusing problems with SQL with problems with the relational model.

SQL != relational database


Indeed. But to put this fully in context, add this: 99.99% of the relational database user base is using SQL databases.


My problem with redis is that it is permanently advertised as a "Database". It is not, it is a persistent cache with some nice query functionality. But all your data must fit in RAM. Otherwise bad things™ happen.

Some may consider this a small terminology glitch but for me it's quite a fundamental issue.


Even without the persistence you can say something is a database. It's not a coincidence that the term "in memory database" exists for example.

Btw, everything that is used to store data permanently, reliably or not for your specific needs/tastes, is a database.

Also, you can use Redis in append mode if you wish; this will sync your data to disk ASAP. For now you need to use netcat to do this, but it will be supported with ad-hoc tools in the future. Still, this will be the discouraged way to do things.

And about reliability, it's worth noting that with Redis there is a delay between the in-memory copy and the persistent copy of the dataset, but this does not affect the reliability of the whole dataset: all your data older than the last few minutes is stored on disk permanently without any kind of risk.

Also, you appear to claim that if a database by design limits the amount of data it can hold to the amount of free RAM, it is a cache. This is not the case IMHO.


Even without the persistence you can say something is a database. It's not a coincidence that the term "in memory database" exists for example.

Then why is redis not advertised as an "in memory database"? It likes to be compared with MongoDB or TokyoTyrant but is clearly not playing in the same league.

Moreover it glosses over this issue in quite a self-righteous fashion almost everywhere. One example of many would be this gem from the "TwitterAlikeExample":

This means you can serve millions of users every day with just a single Linux box, and this one was monkey asses slow!

That's just false advertising if your "millions of users" actually use the service. Just storing the plain account data for 1 million users would push the database into the 1GB realm. Go figure what happens when these users actually begin to leave a footprint.

Btw, everything that is used to store data permanently, reliably or not for your specific needs/tastes, is a database.

I'm not normally nitpicking about terminology. It just left a bad taste in my mouth in this particular case because the redis homepage doesn't bother to mention this critical difference to most similar offerings. The current blurb is highly deceptive.
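The 1GB estimate in the comment above works out if one assumes roughly 1 KB of plain account data per user. That per-user size is an assumption made here for illustration; the comment does not state it. A quick back-of-the-envelope check in Python:

```python
# Back-of-the-envelope check; the per-user size is an assumed figure.
USERS = 1_000_000
BYTES_PER_ACCOUNT = 1_000  # assumption: ~1 KB of plain account data per user

total_bytes = USERS * BYTES_PER_ACCOUNT
total_gb = total_bytes / 10**9  # decimal gigabytes

print(f"{total_gb:.1f} GB for account data alone")
```

Anything the users actually post (timelines, follower sets, and so on) would come on top of this figure.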


Yes, but consider the Prevalent Hypothesis (http://prevayler.org/wiki/Prevalent%20Hypothesis):

"That all your business data fits in the RAM you can afford"

I believe this is true in many, not to say most, cases. If not, don't use an in-memory "database", "persistent cache", "prevalence layer", or whatever you want to call it.


Sorry, but that's the most idiotic hypothesis I've heard in a while. ;-)

Being able to guarantee that all your data will always fit in as much RAM as you can afford is a luxury that almost nobody has. There's still an order of magnitude between common disk and DIMM sizes.


I think I agree. I'd call something a database only when it's capable of transparently managing a quantity of data significantly larger than your RAM storage.



