MongoDB is to NoSQL like MySQL to SQL - in the most harmful way (use-the-index-luke.com)
118 points by davvid on Nov 12, 2013 | 121 comments



I think there's one other problem that affects most NoSQL systems: the perception that adopting a NoSQL database means you don't have to think about your data. When designers and front-end developers want to develop a web application they can do one of two things: a) say "don't worry about the back-end, we'll just throw it in a NoSQL database" or b) learn to also be a back-end engineer.

The idea that you don't need to worry about your data structure is deadly. Every successful project I've been involved with thinks seriously about the data model. ER diagrams with 160 tables aren't uncommon and knowing the structure of your most common queries helps you make sure that your database isn't over-normalized. There's a science to data systems after all.

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." - Frederick P. Brooks from The Mythical Man-Month.

I quote or paraphrase this fairly frequently as I've come to believe it. If I understand your data structures, I can pretty easily tell what you might be doing with the system and I can read the code to see which parts you've implemented. But don't use this as an excuse to delay working on the rest of the application. There are Architecture Astronauts in the realm of database modeling too!

"Normalize until it hurts, denormalize until it works!" - Unattributed adage.

The key is to understand your data (and it will provide an amazing boost to the rest of the project). If you're worried about having your data-model perfect before you start coding, you should have started coding already. There are practices you can adopt that make refactoring your database more tenable. I recommend you read the "Evolutionary Database Design" article by Martin Fowler and Pramod Sadalage (http://www.martinfowler.com/articles/evodb.html), then adopt a workable technique as a team.


I will go further and say that the biggest problem is the perception that a particular tool will make up for bad practices, general indiscipline and many other small-time problems.

If you start with such an assumption, only bad can come to the project.

MongoDB has its uses, but if you are assuming that it's going to cover SQL-related use cases, then you shouldn't be surprised if the consequences are disastrous later on.


Well-said!


Yes, this is often overlooked, and it's one of the main downsides of NoSQL databases. Schemas don't really go away; they just get pushed into the application layer. When multiple apps use the same database, schema maintenance and evolution become a real issue.
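To make "the schema gets pushed into the application layer" concrete, here is a minimal Python sketch. The field names and the `validate_user` helper are hypothetical and not taken from any particular library; the point is only that every app writing to a schemaless store ends up carrying something like this, and each app has to agree on it.

```python
# Hypothetical application-level schema for a "user" document.
# With a schemaless store, nothing below is enforced by the database itself.
USER_SCHEMA = {
    "name": str,
    "email": str,
    "age": int,
}


def validate_user(doc):
    """Return a list of problems; an empty list means the document fits the schema."""
    problems = []
    for field, expected_type in USER_SCHEMA.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    for field in doc:
        if field not in USER_SCHEMA:
            problems.append(f"unexpected field: {field}")
    return problems
```

If a second application writes to the same collection, it needs its own copy of `USER_SCHEMA` (or a shared library), which is where the maintenance problem comes from.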

Perhaps it's because much of web scale data is arguably tissue paper data, more useful in the aggregate. Transactions and constraints on the data seem to be less a concern.

In this regard I think key-value stores are preferred over document stores such as MongoDB and CouchDB. A key value store makes it clear where the semantics of the data belong.


"In this regard I think key-value stores are preferred over document stores such as MongoDB and CouchDB. A key value store makes it clear where the semantics of the data belong."

True, but I've used CouchDB successfully in applications too ... that's a mental "note to self" you need to keep. If the value in a key-value store is completely opaque, you can't do views (my favorite part of CouchDB).


I totally agree, I've used CouchDB a lot myself and a "mental note to self" or other best practice can suffice, especially if programmers leave or otherwise document those notes for their successors :)

CouchDB views are really neat, something that distinguishes it from others.


I worked on an ambitious project which used MongoDB (as a kind of cache to service queries only) with some success in an app which dealt with a huge variety of semi-structured, ad-hoc data. The system was to empower users to create and use their own schemas in an Excel-ish kind of way, i.e. no waiting on DBAs or programmers to get stuff done.

But on reflection, the capabilities of MarkLogic or, if we had an infinite budget (this was a thing we built while still doing our core work), some pile of RDF/SPARQL-tastic workflow and a team of 10 engineers gluing something together that's drivable by normal human beings - would have been better.

We found the promise of "schemaless" MongoDB a little lacking back in the Mongo 1.6 era with respect to supporting the ad-hoc queries users wanted to do. That's down to paging, which needs sorting, which doesn't scale unless you have indexes. And I don't mean in a performance sense, I mean, it just died and you got errors instead of sorted results. The work-around was to use limit to make the sorted result fit inside a single BSON document, but then anticipating exactly what that limit should or can be when you have wildly diverse document structure and contents was very tedious and so stupid hacks ensued...

... but in conclusion, this is the type of use-case I hope MongoDB would be appropriate for.

It was still the right choice for our limited budget and resources, we got the 80% solution very quickly indeed.


I don't doubt that MongoDB can be used successfully, but I think the whole point of the article is that MongoDB's success is due to marketing rather than suitability. Where MongoDB is truly suitable, I'm sure it shines. I've got CouchDB in production on two systems, but for applications or portions of applications that are well-suited to a document-store.


I think the biggest lesson here for us engineers is that we should watch and brace ourselves for similar tech buzz offerings like MongoDB. A few years ago, when I went for interviews at startups, it was shameful that I didn't have experience with MongoDB and wasn't using it in my own products. I was a 'rusty spoon' for using resilient SQL. Funny how even engineers, a whole industry, can get swept up in using a subpar data store because it's the new cool piece of tech that makes it easier to have JSON you throw at a wall stick. I will never feel shamed again for not using some hot shot popular offering only because it makes development a tad bit easier. As engineers the end result is most important and we shouldn't sacrifice our platform stability just to ease our code editors and some minor migration frustrations. /endrant


The key argument is that nobody knew the performance characteristics. When developers looked at the API they were enchanted by its perceived simplicity, but knew nothing about the behavior of the database under load. In production, understanding the various tuning parameters and failure modes is an even more important part of any database than the API.


MySQL was a great database that solved a problem. It did not solve it in a 100% compliant way, it did not have all the features that "dbms experts wanted", but in the early days it was very easy to get started. In version 5+ they have fixed a lot of the issues and it is now on-par (or better) than most other databases. I also think they have focused on the right stuff: ease of use, easy ways to fix performance issues, great ways to set up replication, great ways to take backups, not losing data, being optimized for the web etc. Also, most bigger properties (including Facebook and Google) use MySQL extensively.

If Mongo is going to be like MySQL, then great!


> but in the early days it was very easy to started

It's easy to start a bicycle rolling down a steep hill too. It's only later that you find out whether the brakes, tires and handling characteristics are up to par...

> features that "dbms experts wanted"

Maybe they want those features because they know what they're doing, and think some of those features are pretty critical.


And you base this on what? MySQL is used to scale some of the biggest Internet properties in the world and it has been doing that for many years. Stop spreading FUD.


FUD is actually useful (engineering FUD rather than marketing FUD) ... if you pick MySQL because it's easy to get started, you need to know its pros and cons. Those scaling MySQL at the biggest Internet properties in the world ARE database experts - Facebook for instance has a custom cache and memcached in front of their MySQL instances [1], shards their data and also uses other database technologies where appropriate. People like Facebook’s Mark Callaghan will be successful using any reasonable database.

[1] - http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on...


If "is now on-par (or better)" means usable then I'll agree. I'll also agree that you don't have to be an expert to use it. Sometimes having expertise is useful.

Prior to version 5+, I'd have called it unusable as the lack of real foreign key constraints (for instance) was simply unacceptable for real applications. The argument (by NoSQL proponents too) that the application has to check data constraints anyway might be true, but ending up with dirty data is the worst possible outcome. Check it in both the application and the database if you can.
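As a rough illustration of "check it in the database too", here is a small Python sketch using the standard-library `sqlite3` module as a stand-in for any RDBMS with real foreign keys (it is not MySQL, and the `authors`/`posts` tables and the `add_post` helper are made up for the example). Even if the application forgets to validate, the database refuses the orphaned row.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for any RDBMS with real foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'alice')")
conn.commit()  # make sure the author row survives any later rollback


def add_post(author_id, title):
    """Try to insert a post; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO posts (author_id, title) VALUES (?, ?)",
                (author_id, title),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused a row pointing at a non-existent author.
        return False
```

Calling `add_post(1, "hello")` succeeds, while `add_post(99, "orphan")` is rejected by the constraint rather than leaving dirty data behind.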


Besides, the assumption that only a single application will access the database is often wrong in the real world.


MySQL still isn't on par or better than its most comparable peer, Postgres. It's still nowhere near as terrible as Mongo, but let's not pretend it's amazing when it's merely serviceable.


Postgres got proper non-hackable replication in version 9... And it still isn't that great.


"Isn't that great" how? In that it imposes a bunch of limitations on you? Or that it frequently breaks and slaves have to be manually synced? Oh wait no, I was thinking of mysql.


MySQL replication is the one place it's got the edge.


Is it not the case that back in the day MySQL was better suited to running small to medium website backends with simple schemas out of the box? I know this hasn't been the case for a long time but I think it might be a contributing factor to MySQL's success.


The advantages were ease of installation on Windows, and network effects.


Have you ever used any other database? Literally every single thing you listed as being a plus of mysql is something postgresql and sql server both do better. Mysql is in no way even close to being on par with non-broken databases. A brief list of some obvious, glaring flaws with mysql that have been actual problems in practice:

no check constraints

views with aggregates are too slow to be used

no expression indexes

triggers don't fire on cascaded actions

no window functions

can't set default values to be the result of a function

no transactional DDL

doesn't have multiple databases, just schemas misnamed as databases

rollbacks are orders of magnitude slower than in other RDBMSs, and if one is interrupted it can corrupt the database

functions can't use prepare/execute, so no dynamic SQL in functions

subqueries are broken: can't modify and select from the same table

functions can't be called recursively (seriously? is this 1962?)

triggers can't alter the table they are firing against

stored procedures can't be invoked from prepare/execute

"slow query" log has a completely useless resolution of seconds
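For readers who haven't met two of the items on that list, here is a hedged sketch of what a check constraint and transactional DDL look like in practice. It uses Python's standard-library `sqlite3` purely as a convenient stand-in (it says nothing about MySQL, PostgreSQL or SQL Server behavior), and the `accounts` and `audit_log` tables are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so BEGIN/ROLLBACK
# below are issued explicitly rather than managed by the sqlite3 module.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Check constraint: the database itself rejects a negative balance.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)


def try_insert(balance):
    """Return True if the row was accepted, False if the CHECK constraint rejected it."""
    try:
        conn.execute("INSERT INTO accounts (balance) VALUES (?)", (balance,))
        return True
    except sqlite3.IntegrityError:
        return False


def table_exists(name):
    """Look the table up in SQLite's catalog."""
    rows = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchall()
    return rows[0][0] == 1


# Transactional DDL: a schema change made inside a transaction can be rolled back.
conn.execute("BEGIN")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, note TEXT)")
created_inside_txn = table_exists("audit_log")
conn.execute("ROLLBACK")
exists_after_rollback = table_exists("audit_log")
```

In this sketch the table is visible inside the transaction and gone after the rollback, which is what makes failed migrations safe to retry on databases that support it.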


You forgot silent data loss.


If PHP is a fractal of bad design, then MongoDB is PHP of database world.)

As for MySQL, the Mongo marketing guys have learned the lessons from MySQL very well - there is nothing but "satisfied users", "good documentation" and "quickness and easiness" (no thinking, let alone understanding, is required) on their site and in paid content - a well-done "product", designed to trigger an ignorant consumer's snap judgement, the same way MySQL did in its time. They use the same sales strategy.

http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-de...


At least the MySQL guys eventually gave up trying to convince us that referential integrity is a crutch for sub-par programmers.


What referential integrity?) It is My free SQL, everybody's using it. Table-level locking? Never heard of them.)


Not sure what you are referring to here, but MySQL definitely has referential integrity constraints, and InnoDB uses row-level locking. Not sure what level of transaction isolation support is built in though.


The InnoDB was not the default storage engine for quite a long time and it wasn't stable-enough for years.


OK, I'm not a huge fan of MySQL, but InnoDB is what is used now. I'm only really interested in the latest releases of MySQL.


Honestly, I see very few thought-out or backed-up arguments here. The server-level blocking, the unacknowledged writes, silly defaults, etc are not only old news but have been behind mongodb for a while.

You also get comments like 'it still has database level locking' but rarely is there an educated discussion about the work that's been done to yield locks as quickly as possible and the real-world impact of this design.

People seem intent on punishing the platform for old mistakes (or immaturity) rather than providing a balanced view of it today. And comments like "Absolutely no benefit to Mongo. It's even really slow, so there's really no advantage at all." are just misleading without context, because it's not at all true as an outright statement.

I spent today at the MongoDB developers conference in Sydney, and heard the CEO speak about when they think mongodb is good vs when you should use RDBMS, heard them speak about their community and target market, and heard a lot of stories and strategies about how to prepare for scale, how to think about schema design, how to monitor and admin the database / clusters, etc.

I'll just say that from hearing them in person, and hearing what people say about both the platform and the "marketing guys", you'd never know they were talking about the same company or product.

My own company has had a lot of advantage from using MongoDB in recent projects over other technologies (we have a largely MSSQL background). No, we don't do 100GB+ "web scale" (yet). Yes, we build solutions to real world problems and make money. No, it's not a silver bullet. Yes, you should know what you're doing and read the docs.

To any hackers / entrepreneurs who are thinking that MongoDB might work for them I'd say do some research, talk to people who've actually used it, and understand the benefits / limitations for yourself.

But stay away from opinions not based in real experience (and anything that mentions problems from before version 2.4, it's nearly a year old people) or you might miss out on something good.


Not to mention that the author's only critique of MongoDB was global write locking from version 2.2. The rest of his article was empty text about NoSQL databases his friends think are cool and statements amounting to "MongoDB is bad. Why? Cause it is."

All of this ends at his signature that mentions his scaling SQL book. Not like he has an axe to grind or anything.


Sheesh, what's with the MongoDB witch hunt? It seems like HN can't go 2 days without some disparaging of MongoDB. I get it. I'm an idiot for using MongoDB even though it suits my needs perfectly.


I'm a long time (3 years) MongoDB user. My experience of the current state of the art:

- The official documentation is clearly written by someone who doesn't speak English as a first language.

- Their official driver (Node.JS 1.3.13) silently throws away exceptions.

Both these have been acknowledged by MongoDB Inc (formerly 10Gen). I can't speak for you, but personally I do feel like an idiot for using MongoDB.


I really don't see that the author put forward a convincing argument. He seems to be stuck in the 100GB data blog post, but he confuses NoSQL with literally "don't use SQL", which is really somewhat missing the point of so much of this space.


From my experience, as soon as you need some kind of relation with Mongo, you end up writing more application logic to create and maintain your relations. This probably means you should move to a relational database.

For Mongo specifically, can anyone share a pragmatic use case where you did not run into problems with duplication?


Not sure if it counts as a pragmatic use-case, but here's my example.

I recently built a little webapp [1] to send email reminders about repositories that you've starred on GitHub. The motivation for it being that I have hundreds of starred repos on GitHub and can't remember what most of them are. Anyway, there is one thing I need to store in that database, users. Nothing else needs to be saved.

You can accurately characterise it as a toy application that I built for fun rather than profit. But there are lots of those around and Mongo is often a good fit. If I don't need complex schemata or a huge dataset, why would I choose something else that is less easy to manage?

[1] https://githubreminder.org/


One example is logs. People have logs, and lots of them. They're usually semi-structured (hopefully there is at least a timestamp and topic and severity in each record). You can put them in MongoDB, and relations are probably not a big concern.


Elasticsearch or Solr would be better than Mongodb for Logs.

Hell if you actually care about your log and want to search it use Solr or Elasticsearch NOT Mongodb.

If you want to store even more data just use Cassandra and index it with Solr or Elasticsearch.

Cassandra has faster writes than reads, so it's perfect for logs.

You can decentralize the logging system too; there are architecture layouts out there using Elasticsearch and such (example: https://medium.com/what-i-learned-building/e855bc08975d).

MongoDB is a poor choice in general, for anything, in my personal opinion.



I have never really touched NoSQL for anything serious, yet I have worked with Riak, MongoDB and Redis. In the case of MongoDB I have worked with an ODM on top of it where I had to define a schema and object relations and was left wondering: what is the benefit of this anyway if I still define schemas and relationships? Does it only make sense at a big scale, when I would have to shard using MySQL? Schema migrations aren't that hard today in MySQL with a good framework, so I am not really sold on the benefits for smaller apps. If I can just save a JSON object to it, that's cool (however Postgres is good at that too), but in practice with server-side validation and schemas I have to decode/encode everything anyway. Can anyone enlighten me on the real benefit?

Edit: I see many benefits in using Redis though, just for its sheer speed and as a smart memcached replacement.


There really is no benefit to using NoSQL if your schema is well defined, unless your queries don't use indexes and all you want is a stack of documents. SQL is faster for structured data when the queries rely on well-defined indexes and well-defined data types.


Absolutely no benefit to Mongo. It's even really slow, so there's really no advantage at all.

Redis, Riak and the like are actually useful for a change.


MongoDB slow? Could you post a benchmark to support this claim?


He wasn't saying that MongoDB is slow...


Sure he was: "Absolutely no benefit to Mongo. It's even really slow,..."


He did indeed, but it was in response to the following questions:

"In the case of MongoDB i have worked with an ODM [by which he means ORM] on top of it where i had to define a Schema and Object relations and was left wondering: What is the benefit of this anyway if i still define Schemas and relationsships ?"


i actually meant ODM which refers to Object Document Mapping in contrast to ORM which refers to Object Relational Mapping

eg: http://mongoosejs.com/ http://docs.doctrine-project.org/projects/doctrine-mongodb-o...


MongoDB stores JSON (in binary form) - serialized Javascript objects. That there is what looks to be a large and polished library for mapping between the two is a huge red flag to me.


Well there you are. I learn something every day! Sorry about that.


Can you explain this a little better since you apparently know all three?

I know Redis (and Mongo), not Riak, and Redis shines in certain aspects but depending on what you want it's not the right choice.


What is this ? 'Lets bash mongodb' week ?


And 'make sweet love to Postgres' year


It's been a truly great year so far!


Come on, give MongoDB a break. I use MongoDB and I am proud of it. And I have used numerous SQL DBs (MySQL, PostgreSQL, Oracle, MSSQL, SQlite) and HBase from noSQL camp, so I do have something to compare to.

Yes, it was overhyped (and is underhyped now, at least on HN). Yes, it has its problems. Yes, it is not correctly positioned in people's minds (web-scale? not really). MapReduce on top of MongoDB, in a single-threaded JS engine - are you kidding me??? But it is a useful tool nevertheless; one I am happy to have in my arsenal.

MongoDB will NOT solve your scalability problems. Old news. It will even add some of its own problems on top of it. But man, is it nice to just fire that data into your database and forget about DB-level schemas... And when you need that data, it is just there. True, I have built an ORM on top of MongoDB so it manages the DB for me; but it was easy to do and is incredibly useful. I would have done it with MySQL too (actually, I did - and it was a PITA to build because of relational databases' rigidness).

So please, stop bashing MongoDB for wrong reasons. It is an excellent and easy to use storage - when used for the right kind of problems. Nothing more, nothing less.


How did you build an ORM on top of MongoDB? And why?


Sorry, poor choice of words... I have built an ORM at the application level, above MongoDB.

This level lets me specify data model in a simple JSON and the system then automatically performs all (high-level) data input validations, displays a proper web form to the user,... There are actually many similar systems to mine, I just couldn't find one that would offer all the functionalities I needed.

Note that DB is just a storage for me. It can't know enough to validate for example e-mails, because for DB they are just strings.


That's not an ORM, that's a data abstraction layer.


It's actually both. Term "data abstraction layer" describes its function while term "ORM" describes how it is done. Unless I am misunderstanding these terms?

But we are going a bit off-topic here. :)


An ORM maps a relational database to objects. Given MongoDB is most definitely not relational, you can't have implemented an ORM :-)


That's an excellent point! :)

I was so used to thinking of this system as an ORM that I never checked if the term still applies. So... I guess it is OnoRM now. ;)


What were you mapping?


agree, it's kind of like MS Access was, right? Dirt simple to use, lots of neat add on tools over the years, and great for rapid prototyping, but not really a database.


This isn't really being fair to MySQL. It was pretty terrible when it started out (corrupted data, no joins, no transactions, silly locking, didn't support half of SQL, etc.), but we must not forget just how terrible Mongo was and still is. It shipped with unacknowledged writes as the default for a long time.


> The article is about handling MongoDB if it grows above 100GB. It gives me the impression that scaling MongoDB to that size is a serious issue.

I don't think so. It gives me the impression that 10gen (MongoDB Inc.) has made a mistake in how it presents "handling MongoDB over 100GB".

> example: global write lock up to release 2.2

Why are we still talking about this? It's not global write lock any more, okay? Stating previous inefficiencies doesn't bring much value to the table.

> At this point it was inevitable to see MongoDB as a popular, yet poor representative of it’s species.

Show me the real arguments! Show me the real arguments!


> Why are we still talking about this? It's not global write lock any more, okay? Stating previous inefficiencies doesn't bring much value to the

It's now a db-level lock, which is an improvement but not optimal. And its locking system in general isn't as sophisticated as, say, a recent version of Postgres, MSSQL, or Oracle.

Neither is its query planner, where using an $or can cause it to scan all the documents in a collection despite the fact that you should technically be hitting indexed fields.


> ... despite the fact that you should technically be hitting indexed fields.

Never had that experience - could you elaborate? Indexes in MongoDB are a bit different from SQL indexes, so if you are comparing to that, you are in for a few surprises. :)


This discussion has happened before.

MySQL is to SQL like MongoDB to NoSQL - https://news.ycombinator.com/item?id=6481526


My guess as to the rise of MongoDB usage is that the rapid application development needs of the startup world led to people deciding that representing application data as JSON across all tiers speeds up the early stages of development. Unfortunately, as soon as data needs to be used relationally, developers are finding themselves pigeonholed in an inappropriate persistence format.

In short, developers that skip studying how their data might need to be queried in the future are to blame, not MongoDB.


I've heard that some people use the MongoDB syntax with postgresql using redis as caching, could someone comment on this idea ?


I question their sanity.


that is the worst idea ever


So he's saying MySQL is bad, and MongoDB (because of a statement on their home page) is kind of like MySQL, so MongoDB is also bad. Also there was some article about scaling it to "only" 100GB which means it damages NoSQL as a whole in the "most harmful way".

Sorry but I don't really get the reasoning.


MySQL is an awful implementation of SQL. MongoDB is an even more pitiful implementation of NoSQL. It's a Fisher-Price database that had a global write lock until version 2.2. What exactly don't you understand about the analogous nature of two crummy data stores?


So what exactly makes Mongo such a bad implementation (apart from things that were already fixed)?

I don't mean to defend MongoDB, I'm genuinely interested in finding a better NoSQL DB. I just usually get to hear the same "Fisher-Price" bashing without any real arguments.


The problem you have is that there are different types of NoSQL databases. Key value stores, document stores, graph databases, you name it they have their own specialty.

NoSQL's name shows just how nebulous the concept is. It was first coined by someone who built a relational database but without implementing SQL, then it was later used to describe distributed, non-relational databases that ignored, downgraded or discarded some form of ACID. Now it can be a retronym for "not only SQL".

In this space, work out what your actual needs are, then carefully research the technology and pick whatever is most appropriate.


It still has a global write lock, but now it's only global to the database.


I did not know this. That would be pretty awful for a high traffic service.


It is quite awful. You basically get a graph showing latency as almost directly proportional to number of concurrent requests :(
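A back-of-the-envelope model shows why that graph comes out roughly linear. Assume (and this is an idealised assumption, not a MongoDB measurement) that every request needs the same lock for a fixed service time and that all requests arrive at once. Then they simply queue up behind each other. The function names below are hypothetical.

```python
def latencies_under_single_lock(concurrent_requests, service_time_ms):
    """Latency of each request when all arrive together and run one at a time."""
    return [service_time_ms * position for position in range(1, concurrent_requests + 1)]


def mean_latency(concurrent_requests, service_time_ms):
    """Average latency under the same idealised single-lock model."""
    values = latencies_under_single_lock(concurrent_requests, service_time_ms)
    return sum(values) / len(values)
```

Under this model the mean works out to service_time * (n + 1) / 2, so doubling the number of concurrent requests roughly doubles the average latency.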


Not to mention it is not fun to admin.


The reasoning is:

1. MongoDB and MySQL have similar market penetration

2. MongoDB and MySQL both have limitations

(these statements are independent)

Then, from 1 and 2, he concludes 3: In the same way that MySQL gives SQL a bad reputation, MongoDB gives NoSQL a bad reputation.


The problem for MongoDB is that its limitations are far less understood by its users than with MySQL.


Don't worry, most of MySQL and MongoDB bashing is uninformed prejudice coupled with a "holier than thou" attitude.


I never understood what Mongo brings to the table that ext4 doesn't.


Does this offer absolutely anything new ? No.

Is this person objective and well respected ? No.

Does this person understand why MongoDB/MySQL is popular ? No.

Is anyone going to gain anything from this ? No.


> Is anyone going to gain anything from this ? No.

Well, if it stops someone from using MongoDB without really thinking about whether they ought to, maybe it'll save someone some grief in the future.


If this was an intelligent piece analysing the pros/cons of MongoDB and when it would be an optimal choice then I could understand it. But this is the equivalent of blog spam from a guy who is selling a SQL Performance book.


Many articles regarding MongoDB's awful memory-mapped simple b-tree data storage method have popped up on HN. Martin is merely talking about the politics of MongoDB; to call his links to polyglot methods and his information blog spam is dishonest to say the least. I thought it was a great post and I bet many other HNers who upvoted it feel the same way.


> MongoDBs awful memory mapped simple b-tree data storage method

You realise that they are using default kernel implementation? Awful indeed. :)

If you are so anti-MongoDB, would you care to offer an alternative? It must support: - document-like storage (I already have schema on application layer), - simple use / administration, - hot backup, - single node operation.

While this list is mostly outlining why MongoDB is a great DB for my current use case, I am also genuinely interested in better solutions (for that case). So, fire away... :)


Let's see if I can make your day with a shameless plug.

First of all, using the kernel's paging algorithm for a database is awful. It has way less information about your access patterns than the database, so it's going to make a lot of bad decisions, and a bunch of MongoDB's problems with fragmentation and performance cliffs come from using mmap. It's quick and dirty.

Here's the alternative, and it supports the exact same document storage as MongoDB, the same administration interface, actually the same protocol everywhere, hot backup, and however many nodes you want: TokuMX[1]. It does all those things because it's actually mostly MongoDB, just with the storage engine replaced with an engine that works faster, compresses your data, and has more mature things like document-level locking for writers and better multi-document isolation semantics on a single node. Give it a whirl and let us know if you like it.

[1]: http://www.tokutek.com/products/tokumx-for-mongodb


Why is it called a "fractal tree"?


Marketing. No technical reason.


From what I can tell, this has been patented, but was implemented in part by Postgres much earlier.


I promise it hasn't, but I'm curious to see what you're talking about. Link?



That thread is talking about the possible inclusion of our indexing library inside Postgres, and was from before we had open sourced the code. IANAL but I think it would be possible to include the fractal tree library in Postgres now if someone wanted to do it, but none of it has ever been implemented by Postgres.


I stand corrected! Thanks for the info.


Everything you've mentioned is standard to SQL databases. If you have a schema you don't need NoSQL. Use an ORM for SQL in the language of your choice. Pretty much any db will run as a single node, but you should never do that in practice: redundancy is security is stability.


> If you have a schema you don't need NoSQL

Sure, but by that argument I don't need SQL either. I am not sure what your point is?

Anyway, the selling point (for me) is that I don't need to find some fancy ways to translate my schema to relational schema. For instance, let's say I have some text content in two languages; in relational DB the texts should probably go to a separate table and I should reference the main record by some foreign key. In document-oriented DBs I can just put both of the texts inside the record... To me that makes much more sense.

But yeah, if you are a hardcore SQL user who is used to normalizing the data until it hurts (as someone else commented) it is difficult to break out of the habit.

And after you eliminate SQL as an alternative (in my case) there are not many options left. At least I couldn't find (m)any.


If you know you are only storing two languages, they can go directly on the parent table, named english_text and spanish_text. If you are talking about the ease of storing a hashmap in NoSQL, i.e. { english: 'text', spanish: 'text', french: 'text' }

You can achieve this in the same table with the JSON storage type:

http://www.postgresql.org/docs/devel/static/datatype-json.ht...

Really though, a decent ORM will take care of all the foreign key implementation, and a hashmap is well served by a 2nd table called 'texts' with 3 columns. I can see why you would be opposed to this, because of the extra join but given the right indexes it won't be slow at all.
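To make the 'texts' table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Postgres. The `articles` parent table, the column names, and the `load_texts` helper are assumptions made up for illustration. The composite primary key doubles as the index that keeps the join cheap.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, slug TEXT NOT NULL);
    -- The 3-column 'texts' table: one row per (record, language).
    CREATE TABLE texts (
        article_id INTEGER NOT NULL REFERENCES articles(id),
        language   TEXT NOT NULL,
        body       TEXT NOT NULL,
        PRIMARY KEY (article_id, language)  -- also serves as the lookup index
    );
""")
conn.execute("INSERT INTO articles (id, slug) VALUES (1, 'hello')")
conn.executemany(
    "INSERT INTO texts (article_id, language, body) VALUES (?, ?, ?)",
    [(1, "english", "Hello"), (1, "spanish", "Hola"), (1, "french", "Bonjour")],
)

def load_texts(conn, article_id):
    """Return the translations of one record as a {language: body} dict."""
    rows = conn.execute(
        "SELECT t.language, t.body FROM articles a "
        "JOIN texts t ON t.article_id = a.id WHERE a.id = ?",
        (article_id,),
    )
    return dict(rows.fetchall())

translations = load_texts(conn, 1)
```

The result has the same shape as the hashmap above, and adding a fourth language is one more row instead of a schema change.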


Is the comment above going to get downvoted? Yes.


Of course. Bashing MongoDB is quite the fun pastime here on HN.

Having evaluated nearly every database for my own startup I find it extraordinary that there is such angst towards a technology. Sure MongoDB has its faults but so does everything and surely competition is good to keep everyone honest. I mean would PostgreSQL really have had HSTORE if MongoDB never existed?

At least stop rehashing the same tired arguments for everyone's sake.


> I mean would PostgreSQL really have had HSTORE if MongoDB never existed?

Yes. The hstore extension was added to postgres in 2006 (commit 642194ba0cdc0aada9c99bf7712fcae5f3ac86d1), though it predates that date by quite a bit.

The first commit to MongoDB was in 2007.

But that doesn't really matter. The reason I dislike MongoDB is that it shipped with unsafe defaults for years and caused people to lose data because of that.

For me, the one must-have feature of a database is that it will always allow me to read the data I've written to it (minus hardware defects). A database which can't fulfill that one feature doesn't qualify as a database in my book.

MySQL has issues there (it allows invalid data to be written and then "corrects" it and I've seen it corrupt table and index data multiple times without any evidence of hardware issues) and so does MongoDB (unless you changed the default config).

This has nothing to do with NoSQL vs. SQL (though, personally, I really like the flexibility for querying offered by the relational model) and everything with data integrity.


Original HSTORE was simply key/value pairs which has nothing to do with MongoDB. My point was that the popularity of MongoDB likely was a big incentive to add JSON support to PostgreSQL which they did only last year.

http://lwn.net/Articles/497069/


> (unless you changed the default config).

Lamest argument ever... Are you suggesting that the majority of postgres (say) installs are running default config with no changes? Or even better, Oracle?


I'm running multiple postgres installs (albeit small ones) in the default config.

Also, the default config of most databases is very conservative: You won't get speed, but you will get safety and it'll run on whatever hardware you throw at it.

Mongo on the other hand came with unsafe defaults and no obvious warning for people to change it. Additionally, after fixing the configuration, you would lose a lot of performance to the point of it being impractical to run mongo in a safe configuration.


Majority of installs or majority of transactions across all installs? I suspect that the majority of installs are indeed running default configuration, or at least close enough to default that the user wouldn't have looked closely enough to spot the kind of configuration issue being discussed.

Larger installs, which probably do account for the bulk of transactions, are more likely to have been tuned for speed, although probably keeping all of the safety settings at (or above) their defaults. But part of the point here is that in a safe configuration Mongo makes things slower than default.


> The reason I dislike MongoDB is that it shipped with unsafe defaults for years and caused people to lose data because of that.

Surely when using any sort of new technology, you'd do some basic research about write concerns, durability, etc. first? If you just lazily adopt a new technology without reading the documentation, that seems like a bit of a recipe for disaster to me...


"Safe by default" is not a revolutionary concept. That's the reason I won't touch Mongo - not because of any specific bug or failing, but because it was designed from the outset without a consideration for what I consider a pretty fundamental engineering principle.


I'm sorry, but the hstore extension has been around 3 years longer.

first commit for hstore http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;...

Wikipedia gives 2009 as the first release of MongoDB.


Read the commits. That was when it was storing key, value pairs.

HSTORE as most people know it is the JSON/V8 implementation.


HSTORE still is only storing key, value pairs. Nested HSTORE is something we might get in 9.4. JSON/V8 does allow nesting but doesn't allow indexes for arbitrary value access - something that MongoDB easily does.

So depending on your use-case and the type of documents and queries you are dealing with, neither the current HSTORE implementation nor the current JSON implementation will work for you, while MongoDB's model might.

But the reverse is true too: Some use-cases are highly relational and it would be really painful to do that with MongoDB due to the lack of easy joins (you have to do them in your application logic which means a lot more work as your requirements change).
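As a rough sketch of what "doing the join in your application logic" means, here is a hand-written hash join over two lists of plain Python dicts standing in for document collections. The `users` and `orders` collections, their fields, and the `join_orders_to_users` helper are made up for illustration, and no MongoDB driver is involved.

```python
from collections import defaultdict

# Two hypothetical "collections", as a document store might return them.
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
orders = [
    {"_id": 10, "user_id": 1, "total": 25},
    {"_id": 11, "user_id": 1, "total": 40},
    {"_id": 12, "user_id": 2, "total": 5},
]

def join_orders_to_users(users, orders):
    """Hand-rolled hash join: attach each user's orders to the user document."""
    by_user = defaultdict(list)
    for order in orders:  # build phase: index orders by their reference field
        by_user[order["user_id"]].append(order)
    # probe phase: look up each user's orders, defaulting to an empty list
    return [{**user, "orders": by_user.get(user["_id"], [])} for user in users]

joined = join_orders_to_users(users, orders)
```

In SQL this is a single JOIN clause. Here every new way of combining the data is another function like this one, which is where the extra work comes from as requirements change.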

Pick the right tool for the job.

Me personally, I prefer the immense flexibility for querying a relational data set over the flexibility of not having to deal with schemas. The reason is that it's much easier to change or add an SQL query than it is to re-format and re-duplicate all of my data followed by more or less manually implementing a good query plan in application code whenever the requirements change.

But this is a pure matter of taste.


"I mean would PostgreSQL really have had HSTORE if MongoDB never existed?"


Does your back hurt from moving those goalposts?


That's some gorgeous schooling right there, kudos to you sir.


The rehashing of "tired arguments" is usually because the ones receiving it (Mongo-users in this instance) are in love with their technology and just dismiss the issues...

There is a reason that RDBMSs with SQL are still alive almost four decades after their inception... it was a quantum leap forward (and still is)

The point is data quality, data durability and flexibility... and performance...

Remember, those who are ignorant of history are doomed to repeat it...

Btw. as others pointed out, HSTORE actually predates MongoDB, but I give you that some of the fire under HSTORE is due to MongoDB's "threat"... As a very happy PostgreSQL user I thank MongoDB for that at least :)


> Having evaluated nearly every database for my own startup I find it extraordinary that there such angst towards a technology.

It's a cultural problem. When you go to a web dev meetup you inevitably meet a programmer in their early 20s who thinks SQL is obsolete and Mongo will be everywhere.

The problem is that it's really nothing new -- it's an XML Database using JSON. There are well understood advantages and disadvantages.

It's just frustrating to constantly encounter people who are convinced that it's a silver bullet.


I agree wholeheartedly, the value of a strict schema and a (mostly) standardized query format providing a known good strategy to get the result - and a culture of taking care of the safety of its customers' data - really is the selling point for SQL.

But anything old is not worth doing to some people (luckily reproduction is the oldest human endeavour ever, so hopefully they won't multiply :)


Agreed. The second point definitely after reading his other blog posts. This also applies to the Diaspora coder who blamed MongoDB for her company's problems. And anybody else who's having a relational epiphany.


Tired of this. I think MongoDB is the best thing ever!


Not surprised that a person who gets his salary with SQL work is unhappy with a competitor that does not use his favourite word...



