
I think there's one other problem that affects most NoSQL systems: the perception that adopting a NoSQL database means you don't have to think about your data. When designers and front-end developers want to develop a web application they can do one of two things: a) say "don't worry about the back-end, we'll just throw it in a NoSQL database" or b) learn to also be a back-end engineer.

The idea that you don't need to worry about your data structure is deadly. Every successful project I've been involved with has thought seriously about the data model. ER diagrams with 160 tables aren't uncommon, and knowing the structure of your most common queries helps you make sure that your database isn't over-normalized. There's a science to data systems after all.

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." - Frederick P. Brooks, The Mythical Man-Month.

I quote or paraphrase this fairly frequently as I've come to believe it. If I understand your data structures, I can pretty easily tell what you might be doing with the system and I can read the code to see which parts you've implemented. But don't use this as an excuse to delay working on the rest of the application. There are Architecture Astronauts in the realm of database modeling too!

"Normalize until it hurts, denormalize until it works!" - Unattributed adage.

The key is to understand your data (and it will provide an amazing boost to the rest of the project). If you're worried about having your data-model perfect before you start coding, you should have started coding already. There are practices you can adopt that make refactoring your database more tenable. I recommend you read the "Evolutionary Database Design" article by Martin Fowler and Pramod Sadalage (http://www.martinfowler.com/articles/evodb.html), then adopt a workable technique as a team.
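One way to make that kind of refactoring routine is to keep schema changes as small, ordered migrations. The following is a minimal sketch, not the method from the Fowler and Sadalage article: it assumes SQLite, uses its `PRAGMA user_version` counter to remember which migrations have run, and the `customer` table, its columns, and the `migrate` helper are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical, ordered migrations; each entry moves the schema forward by one version.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
    "CREATE INDEX idx_customer_email ON customer (email)",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply migrations that have not run yet and return the schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is an int generated here.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
final_version = migrate(conn)
rerun_version = migrate(conn)  # a second run finds nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Because each step is small and recorded, the model can start rough and be reshaped as the common queries become clear.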




I will go further and say that the biggest problem is the perception that a particular tool will make up for bad practices, general indiscipline, and many other small-time problems.

If you start with such an assumption, only bad can come to the project.

MongoDB has its uses, but if you assume that it's going to make up for SQL-related use cases, then you shouldn't be surprised if the consequences are disastrous later on.


Well-said!


Yes, this is often overlooked, and it's one of the main downsides of NoSQL databases. Schemas don't really go away; they just get pushed into the application layer. When multiple apps use the same database, schema maintenance and evolution become a real issue.

Perhaps it's because much of web-scale data is arguably tissue-paper data, more useful in the aggregate. Transactions and constraints on the data seem to be less of a concern.

In this regard I think key-value stores are preferable to document stores such as MongoDB and CouchDB. A key-value store makes it clear where the semantics of the data belong.
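To illustrate the point about schemas moving into the application layer, here is a rough sketch of what that tends to look like in code. Everything in it is an assumption for illustration: the `UserV2` shape, the `schema_version` field, and the `load_user` helper are hypothetical and not part of any particular database's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UserV2:
    """The shape the application expects; the store itself does not enforce it."""
    user_id: str
    email: str
    display_name: str


def load_user(doc: Mapping[str, Any]) -> UserV2:
    """Validate a raw document and upgrade older shapes on read."""
    version = doc.get("schema_version", 1)
    if version == 1:
        # Hypothetical v1 documents kept the name under "name".
        doc = {**doc, "display_name": doc.get("name", ""), "schema_version": 2}
    missing = [f for f in ("user_id", "email", "display_name") if f not in doc]
    if missing:
        raise ValueError(f"document is missing fields: {missing}")
    return UserV2(doc["user_id"], doc["email"], doc["display_name"])


old_doc = {"user_id": "u1", "email": "a@example.com", "name": "Ann"}
new_doc = {"schema_version": 2, "user_id": "u2",
           "email": "b@example.com", "display_name": "Bob"}
users = [load_user(old_doc), load_user(new_doc)]
```

Every application that reads the same database needs an equivalent of `load_user`, which is where the maintenance burden shows up.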


"In this regard I think key-value stores are preferable to document stores such as MongoDB and CouchDB. A key-value store makes it clear where the semantics of the data belong."

True, but I've used CouchDB successfully in applications too ... that's a mental "note to self" you need to keep. If the value in a key-value store is completely opaque, you can't do views (my favorite part of CouchDB).
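For anyone who hasn't used views, here is a loose imitation of the idea in plain Python. Real CouchDB views are normally JavaScript map functions stored in design documents, so treat this only as a sketch of the concept: `build_view`, `by_author`, and the sample documents are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[str, Any]


def build_view(docs: Iterable[Doc], map_fn: Callable[[Doc], Iterable[Row]]) -> List[Row]:
    """Run map_fn over every document and return the emitted rows sorted by key."""
    rows: List[Row] = []
    for doc in docs:
        rows.extend(map_fn(doc))
    return sorted(rows, key=lambda row: row[0])


def by_author(doc: Doc) -> Iterable[Row]:
    """Emit (author, title) for posts only; this needs to see inside the document."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


documents = [
    {"_id": "1", "type": "post", "author": "bob", "title": "Views"},
    {"_id": "2", "type": "comment", "author": "ann", "text": "Nice"},
    {"_id": "3", "type": "post", "author": "ann", "title": "Schemas"},
]
view_rows = build_view(documents, by_author)

# With opaque values the store holds only bytes, so it has nothing to index on;
# any equivalent of by_author has to live in the application instead.
opaque_store = {d["_id"]: json.dumps(d).encode("utf-8") for d in documents}
```

The map function only works because the store can look inside each value, which is exactly what an opaque key-value store gives up.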


I totally agree, I've used CouchDB a lot myself and a "mental note to self" or other best practice can suffice, especially if programmers leave or otherwise document those notes for their successors :)

CouchDB views are really neat, something that distinguishes it from others.


I worked on an ambitious project which used MongoDB (as a kind of cache to service queries only) with some success in an app which dealt with a huge variety of semi-structured, ad-hoc data. The system was meant to empower users to create and use their own schemas in an Excel-ish kind of way, i.e. no waiting on DBAs or programmers to get stuff done.

But on reflection, the capabilities of MarkLogic or, if we had an infinite budget (this was a thing we built while still doing our core work), some pile of RDF/SPARQL-tastic workflow and a team of 10 engineers gluing something together that's drivable by normal human beings - would have been better.

We found the promise of "schemaless" MongoDB a little lacking back in the Mongo 1.6 era with respect to supporting the ad-hoc queries users wanted to do. That's down to paging, which needs sorting, which doesn't scale unless you have indexes. And I don't mean in a performance sense, I mean, it just died and you got errors instead of sorted results. The work-around was to use limit to make the sorted result fit inside a single BSON document, but then anticipating exactly what that limit should or can be when you have wildly diverse document structure and contents was very tedious and so stupid hacks ensued...
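To show why paging drags sorting along with it, here is a small pure-Python sketch. It is not how MongoDB implements sorting and it uses no MongoDB API; the `page` helper, the sample documents, and the choice to compare values as strings are all assumptions made to keep the example short.

```python
import heapq
from typing import Any, Iterable, List, Mapping, Tuple

Doc = Mapping[str, Any]


def page(docs: Iterable[Doc], sort_field: str, page_number: int, page_size: int) -> List[Doc]:
    """Return one page ordered by sort_field; documents lacking the field sort first."""
    def sort_key(doc: Doc) -> Tuple[bool, str]:
        value = doc.get(sort_field)
        # Missing values become (False, ""), which sorts before (True, str(value)).
        # Comparing as strings is a simplification for mixed-type data.
        return (value is not None, "" if value is None else str(value))

    wanted = (page_number + 1) * page_size
    # With no index every document is still examined, but the limit means
    # only the first `wanted` rows have to be held as the sorted result.
    top = heapq.nsmallest(wanted, docs, key=sort_key)
    return top[page_number * page_size:]


docs = [
    {"name": "pear", "price": 3},
    {"name": "apple"},
    {"title": "untitled"},  # no "name" field at all
    {"name": "fig", "colour": "purple"},
    {"name": "kiwi"},
]
first_page = page(docs, "name", 0, 2)
second_page = page(docs, "name", 1, 2)
```

Even in this toy version, deeper pages need a larger bound, and documents without the sort field need a rule of their own, which is roughly where the guesswork came from.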

... but in conclusion, this is the type of use-case I hope MongoDB would be appropriate for.

It was still the right choice for our limited budget and resources, we got the 80% solution very quickly indeed.


I don't doubt that MongoDB can be used successfully, but I think the whole point of the article is that MongoDB's success is due to marketing rather than suitability. Where MongoDB is truly suitable, I'm sure it shines. I've got CouchDB in production on two systems, but only for applications or portions of applications that are well suited to a document store.



