Denormalize. Match your data rows to your access pattern (i.e. your UI). Naive example: if you have a webpage that displays a list of employees, and it must have their department name and boss in that list, you put that data in the employee row. What is the probability that a boss will change his or her name causing you to have update a ton of records? Very low. (not zero mind you, so you have to be able to do it). So why pay for the join every query?
There are no longer, in my view, any situations where a SQL db is the best idea. You either want a giant NoSQL database, or you want a massive in-memory object-graph using pointers. Or you want something for $20m from Oracle or IBM.
The problem is not having to update tons of records, the problem is seeing one day, after 2 years of having the app in production, that the listing shows that employee X works in department Y and her boss is Z, but Z is not the head of Y. Bugs happen and referential constraints go a long way towards keeping your data clean.
Yeah, I was worried about all that. Hell, I was worried NoSQL couldn't possibly work at all, given my experience of SQL and the joyful things that happen there.
But I've found that my object model has evolved to handle the "scariness" of the back end. If someone wants to change the boss of an employee, they are doing if via an http post. So I've got to check that the key I was sent over http is even an employee at all and not some javascript bug. Since I have to do that, I might as well read the data into my employee object. Then the code to update employee X with employee Y as a boss is pretty straight forward and thoroughly unit tested. The code to serialize an employee is thoroughly unit tested.
Not saying shit cant happen. Now look me in the eye and tell me you never had some noob drop a constraint and forget to put it back.
You could periodically run a script that checks all the records for errors (especially embedded records that might have drifted from their current value, and not been properly changed by the app-level constraints), and automatically correct them (plus log the error).
If Michael Arlington changes his job from "editor in chief" to "founder, former editor, occasional contributor, and CEO of Arlington Investments", and his old posts aren't all updated, it's not the end of the world.
It really depends on the problem domain. You wouldn't run a bank's ledger off MongoDB. On the other hand, a bank's ledger should be radically simple, with little need for normalization.
What is the probability that a boss will change his or her name
That's obviously an example of something that will practically never happen, which is why it doesn't work all that well as a justification for ditching SQL databases altogether.
I've never used NoSQL for anything, so there must be a lot that I'm missing, and that's why I asked. But it seems to me like you'd be digging up necessary information through quite a few steps if everything is "flat".
On the contrary, its the SQL database thats "digging up the necessary information through quite a few steps" it just that massive effort required by the SQL server is hidden from you, the programmer, by a one line bit of text called a SQL statement. So you do it all the time. Indeed we've been taught that denormalizing is the "proper" thing to do because otherwise "Bugs happen and referential constraints go a long way towards keeping your data clean."
Digging kills you. I assert that SQL does the digging automatically, and thats exactly why it doesnt scale.
Yes, an SQL DB does the digging for you, but with NoSQL you'll be doing it yourself, right?
Your app will most likely have some kind of "entities", and then records to represent them. How much information can and should you cram into records of various "types"?
How much information do you typically end up duplicating across all those "entity records", and is it not a problem?
As I said in my original complaint: I, and many others, spent a lot of time figuring out how to write apps that do it the "app engine way"
That included learning NoSQL. At least that part was not a waste. There are no right answers to your questions, there are only right actions, starting with stepping outside the SQL box and writing an app using NoSQL. I started by thinking of a simple app that would be useful to me personally. I knew java servlets, I knew SQL, I knew all sorts of things, but after several iterations my app is architected like no app/server I've ever written before. Almost every iteration involved starting doing it the way I knew how, running into either roadblocks or major cognitive dissonance, and then rewriting it to fit these new-fangled constraints. Its been a huge learning experience. You might like to try it.
>> What is the probability that a boss will change his or her name
> That's obviously an example of something that will practically never happen
Women changing their name when they get married? A tiny assumption like that can make our software brittle. Now every model that caches the old name needs updating and you need to make sure there aren't any overlapping saves in any of those models that'll overwrite any items in your bulk update. If a single linked model has the wrong old-name cached, your data update process is buggy.
Now every model that caches the old name needs updating and you need to make sure there aren't any overlapping saves in any of those models that'll overwrite any items in your bulk update. If a single linked model has the wrong old-name cached, your data update process is buggy.
Well, that sounds like the kind of stuff I'd like the other guy to talk about. How does he avoid the bad sides of having all your data in a key - value store?
There are no longer, in my view, any situations where a SQL db is the best idea. You either want a giant NoSQL database, or you want a massive in-memory object-graph using pointers. Or you want something for $20m from Oracle or IBM.