NoSQL, Heroku, and You (heroku.com)
76 points by shawndumas on Sept 12, 2010 | 18 comments



I find it interesting that every time "NoSQL" solutions are listed, no one mentions Berkeley DB. It provides many of the features you'd normally find in key-value stores, and more. Is it just not cool enough, too old, ...?

It seems very similar to the time (a year or two ago) when every couple of weeks people got excited about a cool new wire protocol (around the time of Protocol Buffers, Thrift, etc.), but no one even mentioned ASN.1.


Berkeley DB is an embedded database with no network story, let alone a distributed one, so it isn't really a complete solution compared to the k-v stores people talk about. It can be a building block, though; memcachedb, for example, uses it.

ASN.1 is hideously complex; part of the appeal of all those new wire protocols is simplicity.

SQLite is Berkeley DB's main competitor... it's actually funny that NoSQL on the server is all the rage, but SQL use is now really popular client side.


> Berkeley DB is an embedded database with no network story, let alone distributed story

True, true, and not really. It can easily be distributed; its replication works great. Yes, it's not good for everything, but it's great for, say, simple web API servers. Skip the network connections if you don't really need them and read/write key-values locally at ridiculous speed.
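A minimal sketch of the "embedded key-value store, no network hop" pattern described above. Python 3 no longer bundles a Berkeley DB binding (the old `bsddb` module was removed), so this assumes the standard-library `dbm` module as a stand-in; which backend `dbm` picks depends on the platform, and the helper name `cache_put_and_get` is hypothetical.

```python
import dbm
import os
import tempfile


# Stand-in for an embedded store such as Berkeley DB: the database is a
# local file opened in-process, so there is no server or connection to manage.
def cache_put_and_get(path: str, key: str, value: str) -> str:
    # "c" opens the file for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db[key] = value  # str keys/values are stored as UTF-8 bytes
    with dbm.open(path, "r") as db:
        return db[key].decode("utf-8")  # values come back as bytes


with tempfile.TemporaryDirectory() as tmp:
    result = cache_put_and_get(os.path.join(tmp, "api_cache"), "user:42", '{"name": "alice"}')
```

Every read and write here is a local file operation, which is where the speed of this approach comes from; replication, if needed, has to be layered on top.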

> ASN.1 is hideously complex

Yeah... I guess this is more of a personal preference, but I never found it that bad. Write a wrapper that can (de)serialise your native objects once and you don't have the problem anymore.
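To make the "write a wrapper once" point concrete, here is a hand-rolled sketch that maps a few native Python types to ASN.1 DER encodings (BOOLEAN, INTEGER, UTF8String, SEQUENCE) and back. It is a teaching subset, not a real ASN.1 library: no schemas, no other types, minimal error checking, and `der_encode`/`der_decode` are hypothetical names.

```python
# Universal ASN.1 tags used by this subset (SEQUENCE has the "constructed" bit set).
TAG_BOOLEAN, TAG_INTEGER, TAG_UTF8STRING, TAG_SEQUENCE = 0x01, 0x02, 0x0C, 0x30


def _encode_length(n: int) -> bytes:
    # DER short form for lengths below 128, otherwise long form:
    # 0x80 | number-of-length-bytes, followed by the length in big-endian.
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def _tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + _encode_length(len(value)) + value


def der_encode(obj) -> bytes:
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return _tlv(TAG_BOOLEAN, b"\xff" if obj else b"\x00")
    if isinstance(obj, int):
        # Minimal two's-complement encoding, as DER requires.
        size = (obj + (obj < 0)).bit_length() // 8 + 1
        return _tlv(TAG_INTEGER, obj.to_bytes(size, "big", signed=True))
    if isinstance(obj, str):
        return _tlv(TAG_UTF8STRING, obj.encode("utf-8"))
    if isinstance(obj, (list, tuple)):
        return _tlv(TAG_SEQUENCE, b"".join(der_encode(item) for item in obj))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def _decode_at(data: bytes, pos: int):
    tag, length = data[pos], data[pos + 1]
    pos += 2
    if length & 0x80:  # long-form length
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    end = pos + length
    value = data[pos:end]
    if tag == TAG_BOOLEAN:
        return value != b"\x00", end
    if tag == TAG_INTEGER:
        return int.from_bytes(value, "big", signed=True), end
    if tag == TAG_UTF8STRING:
        return value.decode("utf-8"), end
    if tag == TAG_SEQUENCE:
        items = []
        while pos < end:
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, end
    raise ValueError(f"unsupported tag: 0x{tag:02x}")


def der_decode(data: bytes):
    obj, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after DER value")
    return obj
```

For example, `der_encode([1, "a"])` produces `30 06 02 01 01 0c 01 61`. Lists and tuples both become SEQUENCE, so a tuple round-trips as a list.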


No one lists BDB; they list Tokyo Cabinet instead. As far as I know (but I'm not an expert), BDB has been completely superseded by TC.



As someone who's been interested in this stuff but hasn't been able to apply any of it for work or personal purposes, I find this a pretty darn useful summary of the solutions out there.


Even if you aren't interested in using Heroku this is a good read. I hadn't seen the mix of technologies referred to as polyglot persistence before but it sounds appropriate.


This is a good reference point, at least for starters, when someone asks you "what kind of NoSQL database should my application use, if any?"


It was an interesting article, but I'd like to hear what the author has to say now (it was written about a month ago), after the spectacular failure of Digg/Cassandra.


Heroku just gets it. Top-notch hosting aside, for personal exploration, there's no better platform: frictionless, free, and dead simple.


I'm sorry, but Heroku is far from being free, unless you mean free as in Free Software.

Google App Engine is another awesome PaaS that is genuinely free, because its free quotas are huge compared to Heroku's tiny 5 MB database.


Let me clarify: my comment was about personal exploration and nothing beyond that. Most add-ons have a basic, very limited, free option, and the same goes for the pricing of Heroku's platform itself. If you're rockin' an app that needs to be production scale, the baseline won't cut it on Heroku; frankly, you get what you pay for, so I don't expect it to be free.

Point for point, GAE's baseline quotas are obviously better, but in my opinion it's not an apples-to-apples comparison.


Interesting read, but, as with almost all of these pieces, it starts off with a completely incorrect statement about old-school databases that turns it from good information into, essentially, propaganda.

"SQL (the language) and SQL RDBMS implementations (MySQL, PostgreSQL, Oracle, etc) have been the one-size-fits-all solution for data persistence and retrieval for decades."

This is completely untrue.

For instance, some RDBMS systems are purely in-memory. Some are optimized for SSDs. Some enforce persistence, where durability is job #1.

With an RDBMS you can choose to use strict (and expensive) transactions. Or you might not.
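A small sketch of that choice, assuming SQLite through Python's standard `sqlite3` module as a convenient stand-in for any RDBMS: the same two-step transfer runs once inside an explicit transaction and once as independent autocommitted statements, with a simulated failure between the steps. The table and function names are hypothetical.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode: each
# statement commits on its own unless we open a transaction with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account (name, balance) VALUES ('a', 100), ('b', 0)")


def transfer(amount: int, use_transaction: bool, fail_midway: bool) -> None:
    if use_transaction:
        conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
        if use_transaction:
            conn.execute("COMMIT")
    except RuntimeError:
        if use_transaction:
            conn.execute("ROLLBACK")  # undo the half-finished transfer


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(30, use_transaction=True, fail_midway=True)
after_rollback = balances()    # the debit was rolled back
transfer(30, use_transaction=False, fail_midway=True)
after_autocommit = balances()  # the debit stuck; the credit never happened
```

The autocommit path skips the transaction bookkeeping but leaves the data inconsistent when something fails midway; that is the trade-off being chosen.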

You have the option to normalize. Or to denormalize. Or to store all of your data in a giant table that is nothing but a varchar. Or to find some balance in between.
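The same engine can hold either shape of data. A minimal sketch, again assuming SQLite via Python's `sqlite3` as a stand-in: a normalized pair of tables joined on a key, next to a single key/value table whose value column is just text (JSON here). Table names and data are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized: each fact lives in one place, and a join reassembles it.
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        body TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'alice');
    INSERT INTO post (id, author_id, body) VALUES (10, 1, 'hello');
""")
normalized = conn.execute(
    "SELECT a.name, p.body FROM post AS p JOIN author AS a ON a.id = p.author_id"
).fetchone()

# "One giant varchar table": the same RDBMS used as a plain key-value store,
# with the whole record serialized into a single text column.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.execute(
    "INSERT INTO kv (key, value) VALUES (?, ?)",
    ("post:10", json.dumps({"author": "alice", "body": "hello"})),
)
(raw,) = conn.execute("SELECT value FROM kv WHERE key = ?", ("post:10",)).fetchone()
denormalized = json.loads(raw)
```

The key/value form gives up joins and per-field constraints in exchange for a single indexed lookup by key; anything in between is equally possible.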

RDBMS systems have supported loose replication for many years -- see replication in SQL Server, with multiple masters, conflict resolution, and as much decoupling as you'd like.

With the classic RDBMS you have always been able to pick and choose your style of ACID.
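One concrete example of choosing how much durability you pay for, sketched with SQLite's `journal_mode` and `synchronous` pragmas via Python's `sqlite3`. Other RDBMSs expose comparable but differently named settings; this is an illustration under that assumption, not a recommendation, and `open_with_durability` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile

# SQLite's "synchronous" setting controls how aggressively it waits for
# data to reach stable storage: FULL syncs at every commit, NORMAL syncs
# less often, OFF hands data to the OS and continues without syncing.
ALLOWED = {"OFF", "NORMAL", "FULL"}


def open_with_durability(path: str, synchronous: str) -> sqlite3.Connection:
    if synchronous not in ALLOWED:  # PRAGMA values can't be bound as parameters
        raise ValueError(synchronous)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")  # write-ahead log instead of rollback journal
    conn.execute(f"PRAGMA synchronous = {synchronous}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    fast = open_with_durability(os.path.join(tmp, "cache.db"), "OFF")
    safe = open_with_durability(os.path.join(tmp, "ledger.db"), "FULL")
    settings = {
        "fast": fast.execute("PRAGMA synchronous").fetchone()[0],
        "safe": safe.execute("PRAGMA synchronous").fetchone()[0],
        "journal": safe.execute("PRAGMA journal_mode").fetchone()[0],
    }
    fast.close()
    safe.close()
```

A throwaway cache and a ledger can live in the same engine with very different durability guarantees.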

The RDBMS was never a one-size-fits-all solution. Some would then argue that either you use it as a fully transactional, ACID, fully normalized stack or you're "effectively using NoSQL", which is utter bunk that defies reason.

That particular bit of NoSQL advocacy has always derailed the conversation because it isn't factually correct and turns it into a religious argument.

Then there's the "RDBMSs are rusty" approach:

"The SQL databases we’re using today were designed over a decade ago. They were written with the constraints of 1990s hardware in mind: storage is cheap, memory and cpu are expensive."

This, and the conclusions drawn from it, are so extraordinarily wrong that I don't even know where to begin. It's yet another attempt to twist reality to show how the RDBMS has rusted, and it completely defies the facts.

The weak point of the RDBMS chain has virtually always been I/O: getting lots of IOPS has always been a problem, and it is almost always the weak link in database operations. IOPS to the disk matter because most database systems don't consider a transaction done until its operations have been confirmed as written to disk.
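The "not done until it's on disk" rule can be sketched as a tiny append-only commit log: a commit is acknowledged only after the record has been flushed and `os.fsync` has returned, so every commit costs at least one synchronous write. `CommitLog` and `replay` are hypothetical names; real databases add checksums, group commit, and recovery logic, and whether fsync truly reaches stable media also depends on the drive and controller honoring cache flushes.

```python
import os
import tempfile


class CommitLog:
    """Hypothetical append-only log: commit() returns only after fsync."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "ab")
        self.sync_count = 0

    def commit(self, record: bytes) -> None:
        # Length-prefix each record so replay can split them apart again.
        self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()             # move Python's buffer into the OS
        os.fsync(self._file.fileno())  # ask the OS to write it to the device
        self.sync_count += 1           # one synchronous disk write per commit

    def close(self) -> None:
        self._file.close()


def replay(path: str) -> list:
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        records.append(data[pos + 4:pos + 4 + size])
        pos += 4 + size
    return records


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "commit.log")
    log = CommitLog(path)
    for rec in (b"INSERT 1", b"INSERT 2", b"UPDATE 1"):
        log.commit(rec)
    log.close()
    recovered = replay(path)
    syncs = log.sync_count
```

With one fsync per commit, the device's synchronous write rate caps commits per second unless the engine batches several commits into one sync.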

Storage has never been cheap (despite the absurd claims in this article). It has always been the most expensive part of the equation! Getting a decent platform to run a moderate-sized database on is almost always the most prohibitive cost, with ridiculously expensive rigs from high-end SAN providers.

But of course we now have SSDs. Limited IOPS have been the Achilles' heel of the RDBMS, especially for those who heavily normalize (it's an option in some scenarios), but SSDs take us from roughly 100 IOPS per disk to 15,000+ IOPS. If anything, the RDBMS was designed for tomorrow's computer.
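Taking the commenter's round numbers at face value, and assuming each durably committed transaction needs at least one synchronous write (since a transaction isn't done until it is confirmed on disk), the ceiling and the ratio work out as follows. This ignores RAID, controller caches, and group commit.

$$
\text{commits/s per device} \;\le\; \frac{\text{IOPS}}{\text{synchronous writes per commit}},
\qquad
\frac{15\,000\ \text{IOPS (SSD)}}{100\ \text{IOPS (disk)}} = 150
$$

On this metric alone, one such SSD stands in for roughly 150 spinning disks.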

And for the record, at this very moment (I pulled up HN for a distraction), I'm working on a MongoDB solution. Despite my appreciation of the product, I have a strong, very strong, dislike of misleading propaganda. The RDBMS has some serious downsides, but manufacturing a new reality to sell alternatives isn't the best approach.


I completely agree. I find it weird whenever someone mentions Hadoop as a NoSQL database. Hadoop is a distributed computing system, while HDFS is the database.

Hadoop is more of a distributed OS for running processes and storing data across multiple machines.


Err, HBase is the database, not HDFS.
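To illustrate the distinction: Hadoop's MapReduce side is a programming model for computation, while HBase (built on HDFS) is where data is stored and queried. Below is a single-process sketch of the map, shuffle, and reduce phases for a word count; in a real cluster these phases run as distributed tasks, and none of the function names are Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list:
    # Emit (word, 1) for every word; Hadoop tries to run mappers near their input blocks.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict:
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    return key, sum(values)


def word_count(documents: list) -> dict:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


counts = word_count(["NoSQL and SQL", "sql everywhere"])
```

Nothing here stores or indexes data for later queries; that is the job of a database such as HBase.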


Isn't what they presented as the standard description of RDBMS servers accurate enough, simply because only those kinds of servers are available in the FOSS world? Unless you wanted to spend a lot of money, your options in 200X were MySQL and Postgres. Not many others were mature, polished, or well-known enough. There was always a niche for SQLite, of course, but it isn't a reasonable choice for a multi-user service.

On the other hand, almost every NoSQL solution comes as a FOSS project. If such a project specialises in some way, it's a much better choice when you can show that feature X (which the project provides) is what you need.

So the view of the RDBMS as a "one-size-fits-all solution" makes some sense if you take software availability into account.


> because only those kinds of servers are available in the FOSS world?

You can use several storage engines with MySQL, and it has replication too. Far from being one-size-fits-all.


I completely agree. There's no way I can take these NoSQL extremists seriously until they stop foaming at the mouth.



