Hacker News
Antirez: You Need to Think in Terms of Organizing Your Data for Fetching (highscalability.com)
66 points by aespinoza on Oct 10, 2012 | hide | past | favorite | 19 comments



Also keep in mind that learning to "organize your data for fetching" is not necessarily something you can do before you start your project. Many (most?) times you can't predict which data access patterns will be most common and benefit from using Redis, etc.

Starting with a "slower, but flexible" datastore like a traditional relational database, monitoring which access patterns need a boost, and then optimizing or introducing a new datastore is almost always a solid plan of attack.


I still feel the canonical answer to "what should my default data management policy be in write-some-read-a-ton situations?" is, at write time, to:

    1) store an appropriate write-whole/data-mining-friendly format
    2) ASAP, for each major view, write out a Redis-style O(1)-to-read data structure
    3) think carefully about backup and replay strategies
You trade a slightly stale read of the very hottest data for much improved performance on everything else, and more importantly, much simplified view code.

The best reference I have found for this pattern, and it isn't great (too big-SQL-centric), is "command-query-responsibility segregation":

http://www.udidahan.com/2009/12/09/clarified-cqrs/
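
The three steps above can be sketched roughly like this (a hypothetical Python example; a plain dict stands in for Redis, and all the names are invented):

```python
import json

# Step 1: append-only event log in a write-whole, mining-friendly format.
event_log = []

# Step 2: per-view read models, kept O(1) to read (a dict stands in for Redis).
view_store = {}

def record_order(order_id, user_id, total):
    """Write path: persist the raw event, then update each read model ASAP."""
    event = {"type": "order_placed", "order_id": order_id,
             "user_id": user_id, "total": total}
    event_log.append(json.dumps(event))  # durable, replayable record
    # Denormalized view: running total per user, ready to serve as-is.
    key = f"user:{user_id}:total_spent"
    view_store[key] = view_store.get(key, 0) + total

def replay():
    """Step 3: rebuild every view from the log after a bug or schema change."""
    view_store.clear()
    for raw in event_log:
        e = json.loads(raw)
        key = f"user:{e['user_id']}:total_spent"
        view_store[key] = view_store.get(key, 0) + e["total"]

record_order(1, "alice", 30)
record_order(2, "alice", 12)
```

The view read is a single key lookup, and because the raw log is the source of truth, a buggy view can always be thrown away and rebuilt with replay().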


If the CEO/owner/founder of your company is non-technical, he/she will request the data in ways you wouldn't have thought of in advance. That's just reality, and it makes Redis inappropriate for most companies. It's also too expensive for side projects. So that leaves technically-led startups, which is a good chunk of companies (and probably the most fun to work for).


Lots of people use Redis as a cache, not a primary data store. You can have full querying in your SQL database and fast access to common requests via Redis.
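
That split is the classic cache-aside pattern, roughly (a hypothetical Python sketch; one dict stands in for Redis, another for the SQL database):

```python
sql_store = {"user:1": {"name": "Ann", "plan": "pro"}}  # stand-in for SQL
cache = {}                                              # stand-in for Redis
cache_hits = 0

def get_user(key):
    """Cache-aside read: try the cache first, fall back to the slow store."""
    global cache_hits
    if key in cache:            # fast path: common request
        cache_hits += 1
        return cache[key]
    row = sql_store.get(key)    # slow path: full querying lives here
    if row is not None:
        cache[key] = row        # populate the cache for next time
    return row

get_user("user:1")  # miss: loaded from the SQL stand-in, cached
get_user("user:1")  # hit: served from the cache
```

Ad-hoc queries still go straight to SQL; only the hot, repeated lookups ever touch the cache.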

In what way is it too expensive for side projects? It's the easiest data store to compile and run that I've used.


Here are some Redis hosting options, you tell me if this is affordable for a side project:

https://openredis.com/

http://redistogo.com/


How are 3rd party hosting options part of a cost comparison?

Hosting any DB offsite comes at a cost, and it does not appear that any one database platform has an advantage over another in terms of a service provider.


My current little side project uses about 200-300 MB of database storage. Using Heroku, that would cost (per month) $9 on their shared Postgres database, $10-15 using MongoDB, $50 using Heroku's dedicated production Postgres service, or $125 using Redis.

That's quite a difference.


Wait. If you use 200-300MB of DB storage, openredis costs $69 using it as a Heroku add-on and $45 if used independently.

$125 ($90) is the price of the large instance, which offers 1.7GB of storage.

That's quite a difference.


I stand corrected. I was using the prices from redis-to-go, which was the only available option last time I checked, and the service with the big prominent link at the top of the heroku pricing page. Good to see that there is some competition in the area and that prices have come down a bit. Still not cheap enough to move my hobby project from a 1GB VPS, but reasonable enough if I ever turn my project into something that might make money.


The smallest plans listed are $7 and $8 a month. Sounds affordable for a side project.


FYI, http://redistogo.com offers a free plan: 5 MB, 1 database. Perfect for a side project.


Doesn't give you much though.


First, running Redis remotely, where WAN latencies are involved, is typically not very performant. Second, if cost is the concern, run it on a VPS, where you'll get more memory for the money (ideally the same VPS provider you use for other things).


That's why it's common to replicate to a SQL server for non-real-time operations.


Doing transformations for one-off reports is okay. Doing extensive transformations on data that is loaded millions of times every day is not.


  So anyway if your data needs to be composed to be served, you are not in good waters. -- antirez
Perfect! This sentence puts an end to SQL vs NoSQL holy wars. There's no silver bullet. But we 'all' knew that already :)


Data should not be organised based around retrieval or insert / update patterns but organised according to the model that best captures the essence of what the data is. That may sound fluffy, but most of the time data captured is captured following something real happening that caused data to be generated. Your data model needs to make sense in the context of the thing that happened in the real world, not in the context of what is inserted or how it is read.

The issue is that your methods of collection and retrieval will change over time, and your data model needs to support that and still make sense for existing data.


While what you describe is certainly the ideal, and likely applicable in a variety of situations, there are a lot of real-world situations where it doesn't cut it. Antirez says it best:

   remember all those stories about DB denormalisation, and hordes of memcached or Redis farms to cache stuff, and things like that? The reality is that fancy queries are an awesome SQL capability (so incredible that it was hard for all us to escape this warm and comfortable paradigm), but not at scale.
You'll be hard pressed to find any medium-sized project in the wild that doesn't require a layer of denormalization or caching to be reasonably responsive (I mean, some frameworks come with that built-in - their users might not even be aware this is happening). You might have some beautifully crafted model underlying that layer, but don't fool yourself into thinking that's all there is.





