Building Better Node.js Applications with RethinkDB (nodecraft.com)
100 points by CherryJimbo on April 21, 2015 | hide | past | favorite | 40 comments



> over 70 million API requests a week.

Congratulations. However, if you're going to imply that traditional SQL databases are non-performant in comparison, you may want to do a bit more looking at the performance numbers both PostgreSQL and MySQL are capable of.

100+ million transactions per day is pretty easy to achieve with a traditional SQL DB. Also, you can do a lot of filtering in software specialized for those use cases, reducing the need to consume processing time in the Node.js V8 thread.

I don't want to knock RethinkDB here, just point out that there are stronger performance cases to be made.


You're absolutely right - there's nothing stopping a standard MySQL setup from pulling off the same stats; it just wasn't what we were looking for, given the various other decision points covered in the post. RethinkDB's native ability to shard and scale is the key performance case to be made, in my opinion.

As mentioned by another user though, the filtering here is done entirely server-side (at RethinkDB's level), so the V8 thread performance isn't really a concern. https://news.ycombinator.com/item?id=9411738 for more info.


Yes, seconded. I read the article really interested in learning what level of capacity Rethink is capable of in the wild, but didn't get a great feel for that. It's not very helpful to frame database performance in operations per week. Peak load would be a better starting point.


You are correct. Our customers (we sell a key/value store) manage 1B entries a day and do north of 500 million requests per day.

70 million requests a week is actually very weak for a NoSQL database and should be left out of the promotional material.


The filter you see is actually compiled and sent to the server. The client isn't doing any filtering. This is actually one of the more powerful features in RethinkDB, IMO. The query language mimics local javascript.
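To make "compiled and sent to the server" concrete, here is a toy sketch of the general technique (this is not RethinkDB's actual wire protocol or driver code — all names here are illustrative): the driver calls the filter lambda once, client-side, with a stand-in row object whose methods record operations into a serializable AST instead of touching any data.

```javascript
// Sketch only: a stand-in "row" whose methods build an AST node
// instead of comparing real values.
function compileFilter(fn) {
  var row = function (field) {
    return {
      gt: function (value) { return { op: 'gt', field: field, value: value }; },
      eq: function (value) { return { op: 'eq', field: field, value: value }; }
    };
  };
  // Run the user's lambda once to capture the query it describes.
  return fn(row);
}

// Same shape as a ReQL filter lambda; the value is a placeholder.
var ast = compileFilter(function (user) {
  return user('signup_date').gt(1000);
});
// ast is now a plain object the driver could serialize and send:
// { op: 'gt', field: 'signup_date', value: 1000 }
```

This is why the predicate has to be built from query-language methods like `gt` rather than native `>`: the lambda never sees real rows, only the recorder.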


The filtering done in this article is actually run on the server, not the client.


So is rethinkdb what mongo was always meant to be?


In my experience, yes, pretty much! It's a really amazing database, and its baked-in web UI for management and dead-simple clustering are just perfect for the use cases I've slotted RethinkDB into.


All of the benefits sound the same as MongoDB, which also does sharding and JavaScript queries, including arbitrary code in "callbacks" in map-reduce or $where clauses (though it's slow).

Can someone explain the benefit of RethinkDB over MongoDB?


Slava @ Rethink here.

RethinkDB is based on a fundamentally different architecture from MongoDB. Instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results in realtime -- check out http://rethinkdb.com/faq/ for more details on this.

Just FYI, Rethink does sharding and JavaScript queries (including callbacks) -- http://rethinkdb.com/api/javascript/js/. The `js` command interops with every other command in RethinkDB and works really well.



Just one benefit is that it has joins, server-side joins.


I'm probably forever spoiled by Rails' ActiveRecord - has anybody here used http://thinky.io with Node/RethinkDB? I really like to define my objects in an OOP fashion with class/instance methods included and would prefer a convenience "ORM" library for that purpose.


Thinky is really top notch. Michel keeps it up to date and constantly adds new features to it. I highly recommend checking it out.


Maybe it's just me but:

    var signupFilter = new Date();
    signupFilter.setDate(signupFilter.getDate()-30); // get a timestamp from 30 days ago
    r.table('users').filter(function(user){
         return user('signup_date').gt(signupFilter);
    }).run(function(err, user){
         // ... a cursor to stream through the data which can also be converted
         //  to an array using the toArray method
    });
Is a lot nastier than:

    db.query('SELECT * FROM users WHERE signup_date > CURDATE() - INTERVAL 30 DAY;', {}, function (err, users) {
        // Do stuff
    });
No?


I would agree that the latter is less 'nasty' in that it is a query written in a DSL designed to make querying tabular data more understandable. It would be a major failing of SQL if that were not the case.

However, what the former does that the latter does not is provide a mechanism to build up and modify queries. It can be very helpful to pass around and affect partial queries when you are looking to eliminate repetitive code for building multiple, similar queries.

This is probably why many abstraction layers have query builders that look a lot like the syntax that you call nasty to generate the 'less nasty' raw SQL.
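The composition point can be shown with a minimal toy builder (illustrative only — this is neither knex nor ReQL): because each call returns a new query value, partial queries can be passed around and specialized without string surgery.

```javascript
// Toy immutable query builder: filter() returns a new Query,
// so partial queries compose freely.
function Query(table, filters) {
  this.table = table;
  this.filters = filters || [];
}

Query.prototype.filter = function (clause) {
  return new Query(this.table, this.filters.concat([clause]));
};

Query.prototype.toSQL = function () {
  var where = this.filters.length
    ? ' WHERE ' + this.filters.join(' AND ')
    : '';
  return 'SELECT * FROM ' + this.table + where;
};

// One shared base query...
var users = new Query('users');
// ...specialized in two places without repeating anything.
// (The date literal is a placeholder for "30 days ago".)
var recent = users.filter("signup_date > '2015-03-22'");
var recentAdmins = recent.filter("role = 'admin'");
```

Doing the same with raw SQL strings means deciding by hand whether to append `WHERE` or `AND` at every call site, which is exactly the repetitive code this style eliminates.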


So, basically you're saying it's OK that the abstraction layer for RethinkDB creates nasty code because it's native JS? I don't agree. I think nasty code is never OK.

Why can't the RethinkDB abstraction layer just be better? Why isn't:

    var dateFilter = new Date();
    dateFilter.setDate(dateFilter.getDate() - 30); // setDate mutates in place
    rdb.table('users').get([ { column: 'signup_date', gt: dateFilter } ]).each(function (err, user) {
        // Do stuff
    });
Possible? That's just off the top of my head how I would do it in a way that's native JS and far better than what's presented in the article.


You can do this in RethinkDB:

    r.table('users').filter(r.row('signup_date').gt(signupFilter)).run()
Using an anonymous function like in the first example is optional. `r.row` can be very convenient. In this case, you only need to use `r.row` because you're using the `gt` comparison. If you're just doing a direct match, you can use this syntax:

    r.table('users').filter({signup_date: dateFilter})


Maybe we can agree that we have different definitions of what 'better' is. I can understand the design decisions and trade-offs that would likely have been considered in designing the API as presented.

I'm also a bit wary of trying to introduce too much 'magic' just to make things convenient.


Doesn't look like your solution would compose very well. You would need a solution to combine arbitrary terms in a nice way.


It's more verbose, but if you have had the chance of building SQL queries from raw strings, it's obvious that the first solution composes a lot better (eg, you can add a filter in a nice way).


Seems like a great option for real-time on the backend. But what would you use to push real-time data to a client-side JS app?


Lots of choices there: Faye, Socket.io, Pushpin, PubNub, etc.

I'm curious to see someone try integrating RethinkDB with Meteor.


Don't forget vanilla tech like WebSocket, EventSource or good old fashioned ajax.


WAMP and SignalR also come to my mind


One of the best optimizations I have done was switching my caching from a key/value serialization to an array form. I'm caching fairly large data reports about computer performance. It brought the average cache size from 100 megs to around 4 megs.

[{'first_name':'mike'},{'first_name':'bill'},...]

vs

{meta: {'header':['first_name']}, data: [['mike'],['bill'],...]}

This is why I can't consider databases like RethinkDB or Mongo. They already take such a big hit for using something like BSON as a storage format that I don't even want to go down that train of thought anymore.
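The columnar trick described above is easy to sketch: store the field names once in a header and keep only values per row. (The helper names are mine, and the savings below apply to this toy payload, not the 100 MB reports mentioned.)

```javascript
// Pack an array of uniform objects into header + value rows.
function toColumnar(rows) {
  var header = Object.keys(rows[0]);
  return {
    meta: { header: header },
    data: rows.map(function (row) {
      return header.map(function (key) { return row[key]; });
    })
  };
}

// Rebuild the original objects from the packed form.
function fromColumnar(packed) {
  return packed.data.map(function (values) {
    var row = {};
    packed.meta.header.forEach(function (key, i) { row[key] = values[i]; });
    return row;
  });
}

var rows = [{ first_name: 'mike' }, { first_name: 'bill' }];
var packed = toColumnar(rows);
// JSON.stringify(packed) spells out 'first_name' once instead of per row,
// and fromColumnar(packed) round-trips back to the original objects.
```

The savings grow with row count and key length, since each key is serialized once instead of once per row.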


    r.table('users').filter(function(user){
        return user('signup_date').gt(signupFilter);
    }).run(function(err, user){

Whenever I see one of those "awesome" no-SQL queries, I can't help but think about how ugly and bulky they are compared to SQL.

Or compared to knex.js for Node:

    knex.select().from('users').where('signup_date', '>', signupFilter)

It supports promises. What's wrong with that?


I can easily switch "RethinkDB" with "MongoDB" and the article would still be correct.


Except for the part about real time notifications, which is RethinkDB's killer feature.


How would RethinkDB's real-time capabilities compare to doing something like this with Mongo? (I'm genuinely curious about any shortcomings with the following implementation, besides scaling.)

    // plugins/replicate.js

    var replicator = require('replicator');

    module.exports = exports = function replicatePlugin (schema, options) {
      schema.pre('save', function (next) {
        // notify subscribers of save
        replicator.notify('save', this);
        next();
      });
    }



    // schemas/game.js

    var replicatePlugin = require('../plugins/replicate.js');
    var GameSchema = new Schema({ ... });
    GameSchema.plugin(replicatePlugin);


I don't know what the replicator module does, but from RethinkDB's end, it allows you to avoid polling (a scalability win -- lots of really nice library APIs are built on top of dead-slow polling of the database).

RethinkDB also lets you write queries and subscribe to changes on the query itself. So in mongo, you can tail the replication oplog to avoid polling, but you still need to filter every event happening on the database for the ones you're interested in. On top of that, if you need to do transformations of the data, you have to re-apply them to what you get from the oplog. With RethinkDB, you write the same query you would have, and the database can be very efficient in only sending changes you're actually interested in, and send them with the transformations you asked to be applied.

Check out these slides if you're interested: http://deontologician.github.io/node_talk/#/


so are we saying that

    r.table('users').filter(function(user){return user('signup_date').gt(signupFilter);}).run(...
is running this filter in the database, not in node?


It does run on the database server-- see the documentation for more information: http://rethinkdb.com/api/javascript/filter/

The key line is:

> Predicates to filter are evaluated on the server, and must use ReQL expressions. You cannot use standard JavaScript comparison operators such as ==, < / >, or || / &&.

It's essentially equivalent to:

    r.table('users').filter(r.row("signup_date").gt(signupFilter)).run(conn, callback);


Am I correct in assuming that the implementation is similar to that of LINQ and ORMs like NHibernate or Entity Framework? Example: https://msdn.microsoft.com/en-us/data/jj573936.aspx


Yes, that's right. If you want to know how it's implemented, check out this post -- http://rethinkdb.com/blog/lambda-functions/ -- it goes into quite a bit of depth on how the drivers work.


A small note -- you can use arbitrary javascript expressions with the `r.js` command (rethinkdb.com/api/javascript/js/).


ah, that makes a bit more sense. Still impressive though. Thanks for the information.


Just a shout-out: I'm not sure when they evaluated the different DBs, but Postgres ships with hstore and a JSON type for people who want to go schema-less.


I'd rather use a database purpose built for what I'm using it for. I.e., if I'm using Postgres, I'll suck it up and plan a schema. Their json stuff strikes me as a second class citizen.


I am curious: why do you think Postgres's JSON stuff is a second-class citizen?



