Congratulations. However, if you're going to imply that traditional SQL databases are non-performant in comparison, you may want to do a bit more looking at the performance numbers both PostgreSQL and MySQL are capable of.
100+ million transactions per day is pretty easy to achieve with a traditional SQL DB. Also, you can do a lot of filtering in software specialized for those use cases, reducing the need to consume processing time in the Node.js V8 thread.
I don't want to knock RethinkDB here, just point out that there are stronger performance cases to be made.
You're absolutely right -- there's nothing stopping a standard MySQL setup from pulling off the same numbers; it just wasn't what we were looking for, given the various other decision points covered in the post. RethinkDB's native ability to shard and scale is the key performance case to be made, in my opinion.
As mentioned by another user though, the filtering here is done entirely server-side (at RethinkDB's level), so the V8 thread performance isn't really a concern. https://news.ycombinator.com/item?id=9411738 for more info.
Yes, seconded. I read the article really interested in learning what level of capacity Rethink is capable of in the wild, but didn't get a great feel for that. It's not very helpful to frame database performance in operations per week. Peak load would be a better starting point.
The filter you see is actually compiled and sent to the server. The client isn't doing any filtering. This is actually one of the more powerful features in RethinkDB, IMO. The query language mimics local javascript.
In my experience, yes, pretty much! It's a really amazing database, and its baked-in web UI for management and dead-simple clustering are just perfect for the use cases I've slotted RethinkDB into.
All of the benefits sound the same as MongoDB, which also does sharding and JavaScript queries, including arbitrary code in "callbacks" in map-reduce or $where clauses (though it's slow).
Can someone explain the benefit of RethinkDB over MongoDB?
RethinkDB is based on a fundamentally different architecture from MongoDB. Instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results in realtime -- check out http://rethinkdb.com/faq/ for more details on this.
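A minimal sketch of that push model with the official JS driver (assumes a running server and an open connection `conn`; table and field names are illustrative):

```javascript
var r = require('rethinkdb');

// Subscribe to the query itself: the server pushes matching changes
// to the client instead of the client polling for them.
r.table('games')
  .filter({ status: 'live' })
  .changes()
  .run(conn, function (err, cursor) {
    if (err) throw err;
    cursor.each(function (err, change) {
      // each `change` carries old_val/new_val for the affected row
      console.log(change);
    });
  });
```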
Just FYI, Rethink does sharding and JavaScript queries (including callbacks) -- http://rethinkdb.com/api/javascript/js/. The `js` command interops with every other command in RethinkDB and works really well.
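For example, `js` can embed a raw JavaScript predicate inside a ReQL query, evaluated server-side (sketch; assumes a running server and an open connection `conn`):

```javascript
var r = require('rethinkdb');

// Run arbitrary JavaScript on the server as a filter predicate.
r.table('users')
  .filter(r.js('(function (user) { return user.age > 30; })'))
  .run(conn, function (err, cursor) { /* ... */ });
```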
I'm probably forever spoiled by Rails' ActiveRecord - has anybody here used http://thinky.io with Node/RethinkDB? I really like to define my objects in an OOP fashion with class/instance methods included and would prefer a convenience "ORM" library for that purpose.
var signupFilter = new Date();
signupFilter.setDate(signupFilter.getDate() - 30); // get a timestamp from 30 days ago

// assumes an open connection `conn`
r.table('users').filter(function (user) {
  return user('signup_date').gt(signupFilter);
}).run(conn, function (err, cursor) {
  // `cursor` streams through the data and can also be converted
  // to an array using the toArray method
});
Is a lot nastier than:
db.query('SELECT * FROM users WHERE signup_date > CURDATE() - INTERVAL 30 DAY;', {}, function (err, users) {
// Do stuff
});
I would agree that the latter is less 'nasty' in that it is a query written in a DSL designed to make querying tabular data more understandable. It would be a major failing of SQL if that were not the case.
However, what the former does that the latter does not is provide a mechanism to build up and modify queries. It can be very helpful to pass around and extend partial queries when you are looking to eliminate repetitive code for building multiple, similar queries.
This is probably why many abstraction layers have query builders that look a lot like the syntax that you call nasty to generate the 'less nasty' raw SQL.
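As a sketch of that composition (assumes the official driver; table and field names are illustrative), a partial query is just a value you can pass around and extend:

```javascript
var r = require('rethinkdb');

// A reusable partial query -- nothing is sent to the server yet.
var recentUsers = r.table('users').filter(function (user) {
  return user('signup_date').gt(r.now().sub(30 * 24 * 60 * 60));
});

// Elsewhere, refine it without touching the original definition.
var recentAdmins = recentUsers.filter({ role: 'admin' });

// Only .run(conn, callback) actually sends a query to the server.
```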
So, basically you're saying it's OK that the abstraction layer for RethinkDB creates nasty code because it's native JS? I don't agree. I think nasty code is never OK.
Why can't the RethinkDB abstraction layer just be better? Why isn't:
var dateFilter = new Date();
dateFilter.setDate(dateFilter.getDate() - 30);

rdb.table('users').get([ { column: 'signup_date', gt: dateFilter } ]).each(function (err, user) {
  // Do stuff
});
Possible? That's just off the top of my head how I would do it in a way that's native JS and far better than what's presented in the article.
Using an anonymous function like in the first example is optional -- `r.row` can be a very convenient shorthand. In this case, you only need a function (or `r.row`) because you're using the `gt` comparison. If you're just doing a direct match, you can pass a plain object instead, e.g. r.table('users').filter({name: 'Michel'}).
Maybe we can agree that we have different definitions of what 'better' is. I can understand the design decisions and trade-offs that would likely have been considered in designing the API as presented.
I'm also a bit wary of trying to introduce too much 'magic' just to make things convenient.
It's more verbose, but if you have ever built SQL queries from raw strings, it's obvious that the first approach composes a lot better (e.g., you can add a filter in a nice way).
One of the best optimizations I have done was switching my caching from a key-value serialization to an array form. I'm caching fairly large data reports about computer performance. It brought the average cache size from 100 megs down to around 4 megs.
This is why I can't consider databases like RethinkDB or Mongo. They are already taking such a big hit for using something like BSON as a storage format that I don't even want to be involved in that train of thought anymore.
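The key-value vs. array-form point above can be illustrated in plain JavaScript (metric names are made up; sizes here are JSON string lengths, not BSON):

```javascript
// Build 1,000 sample rows in key-value form: field names repeat in every row.
var rows = [];
for (var i = 0; i < 1000; i++) {
  rows.push({ timestamp: i, cpu_percent: 50, memory_bytes: 1024, disk_reads: 7 });
}
var kvBytes = JSON.stringify(rows).length;

// Array form: store the field names once, then positional value arrays.
var compact = {
  columns: ['timestamp', 'cpu_percent', 'memory_bytes', 'disk_reads'],
  rows: rows.map(function (row) {
    return [row.timestamp, row.cpu_percent, row.memory_bytes, row.disk_reads];
  })
};
var arrayBytes = JSON.stringify(compact).length;

console.log(kvBytes, arrayBytes); // the array form is several times smaller
```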
How would RethinkDB's real-time capabilities compare to doing something like this with Mongo? (I'm genuinely curious about any shortcomings with the following implementation, besides scaling.)
// plugins/replicate.js
var replicator = require('replicator');

module.exports = exports = function replicatePlugin (schema, options) {
  schema.pre('save', function (next) {
    // notify subscribers of save
    replicator.notify('save', this);
    next();
  });
};
// schemas/game.js
var Schema = require('mongoose').Schema;
var replicatePlugin = require('../plugins/replicate.js');

var GameSchema = new Schema({ ... });
GameSchema.plugin(replicatePlugin);
I don't know what the replicator module does, but from RethinkDB's end, it lets you avoid polling entirely (which matters for scalability -- lots of really nice library APIs are built on top of dead-slow polling of the database).
RethinkDB also lets you write queries and subscribe to changes on the query itself. So in mongo, you can tail the replication oplog to avoid polling, but you still need to filter every event happening on the database for the ones you're interested in. On top of that, if you need to do transformations of the data, you have to re-apply them to what you get from the oplog. With RethinkDB, you write the same query you would have, and the database can be very efficient in only sending changes you're actually interested in, and send them with the transformations you asked to be applied.
Predicates passed to filter are evaluated on the server, and must use ReQL expressions. You cannot use standard JavaScript operators such as `==`, `<`/`>`, or `||`/`&&`.
Yes, that's right. If you want to know how it's implemented, check out this post -- http://rethinkdb.com/blog/lambda-functions/ -- it goes into quite a bit of depth on how the drivers work.
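Concretely, the JavaScript operators map onto ReQL methods (sketch; assumes the official driver):

```javascript
var r = require('rethinkdb');

// JavaScript:  user.age > 21 && user.active == true
// ReQL equivalent, evaluated server-side:
r.table('users').filter(function (user) {
  return user('age').gt(21).and(user('active').eq(true));
});
```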
Just a shout-out -- I'm not sure when they evaluated the different databases, but Postgres ships with hstore and a JSON datatype for people who want to go schema-less.
I'd rather use a database purpose-built for what I'm using it for. I.e., if I'm using Postgres, I'll suck it up and plan a schema. Their JSON stuff strikes me as a second-class citizen.