MemSQL CTO here. Great article- Domas has done a good job of digging into the internals of MemSQL! A few questions/comments:
1.) The range query issue you pointed out can be explained by a well known limitation of skip lists. Unlike B-Trees, skip lists are unidirectional. By default, our indexes are ascending, so indeed you have to skip to the end to run a MAX() or "ORDER BY id DESC" query. To fix this, just change the primary key to be descending in your schema:
CREATE TABLE x (id int, ..., PRIMARY KEY id (id) DESC, ...)
2.) The transaction-buffer > 0 setting is really where we shine. Synchronous durability is something we have for flexibility, but I'll be the first to admit that it's not really optimized to perform well. The customers that we're working with are okay with this. And it's inspired by what modern companies do. Maybe it's changed in the time since I've left (Domas?) but Facebook certainly does not write comments synchronously to disk.
3.) Our synchronous durability does indeed flush in 50 MS cycles, so you'll see poor performance on single threaded inserts. However, as you pointed out, we're optimized for the case with multiple concurrent writers. Since MemSQL implements group commit on high parallel load throuput picks up. Sure, writing your own very specific benchmark you can show us writing poorly, but we've never worked with a customer that's needed single threaded, synchronous writes to shine. If this is your use case, unless you need substantial read performance, MemSQL is not the database for you.
Regarding #2, we do normally run MySQL with full transactional durability at Facebook. We have for a very long time (several years at least). For example, here is a FB note regarding our enhancing performance under full durability in MySQL from Oct 2010:
sync_binlog=1
innodb_flush_log_at_trx_commit=1 (which is default, but we add it to our configs since we run with =2 on the slaves since they don't need full durability)
We do use some performance related settings related to this as well:
innodb_flush_method = O_DIRECT
innodb_prepare_commit_mutex=0 (enables group commit in the Facebook patch, other branches have a different setting)
For settings like innodb_doublewrite we leave the default of enabled.
It's not specifically related to the original article discussed here, but skiplists don't need to be unidirectional. For instance Redis implements all the sorted sets operations using doubly-linked skiplists, so once you identify a node you can traverse other nodes in both directions.
Good point- ours are currently unidirectional because they have to be lock free. We have some ideas on how to keep them lock free and make them bidirectional, but that's not part of the current release.
I don't think the point of the article was seriously about the details of performance in MemSQL versus MySQL. I took it as a criticism of the irresponsible reporting of benchmarks.
Yet you still claim your memSQL is durable by default. Except its not. That's the point. If you make memSQL durable in the configuration (since it's not actually durable by default as you guys claim), then the performance is apparently terrible. Very horrible kind of terrible.
The problem is the claim of being durable by default. It is, to be extremely charitable, a stretch of the truth. Asynchronous durability is essentially an oxymoron.
Even though skip-lists as described in literature are singly linked (horizontally), they can be augmented with a back pointer to make them doubly linked. This would allow one to re-use an index as either a forward or backward index, and the order of the index is no longer important. I am not sure how easy this is to do in a lock free fashion. However, if one were to adopt a strategy which combines functional data structures and RCU, I am guessing that it would be possible to implement it with relatively good readability.
Additionally, skip-lists can be used for performing range queries as well. This means that "SELECT * FROM table ORDER BY id DESC LIMIT 5;" can be executed very efficiently as long as there is a skip-list on "id". Additionally, even COUNT(), MAX(), MIN(), etc... queries can be optimized in a similar fashion. Again, however, doing it in a lock-free manner might not be the most fun thing to do.
Skip list is essentially a hierarchy of linked lists. It is very hard to make a doubly linked list work concurrently without using locks. One would have to update two pointers atomically in order to insert a new element in a doubly linked list.
You could probably pull that off if you implement it like full text search indices in e.g. Lucene. Have every modification create a "diff" skip list that takes precedence over older skip lists, and then have a merge operation that runs asynchronously (and will either have to lock, or copy the entire data structure and switch it in atomically). That's also essentially how Bigtable's disk format works.
Modern x64s have 128-bit CAS instruction, but the problem is that those two pointers can be far apart in memory, in two different elements of the list.
The tl;dr here is: MemSQL is optimized for a different context.
That's all well and good, but as a consequence, it is only fair to compare yourself to other systems that are configured to take advantage of the same context.
You seem to be missing the main reason why people have an issue with MemSQL. The issue are quotes like this on your website and video:
"MEMSQL IS 30 TIMES FASTER THAN MYSQL."
It's an extremely invalid and biased comparison. You're quite literally comparing the speed of writing to RAM versus the speed of writing to disk. If you didn't make such ridiculous assertions, people would accept your product for the actual awesome things it does and not focus on refuting your bullshit claims.
Hi Tim, we're not comparing "the speed of writing to RAM versus the speed of writing to disk." You can run InnoDB with a buffer pool large enough to keep the entire database in memory and we'll still outperform it significantly. We're actually working on a blog post right now with that comparison.
I think it might help more to explain how your design works and why we should trust it, than to provide simple benchmarks proving the speed. Speed is only one part of the equation, and frankly it's the one I care about only after durability is taken care of. Explain thoroughly how you achieve both and it will be more relevant to me. Looking forward to the post.
This should give you a good idea about how durability will react to tuning, but it doesn't dive deep into the internal design. If there's enough interest (seems like there is) we'd be happy to discuss it in a blog post.
Tim, if you want to talk substance, you have to peer deeper under the covers. And if you do that, then you just can't overlook the fact that the query execution model used by MemSQL is significantly different than that used by old school relational databases, including MySQL. And what's different is that MemSQL translates you SQL query into extremely efficient C++ code. Code that is compiled and executed natively. Whereas MySQL, SQL Server, Postgress, Oracle - all of these products evaluate queries by interpreting their respective tree representations of your SQL queries. So which database do you expect to be faster? The one that runs native code or ones that interpret?
This is a huge differentiator we are talking about here. For someone who has been around databases for quite some time (of which I had spent about 5 years working on SQL Server engine) this is very exciting. Is it not?
>And what's different is that MemSQL translates you SQL query into extremely efficient C++ code. Code that is compiled and executed natively. Whereas MySQL, SQL Server, Postgress, Oracle - all of these products evaluate queries by interpreting their respective tree representations of your SQL queries.
This sounds like absurd cargo culting. I've never designed a database but parsing the SQL can not have ever been the bottleneck; the fact that you rarely write to disk is where any speed improvements can be found.
You are asserting that databases are always I/O bound and never CPU bound. Your assertion is wrong. A properly tuned database will become CPU bound (whether it's MemSQL or MySQL). At that point hyper-efficient execution results in higher throughput and lower latency.
> At that point hyper-efficient execution results in higher throughput and lower latency.
I don't get excited for 1% decreases in latency. I'm willing to bet money that the performance penalty behind parsing the sql queries is asymptotically a constant - or at least, a tiny fraction of the time spent computing the data set.
I feel especially confident in this because memsql never brags about the specific increase in speed. This can't be hard to measure.
Phill, parsing has little to do with what goes on at query execution time. Parsing is typically done once for a given parameterized query (and MemSQL automatically parameterizes queries with constants). After parsing comes compilation which produces a query plan. The query plan is cached and that is the thing that is actually used to evaluate the query when it's re-submitted again. We are discussing the differences in how MemSQL handles executing compiled queries. Executing compiled queries is what a (well tuned) database spends most of its time doing. It is the hot code path. And if you cut the instruction count in half you can expect latency to decrease proportionally and throughput to increase.
You are making all these skeptical claims that are technically wrong. I suggest you download MemSQL, compile a query and read the generated C++ code. Then download MySQL source and look at that too. At that point you will be coming to your own conclusions and I'll be happy to resume this debate.
I've actually written a compiler from database queries to native code (JIT'ed bytecode actually, but doesn't make a difference here) once.
Some queries in databases are indeed CPU bound, but the general statement that well-optimized databases turn to be CPU bound is not correct like that. It all depends on the use case.
If you're thinking full table scans with complicated processing on a database that's entirely in memory, then yes, you'll be CPU bound. But many (most?) databases are not entirely in memory, and there's no reason they need to be. With reasonable memory and indexing, most queries boil down to only a few IOs/disk seeks, so you're looking at <50ms, which is entirely fine for your average web request.
That's the case for the majority of "get by ID" style queries, which in my experience really are the lions share of queries, and in particular are also what needs to be fast. A complicated reporting query can take a couple of seconds, but your "main page/item view" must be fast.
If you have random reads on a large database that cannot be kept in memory, this will be the majority of your workload. Those queries will spend all their time waiting for the disk, and more or less none interpreting your query tree, so optimizing the query tree processing part makes no sense at all.
TL;DR, in my experience optimizing query tree interpretation does not yield significant benefits for the majority of queries, and IMHO won't be a significant differentiator for MemSQL outside of marketing.
@Nitramp: Fortunately for all of us this world is full of unsolved problems many of which exist outside of consumer web.
There are use cases where 50ms is way too long, and where jitter associated with disk I/O is not acceptable. Capital markets for example. You are looking for sub-millisecond latency to be in the intraday post-trade game, and single digit milliseconds for quote-to-trade. The number of instructions in the hot code path (and cycles per instruction) actually starts to matter. Lock-free sturcutres utilized for storage are also important. This combination gives MemSQL some rather unique low-latency properties that certain folks can exploit to their delight.
To your other point, random reads and key-value lookups is where MemSQL shines. Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.
Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.
I'm not sure where you are getting this information. According to the MemSQL documentation, they have one and only one isolation level, READ COMMITTED. But READ COMMITTED takes write locks and blocks reads, but doesn't take any read locks. Blocking behaviour still (and should!) still occur.
It sounds like you are looking for dirty read behaviour, but from the literature MemSQL doesn't support READ UNCOMMITTED.
Update: Read a later FAQ on the MemSQL website. They are using MVCC, so write locks are irrelevant. My apologies to the MemSQL team.
@Criss
Isolation levels are used to specify semantics only. DBMS is free to implement them in different ways.
Take a look at this VLDB 2012 paper about optimistic concurrent control in main memory databases, like MemSQL, which also describes implementation of isolation levels:
http://arxiv.org/pdf/1201.0228v1.pdf
Get by id queries are majority when the database is used as a key-value store, and the only reason to such an underuse of RDBMSes is precisely because under heavy load some of them have hard time doing proper queries, with joins, subqueries and all the stuff databases are designed for.
It's not "underusing" an RDBMs, those are just the lions share of queries in many applications. Also note that a join say from some parent item by ID to children who have a parent_id makes essentially two lookups by ID in two indices, so the same reasoning applies; this is not just SELECT * FROM bla WHERE id = :x.
I don't know how to tell you this, but query plans parsing is never the bottleneck in any database. You're just plain wrong.
Compiling queries to C++ is great for some BS bullet point on a website with BS claims of performance gains, but it's only a niche feature at the end of the day. There are more important things that should be getting tackled first: multi-node distribution, better fsync handling, etc.
It is not about the cost of parsing SQL text. Think of it as intermediate bytecode for SQL that is interpreted as query is being executed. Every SQL database does some variation of it.
What is unique about MemSQL is that this bytecode is actually compiled to native code using GCC.
I agree with what you have to say, but I don't see how you can determine that this query:
SELECT * FROM table WHERE id > 5 LIMIT 10;
is actually of the same form as this query:
SELECT * FROM table WHERE id > 10 LIMIT 5;
without actually parsing the two. I mean you will incur a parsing overhead either way. What I think MemSQL is trying to optimize out is the actual execution of the query. I mean rather than interpreting and running it with checks performed in every iteration, they are compiling the code. I don't know how much of a difference this would make without actually seeing the numbers.
That's why prepared statements are preferable because those two statements have to be parsed as unique queries. They are actually the same query with different parameters.
As far as I know that's the same on just about all databases.
At least some databases I've used have parsed out constants in the client library code and submitted prepared statements to the server, passing those constants in. That meant that the server got better use out of its query cache; on the server side, those are the same query.
Not sure which ones if any do this today, the optimisation might no longer be relevant - I spotted DBs doing this 10 years ago when CGIs that didn't bother with prepared statements were rife.
Not every parsing is equal. MySQL and other databases produce a tree as they parse a query. If you just want to match parameters you can avoid memory allocations and just replace parameters with ^ or @ signed in place in the query string.
Actually, it can have a significant impact but it's a corner case. It's a moot point though - as an example, SQL Server stores plans in a memory cache. [1]
edit: incidentally, one of the corner cases I can think of is joining 10 tables, which gives you 10! combinations for the query processor to work through. This is irrelevant in MemSQL, because you can only join to another two tables, and you cannot do right outer for full outer joins.
The original DB2 for VSE and VM (not DB2 for z-series/390) which was the productized version of System R did exactly this - compiled to native code. Subsequent DB2 implementations chose not to go down this path - even for single OS systems like DB2/390 and DB2/400.
In any event, I'm skeptical if this is going to make very much of a difference for the following reasons:
1. The time spent by the runtime/interpreter in "evaluating" the operators or expressions is really small relative to the heavy lifting of the actual relational operators to say nothing of I/O etc.
2. Most serious implementations don't really interpret the plan using a giant switch statement. For instance, with PostgreSQL, the interpreter puts in function pointers on first iteration. Thereafter it's only slightly worse than compiled native code.
Re 2.: you can get significant improvements when compiling to native code. You still have to dereference those function pointers, which is essentially the same as a big switch statement (if not worse, your function pointers might be anywhere in the heap, not in your cache line).
When you compile to native code, you can essentially save the dynamic dispatch on query parts, plus all the other benefits you get from compilation.
E.g. if you do good type analysis, you can also compile to much more efficient data representations, which will again have benefits for memory access. The compiler can also inline code, do loop lifting, and all those goodies.
But overall I strongly agree with 1), any time spent doing execution is usually dwarfed by IO.
"And what's different is that MemSQL translates you SQL query into extremely efficient C++ code."
What does this mean? Does the SQL get converted into actual C++ source code, then compiled with an internal C++ compiler? That seems like a weird thing to do. Or is it translated directly into an AST? If yes, I can't imagine other db's not doing it? I don't understand the claim being made here.
Yes, they compile SQL statements into linux binaries using a bundled GNU compiler toolchain.
I am also curious if there is some way the developers have demonstrated the benefits of this approach vs. what other systems do. Most of the CPU time in a simple query in VoltDB is spent in networking, validating and suffering the cache-miss pain of walking index data structures. I can't see clearly how compiling to native code makes any of that much faster.
In-memory storage layer is very efficient. There are no buffer pool pages to manage or cache misses that result in I/O. That puts an accent on what goes on between storage engine and network, that is on query execution. Tight loops in native code is a good way to cut down on CPU cycles required to evaluate a query.
Actually, no. You speak like database engines have to parse a query every single time it's run. That's not true - any reasonable database engine has a plan cache. Just because something is compiled into object code, doesn't make it that impressive.
In fact, making a plan completely stable can cause problems if the data distribution changes dramatically. I asked about this problem in a previous article about MemSQL, and the answer was that the plan won't expire unless you drop an index or add a new one. [1]
Whenever one sees a claim about new database technology outperforming standbys like MySQL and Postgres by factors of 10x or more, it's a good idea to suspect the out-of-the-box configuration isn't really durable. What kills write performance in databases is the time it takes to write to disk, not the software overhead.
Very good observation. Established DBs already do a decent job optimizing the path from network to disk, at least in the most basic one client, one server test. So say they are hitting the 80MB/s throughput. If someone's benchmark shows 800MB/s throughput on same hardware -- you know they are making stuff up.
Vendors lie. That is the sole truth of IT purchasing. When testing network gear, some of our network engineers would say, "Vendors lie, but packets don't".
Nice quote! More generally when considering any solution you should always test it in your environment according to your performance needs BEFORE you buy it. Any vendor worth their salt will let you do this.
For what it's worth SQL Server doesn't meet his definition of "durable." It writes a transaction log and then writes those transactions to disk on a checkpoint [1]. Typically, that checkpoint automagically happens, but you can also force it to clear the buffer with a "checkpoint" command.
InnoDB also does this, too [2]. I don't know if MemSQL actually has checkpointing or what, but it's just worth noting.
I think you may be missing the point. Most databases use transaction/redo logs, it has to do with how often the writes to the transaction log are flushed. With proper durability, the transaction log is flushed with every commit, not when buffers become full or a background thread gets around to it.
This is a little misleading; an internal checkpoint happens upon transaction commit[1]. The comment makes it sound as if one or more transactions can commit before a checkpoint writes them to disk.
The transaction is actually written to disk twice. Once when it is written to the transaction log, and then again when a checkpoint happens and it's written to the underlying data blocks. Transaction log writes are sequential and fast, writes to the underlying data blocks are random and slow. The whole point of the checkpoint is to convert this sequential i/o to random i/o in an efficient manner.
Still, that's not what we're talking about. We're talking about the difference between writing data to the transaction log, and writing data to the transaction log and then flushing it. Once you call fsync() and the data is flushed you know for a fact (provided the hardware isn't lying to you) the bytes are on the damn platter, and not in some OS buffer cache.
However in practice most operating systems and/or file systems and/or disks cheat. fsync() is usually buffered in the hard disk itself. You can disable that and force the hard disk to truly write out on fsync, but that is so prohibitively slow that people rarely do that.
If you want absolute durability, you'll have to have hard disks running on some battery buffered power supply, which is a common configuration.
On the other hand, in a proper database system, at least the data files won't be corrupted by a missing fsync, so it'll come up. Figuring out whether that one commit did or did not go out in the very rare event of a fatal power failure and a just pending fsync() and the commit making it out of the network stack in time is probably a Heisenberg-esque inquiry into obscure realms of uncertainty.
First of all, ext(2|3|4) and XFS filesystems honor fsync. That's what all the noise about Firefox and SQLite was all about. Secondly the HP RAID controllers I'm familiar with disable the drive write cache by default, and throw dire warnings if you try to turn it on, and make you ACK your choice like:
Without the proper safety precautions, use of write cache on physical drives could cause data loss in the event of a power failure....
The bottom line is that Database pros insist on and get actual durability, and with a battery backed write cache, it's not painful.
SQL Server writes to the transaction log immediately, it writes the contents of modified data pages on checkpoints, but only for transactions that have committed. For data pages that haven't been written to (dirty pages) when the system crashes, the recovery process can recreate completed transactional changes from the log (which is rolling forward), or wipes out failed transactions from the log (rolling backward).
From MemSQL's home page: "MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory."
From the article: "See, MemSQL also has “transaction-buffer” setting, which, in default “full durability mode” will asynchronously return “ok” until 128M buffer is full (or background log flusher thread writes it out)."
Whenever I read an article like this, I often end up feeling that on the 'truthiness' spectrum the scale goes like this: lies, damn lies, statistics, benchmarks ;-)
Every public benchmark is a horrible blasphemous lie. The only benchmarks that are honest are the ones you conduct in private, with your very specific use case. There is no point in publishing them, because they're useless to everybody but you. Public benchmarks, on the other hand, are crafted as advertising for your favorite [fill in the blank].
If you generate some data which would be dishonest when published, it is also dishonest in private, except that you are only being dishonest to yourself and nobody can review your dishonesty.
If you generate data which are honest in your use case, then you can publish them honestly as long as you are specific about your use case.
And yet, there are useful statistics and useless ones, useful benchmarks and useless ones. The benchmarks posted here seem good enough to demonstrate a few truths about MemSQL for people that might be earlier been taken in by hype.
This reminds me of some early MongoDB benchmarks. Claims of extremely high write throughput when really it's writing to memory and relying on mmap to eventually sync the data to disk.
Making write speed claims when you're not touching the disk is like bragging about how fast you can cat a file to /dev/null.
When debating a fresh new database project - benchmarks aren't really the way to go, because those can be improved. What should be debated is what underlying fundamental has the new project gotten right. Almost every successful database project got something fundamentally right - MySQL, Redis (in-memory), MongoDB(page based storage), Riak(Distributed).
On that note, here's my take on MemSQL - People who love MySQL, almost never want to migrate. People who hate MySQL, usually hate it because of SQL. People who are far along in the projects enough that decision of database is no longer a matter of taste, don't use a brand new untested DB.
Therefore, I am unsure how a brand new, in-memory, SQL based solution fits.
> People who hate MySQL, usually hate it because of SQL.
Or because of the shortcuts it has taken to avoid the "hard stuff", which results in developers taking shortcuts by using MySQL to avoid the "hard stuff".
Love or hate SQL for what it is, not the MySQL implementation of it.
> Or because of the shortcuts it has taken to avoid the "hard stuff", which results in developers taking shortcuts by using MySQL to avoid the "hard stuff".
One thing he may be referring to is a cost-based optimizer (CBO). Dismissing the significance of a CBO is a mistake.
For one thing, it's not an optimizer like gcc at all. It doesn't take your existing algorithm and speed it up, it chooses a better algorithm. How does it know which is better? It collects statistics and runs them through a cost model.
This is what allows something like PostgreSQL to offer nested loop joins, merge joins, and hash joins. Without a CBO, how would it know which to choose? Similarly, it can offer multiple ways to use the same index (normal index scan, bitmap index scan) and multiple ways to aggregate (Sort+Group or HashAggregate).
Those are just a few of the many algorithms postgres offers that are chosen in a principled way by the CBO.
Avoiding RI (in part due to poor FK support depending on version / storage engine), designs that avoid simple schema changes because they can't be done online, point version migration difficulties.
Consider unsupported SQL constructs from other engines (CTEs, window functions, better join and subquery support), and the effort and mess to accomplish what should be straightforward tasks generates SQL hate.
Some NOSQL projects have merit as they provide mechanisms for working with data structures that do not map well to rows and columns. However, difficulties faced with MySQL have also pushed some projects to use equivalently broken data storage technology, when a "better" RDBMS would be more appropriate.
Citation needed. Which database system on earth doesn't do page based storage? In fact, you cannot get any durability guarantees if you don't write page-at-a-time.
MongoDB is relying on the operating system's cache manager by using memory mapped files. Whether that is a good idea depends a lot on your operating system, and whether it's page replacement algorithm fits your database use case. Most database implement their own page cache because they want tighter control over those algorithms.
I like MongoDB's data model, I don't understand their choice of query language, and I think their database engineering is odd to say the least. What's the point in having a non-durable system if you don't even get scalability across nodes? Where's the hot backup? And so on. It seems like something people use who just couldn't figure out indices in relational databases, just to run around and tout that NoSQL is so much faster :-(
This is actually a fair and valid point. On all but the most delicately configured systems, the standard configuration of basically all SQL databases does not occlude committed data loss during power failure due to the hard disk's write cache. See for example http://www.postgresql.org/docs/8.3/static/wal-reliability.ht...:
When the operating system sends a write request to the disk
hardware, there is little it can do to make sure the data has
arrived at a truly non-volatile storage area. Rather, it is the
administrator's responsibility to be sure that all storage
components ensure data integrity. Avoid disk controllers that have
non-battery-backed write caches. At the drive level, disable
write-back caching if the drive cannot guarantee the data will be
written before shutdown.
Which almost nobody does, because performance falls through the floor unless you're pimping a $500 RAID card (don't even mention software RAID).
The SQLite documentation goes into further detail surrounding the atomicity of writing a single 512 byte sector during a power failure (on old drives, on 4096 byte drives, and on SSDs). Few people seem to account for any of this stuff, yet sleep soundly at night regardless.
While backup power might be expensive in a hard drive situation, it looks like even cheap SSDs are available with power loss protection. And as far as I'm aware you're better off using SSDs for databases anyway.
Note that writing data to a single disk (or SAN array, or RAID controller) really isn't durable either, even if the the data does actually get to the disk and isn't in a write cache somewhere.
What if that disk crashes, or the SAN array brakes and kills all the data, or the data center burns down?
Since it's impossible to completely avoid catastrophic hardware failure in the real world, obviously that can't be the threshold for durability. But losing data to anything short of hardware failure means you're non-durable. Software crashes can't lose data. Kernel panics can't lose data. Power outages can't lose data.
> Isn't that just a higher likelihood of "durability"?
It's much, much, much more likely to have a memory failure (a crash anywhere from thread to hardware) than a hard drive failure. I don't mind them claiming that as durability.
Well you have to come up with some way to measure it. How about "time I expect this data to be retrievable"?
So you can measure memory and process durability in hours to days, and spinning disk durability in years. I don't know how long SSDs last yet.
Say, (just making up numbers) you expect your servers to run 10 days without a reboot or crash, and your disks to last 5 years before failing. That means writing to memory is about 0.005479 as durable. So it's not great, but it's not zero either.
An interesting thought experiment: how many machines replicating data only in RAM does it take to be more durable than one machine with a spinning disk?
I think this whole conversation is making a mountain out of a molehill. When you're building sites which have enormous performance requirements, you take what you have and you force it to scale. You don't need to compare one tool to the other and find the perfect benchmark. Whatever you use, I promise you, you can make it perform as fast and scale as far as you need it to.
So just pick the tool that fits your requirements the best and stick with it. That being said, I also urge you to pick something free and open-source if you can. Some of the ways big sites have leveraged MySQL to the extent they have is due to the custom patchsets they apply. Closed-source solutions may not provide similar hotfixes or customization (and when it comes it's months later than they claimed they'd have it ready)
[1] Someone create a project on GitHub
[2] Upload an amply sized CSV full of data
[3] Publish list of queries needed to "gauge" speed
[4] Allow anyone to post their metrics to see how well tuned any database can be in comparison
The only other criteria would be to ask them to all use the same form factor. EC2 makes the most sense.
When someone writes on their intro "... and now in return I want to waste your time a bit." it's a sign that someone isn't being serious.
Domas' main criticism is that our video, http://vimeo.com/44087431, uses a MySQL with standard defaults. Since tuning a database is by definition a custom process, we wanted to demonstrate what performance you'd get "out of the box." The video speaks for itself. MemSQL can push 80k/sec vs 3.5k/sec for MySQL on EC2.
Domas actually demonstrates MemSQL's robust durability controls: he bounds MemSQL to disk. If you want to throw an anchor behind a speed boat, you can do it, though we don't recommend it.
Even Domas agrees his article isn't serious: "I do agree, that its execution speeds for table scans are not bad – 8M rows scanned a second (for SELECT COUNT(*) query) from single thread is sure a great achievement."
If that's the highest praise I can get from Domas, I can live with it.
No, that is not the highest praise you can get from me. It definitely is very interesting technology, and what it does well, it does really well - though I didn't get to benchmark parallel performance, as I had work to do :-)
I wouldn't have written that post the way it is if your marketing would be honest and with lots of actual technical merit. There're plenty other companies in the community that have invested much more into decent technical communication.
I agree that in some cases pure durability isn't needed and the best-effort with enforced maximum slippage is good enough, but still, pure durability implementation is extremely naive, and I can show that it hits benchmarks really badly :-)
Your optimizer does not optimize very simple things, and though you can claim that in-memory performance hides that, still, it is nowhere close a drop-in replacement of any kind.
I understand what you've done from technology perspective, and it is an interesting approach (albeit I'm not trying to evaluate applicability).
I just cannot approve the way you did initial publicity - unfair comparisons are unfair comparisons, and if you're using them that way, I can come up with my unfair comparison :)
I thought his main criticism was that MemSQL isn't durable by default, and that when you enable that it becomes incredibly slow. That's a pretty big deal, and IMO running database benchmarks on an unsafe configuration like this is extremely dishonest.
That's a bit heavy-handed; additional benchmarks will be published against a variety of configurations and other databases.
Companies use MemSQL because it uses memory as the primary locus of data. If you have a fast data problem, you couldn't use a disk-based system. At high speeds, you'd want to deploy any database in an active-active mode.
How is it heavy handed? As I understand the article, MemSQL by default will often say that a transaction has been committed even though it only exists in memory, not on disk. Comparing that scenario with a MySQL setup which only says a transaction has been committed once it's on non-volatile storage is just wrong.
I just want to echo mike above with a slightly different tact that perhaps will get across my perception of our feelings.
It's dishonest, at the very least, to claim ACID compliance and then do benchmarks against a known ACID compliant competitor without having the same level of ACIDity enabled in your program.
If you want to prove how fast MemSQL is against MongoDB, that's one thing (I understand it also can be "fully durable" but it is not by default due to speed concerns). To prove how fast MemSQL is against MySQL, you have to prove it on the same playing field, which means it has to, necessarily, be just as ACIDic and respond to "real" queries on large datasets with high concurrency.
tl;dr Do some benchmarks of apples-to-apples ACID, and do something about the common case of a head of a table, then you will be honestly faster (or not) than MySQL
Since 2.0, the journal is enabled. It flushes to disk every 100ms. Operations can return successfully without writing the data to disk. This is similar to what MemSQL is offering by default.
then your marketing is bullshit, claiming durability when you know that's not what you're doing with memsql. talk about wasting people's time with claims of 30x speedup, laughable
> Since tuning a database is by definition a custom process, we wanted to demonstrate what performance you'd get "out of the box."
Yes, and Domas was criticizing that as a strawman; as comparing apples to oranges. Since they tailor their defaults to different criteria (e.g. different standards of durability), of course they will perform differently.
The only really valid benchmarks are when you get an expert in each to tune their respective products to optimize for a specific scenario, and see how outperforms who. And then make public exactly what configuration was done to get there.
How do you feel about the reported performance of "SELECT * FROM table ORDER BY id DESC LIMIT 5;"? Is this something that needs to be improved there, or are you happy with that being one of the built-in anchors?
Additionally, I'm not sure that the implementation of the hard reliable case sounds very mature or even well thought-out. Having a syncer running at 20hz and waiting for it even in the synchronous path seems a bad idea. That's kind of like adding an anchor to your anchor so your speedboat can crawl while it crawls. Or was that a misrepresentation?
And who cares if the tone is serious? The criticisms seem to be valid, and ones that anyone releasing software making bold claims should expect. Statements about performance need to be qualified. Out-of-the-box behavior is not a reasonable point of comparison if (1) that's not what people do in the real world (2) the tests you use are ones that one piece of software is tuned for and the other is tuned against. Some kind of sorted index by field is necessary for some very common and desirable queries. Some of the performance gains come at an unacknowledged cost. The durability just isn't there (and for trivial reasons that could be fixed by making a few more parameters tunable, it sounds like), but is implicitly lauded by claiming to be ACID.
I don't know if you need to go looking for praise in that article. It really seems like there are some criticisms it would be beneficial to take seriously, even if you object to the author's tone or lack of praise. Even-handedness is not a marker of honesty, and honest criticism is worth the headaches it causes and the growth it spurs if you're willing to take it seriously.
When someone writes on their intro "... and now in return I want to waste your time a bit." it's a sign that someone isn't being serious.
Perhaps the vimeo video needs the same disclaimer.
Since tuning a database is by definition a custom process, we wanted to demonstrate what performance you'd get "out of the box."
Since you're picking the tests and setting what the "out of the box" defaults are the whole thing is kind of a waste of time, which is really what Domas is pointing out.
For me, the take away from this article is that MemSQL's ACID compliance is half-assed at best, which is a much more damning criticism than using a flawed benchmarking methodology.
"If you want to throw an anchor behind a speed boat, you can do it, though we don't recommend it."
There's a weird implication here that once you need disk persistence, you might as well give up. Group commit and well tuned, append-focused IO can do wonders if leveraged correctly. For example, we've benchmarked VoltDB using synchronous disk writes (group commit every 2ms) at half a million SQL statements/sec and 100k txns/sec on a single box using a magnetic media and disk controller with a BBU. This same config on EC2/EBS will run dramatically slower, but would still be considered respectable compared to MySQL.
Personally, I think out-of-the-box comparisons are fairly useless, because anyone who doesn't tune their database probably doesn't have enough data to make a difference.
I have no opinions either way on your product, but have you tried benchmarking against an out-of-the-box MySQL installed entirely on a ramdisk, so that it doesn't touch disk at all as well? That would make the comparisons more valid, since both installations wouldn't touch a physical disk.
I think it is reasonably fair to test MySQL's default, recommended configuration which writes to disk frequently, particularly if the other configuration is not worth using.
If it is worth using, then the real question is what you give up by using MemSQL instead of MySQL in the alternate configuration.
If you mean to ding MemSQL on durability then you can ding MySQL on speed. If you aren't going to ding MemSQL on durability (because you are comparing it to less disk-intensive MySQL configs) then presenting a case for MySQL means you have to come up with something that you are giving up other than durability based on frequent disk writes.
1.) The range query issue you pointed out can be explained by a well known limitation of skip lists. Unlike B-Trees, skip lists are unidirectional. By default, our indexes are ascending, so indeed you have to skip to the end to run a MAX() or "ORDER BY id DESC" query. To fix this, just change the primary key to be descending in your schema:
This is explained here http://developers.memsql.com/docs/1b/indexes.html#skip-list-.... If you want both behaviors in your app, you'll have to create two skip list indexes (for now).2.) The transaction-buffer > 0 setting is really where we shine. Synchronous durability is something we have for flexibility, but I'll be the first to admit that it's not really optimized to perform well. The customers that we're working with are okay with this. And it's inspired by what modern companies do. Maybe it's changed in the time since I've left (Domas?) but Facebook certainly does not write comments synchronously to disk.
3.) Our synchronous durability does indeed flush in 50 MS cycles, so you'll see poor performance on single threaded inserts. However, as you pointed out, we're optimized for the case with multiple concurrent writers. Since MemSQL implements group commit on high parallel load throuput picks up. Sure, writing your own very specific benchmark you can show us writing poorly, but we've never worked with a customer that's needed single threaded, synchronous writes to shine. If this is your use case, unless you need substantial read performance, MemSQL is not the database for you.