TokuDB is a very modern engine that uses a data structure called "Fractal Trees". Imagine something like a B-tree with buffers in the non-leaf nodes. It also heavily compresses the row data.
The main point of TokuDB is that, thanks to the buffers, it doesn't have to write everything to disk right away, and with the very efficient compression, more data fits in memory and disk space is saved for big-data applications. (Please someone correct me if I'm wrong.)
In our tests TokuDB was vital for our startup. With it, we can use cheap dedicated servers and our performance is amazing. We tried PostgreSQL, MongoDB 3.0 and MySQL 5.7, and they either couldn't fit our data on a 2TB disk or were too slow in our tests.
The GP is slightly incorrect. The buffers are on disk; the benefit comes because the buffers are laid out in transaction order, so they can be written sequentially under any workload, whereas a standard B-tree is laid out in sorted order, so it can only be written sequentially under sequential workloads. The benefit is similar to that of log-structured databases.
The log is always written; what's not written immediately is the changes to the tree (changes go into the buffers first and then trickle down as the buffers fill up).
This is the critical insight. TokuDB uses a write-ahead log which is synced according to the configuration, and can be made as immediate as full fsync on commit. This provides the strongest durability available on a single machine.
Where TokuDB gets its speed boost is by delaying the random reads associated with updating the indexing structure (the Fractal Tree). The buffers are written to disk on checkpoint, but because they're buffers, the potentially random writes are localized to a smaller number of nodes high in the tree, which minimizes the number of disk seeks required. Since sequential I/O is cheaper than random, the sequential writes to the write-ahead log are very fast, so even in very strict durability configurations, TokuDB can easily outperform databases which use random writes to update the indexing structures, such as the B-trees used by InnoDB and most other RDBMSes.
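To make that write path concrete, here's a toy Python sketch of the two halves described above (my own sketch, not TokuDB code): a sequential, fsynced append to the write-ahead log on every insert, and a cheap message drop into a root buffer that is only flushed toward the leaves in batches. Fan-out, pivot-based key routing, checkpoints, and recovery are all elided.

    import os

    BUFFER_CAPACITY = 4  # tiny, so flushes are easy to observe

    class ToyFractalTree:
        """Two levels only: a root message buffer and two leaves."""

        def __init__(self, wal_path):
            self.root_buffer = []       # pending ("insert", key, value) messages
            self.leaves = [{}, {}]      # materialized records
            self.wal = open(wal_path, "ab")

        def insert(self, key, value):
            # Durability half: sequential append to the write-ahead log,
            # fsynced on commit. This is the only I/O on the critical path.
            self.wal.write(f"insert {key} {value}\n".encode())
            self.wal.flush()
            os.fsync(self.wal.fileno())
            # Indexing half: just drop a message into the root buffer;
            # no leaf is touched yet.
            self.root_buffer.append(("insert", key, value))
            if len(self.root_buffer) >= BUFFER_CAPACITY:
                self._flush_root()

        def _flush_root(self):
            # One flush applies many buffered messages, so the random
            # writes to the leaves are amortized across many inserts.
            # (A real tree routes by pivot key ranges, not by hash.)
            for _op, key, value in self.root_buffer:
                self.leaves[hash(key) % 2][key] = value
            self.root_buffer.clear()

    tree = ToyFractalTree("toy.wal")
    for i in range(10):
        tree.insert(f"k{i}", f"v{i}")
    print(tree.leaves)   # 8 records flushed to leaves, 2 still buffered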
I don't have a good understanding of TokuDB, but I'm not sure the buffers are necessarily in memory. The problem the buffers solve is the cost of rebuilding the index when you insert or delete data -- buffers allow you to delay the rebuild so as to batch many updates together. Even if the buffers were in memory, you'd still have the old (inconsistent) index on disk (presumably the index is marked inconsistent until the writes in the buffers are applied). You just need to rebuild it to match the new data.
Hasn't delaying index updates until more operations accumulate been standard for all DBs for 20 or more years? Without it, write operations would be painfully slow.
It's fine for people who use RDBMSs for data processing, like loading daily log data into tables and running one or more massive queries to compute averages and so on.
It's fine for regular loads, too, so long as they do regular saves to disk and have a high-availability setup. Plenty of setups out there use in-memory, checkpointed DBs and RAM drives on reliable hardware without serious problems.
Tokutek and some of their staff were common visitors to the Boston MySQL meetup. If I remember correctly, they were using "fractal trees" instead of B-trees, trading CPU cycles for disk I/O. It was interesting tech.
Quote:"The idea behind FT(fractal Trees) indexes is to maintain a B tree in which each internal node of the tree contains a buffer. When a data record is inserted into the tree, instead of traversing the entire tree the way a B tree would, we simply insert the Eventually the root buffer will fill up with new data records. At that point the FT index copies the inserted records down a level of the tree. Eventually the newly inserted records will reach the leaves, at which point they are simply stored in a leaf node as a B tree would store them. The data records descending through the buffers of the tree can be thought of as messages that say “insert this record”. FT indexes can use other kinds of messages, such as messages that delete a record, or messages that update a record."
Are there any comparisons between fractal trees and the database cracking/adaptive indices advocated by [1]? It seems like they're going after the same use pattern but cracking is relatively simpler and even faster for inserts.
Cracking is kind of similar in that it delays the "sorting work" done in the indexing structure. However, cracking is fairly heuristic and therefore hard to analyze without an intimate understanding of the workload. Fractal Trees do pretty much the same thing under all workloads (modulo optimizations for things like sequential and Zipfian workloads), so they're easier to analyze and prove bounds for.
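For anyone unfamiliar with cracking, here's a heavily simplified Python toy of the idea (not the actual algorithm from [1]): each range query physically partitions the column around its own bounds, so the sorting work happens incrementally as a side effect of querying:

    def crack(column, low, high):
        # One cracking step: partition the column around the query bounds
        # so values in [low, high) end up in one contiguous slice. A real
        # cracker index also remembers past pivots to avoid rescanning.
        left   = [v for v in column if v < low]
        middle = [v for v in column if low <= v < high]
        right  = [v for v in column if v >= high]
        column[:] = left + middle + right
        return middle   # the query result

    data = [7, 2, 9, 4, 1, 8, 3]
    print(crack(data, 3, 8))   # [7, 4, 3] -- values in [3, 8)
    print(data)                # [2, 1, 7, 4, 3, 9, 8] -- now partitioned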
An interesting new development is http://getkudu.io/ which applies some ideas it has in common with Fractal Trees and other write-optimized data structures (like LSM-trees) to column stores.
At Tokutek, we had some designs for how to implement column store-like data models on Fractal Trees (we called them "matrix stores" because they had some of the benefits of both row stores and column stores), but we didn't get around to implementing them.
My favourite: TokuDB has online DDL, which is a welcome change for the MySQL ecosystem.
I am pretty sure the only other engines that support online DDL are Sybase and DB2.
Online DDL generally means lock-free DDL operations.
Not even Oracle, MSSQL, or PostgreSQL do online DDL. But at least they have transactional DDL, even if it isn't entirely lock-free all the time.
Compared to InnoDB/XtraDB, TokuDB lacks foreign key support; that was worked into InnoDB itself years ago because the MySQL server layer doesn't support any kind of constraints. I am hoping for MariaDB to start implementing more modern SQL features at the server layer, which would make them available in all supported storage engines.
That's not really online: it still rebuilds the entire table, it just does it in the background and shows the results once it's done. This is pretty similar to how TokuDB hot indexing works, but TokuDB's hot column add/rename/delete is truly online: the results are visible immediately and the actual disk work is done in a lazy fashion.
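To sketch what "lazy" means here (a toy model of the idea in Python, not TokuDB's implementation): the column add returns immediately after recording a pending schema change, and existing rows are only padded out to the new schema when they are actually read or rewritten:

    class ToyTable:
        def __init__(self, columns):
            self.columns = list(columns)
            self.rows = []              # rows keep their original width
            self.pending_adds = []      # (column_name, default) not yet applied

        def add_column(self, name, default):
            # Returns immediately: no table rebuild, visible at once.
            self.columns.append(name)
            self.pending_adds.append((name, default))

        def read_row(self, i):
            row = self.rows[i]
            # Lazy upgrade: pad the row to the current schema on read.
            missing = len(self.columns) - len(row)
            if missing:
                row = row + [d for _, d in self.pending_adds[-missing:]]
            return dict(zip(self.columns, row))

    t = ToyTable(["id", "title"])
    t.rows.append([1, "hello"])
    t.add_column("views", 0)    # instant; existing rows untouched on disk
    print(t.read_row(0))        # {'id': 1, 'title': 'hello', 'views': 0}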
For our workload (discussion forum, custom engine) the move to TokuDB was so effective that we ended up completely eliminating a number of front-end caching stages, because rendering the page from scratch every time ended up being very nearly as fast as pulling from the cache.
(The hot path was a single well-optimised query that retrieved all content for the page. The query didn't need to change.)
To be more precise, the cache was still faster than building on the fly, but given the quantity of content we were dealing with, the cache was a resource hog. Were it not for TokuDB, our next move probably would have been to deploy a dedicated cache server with huge amounts of RAM.
As it stands, this content is now being handled by a single (and quite unremarkable) MariaDB slave and assembled on the fly like any other web page.
For anyone dealing with gigabytes of text content, I highly recommend giving it a go. You don't even need to screw with your master; just change the table type on the slave.
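(For example, on MySQL/MariaDB the conversion is a single statement per table, something like ALTER TABLE posts ENGINE=TokuDB; with your own table name in place of the hypothetical "posts".)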
It has very high insert speeds and amazing compression (up to 5x in my tests). It's very powerful if you're inserting terabytes of data every day on commodity hardware.