We've been using it for a large dataset, and it has been fantastic. Compared to MySQL 5.7 and PostgreSQL, it has the advantage of supporting the TokuDB engine out of the box. My data is 3TB uncompressed; with it, we can fit everything in 300GB with all indexes.
Read Free Replication with TokuDB (https://github.com/percona/tokudb-engine/wiki/Read-Free-Repl...) also enables us to use a very cheap VPS as a slave.
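For reference, the slave-side setup looks roughly like this (variable names are from Percona's TokuDB docs, so double-check them; the slave also needs row-based replication and should be read-only):

    -- sketch of Read Free Replication settings on the slave (per the linked wiki)
    SET GLOBAL read_only = ON;
    SET GLOBAL tokudb_rpl_lookup_rows = OFF;    -- skip the read-before-write lookup on the slave
    SET GLOBAL tokudb_rpl_unique_checks = OFF;  -- skip uniqueness checks while applying the binlog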
How can one, by tuning vanilla PostgreSQL (on a per-instance, per-database, or per-table basis), get the same kind of advantages (and trade-offs) that MySQL/MariaDB gets by switching from InnoDB to TokuDB?
TokuDB is built for a single-process model with threads for connections. This is incompatible with PostgreSQL's multi-process model, and patching TokuDB to support such a model would be a large effort. Not impossible, but a lot of work.
That compression ratio is one I'd love to have. PG's built-in data compression is limited to large values in a single row - if you have a hundred billion 100-byte tuples, TOAST isn't going to help.
The closest thing I know of to achieve something comparable with PG would be to use a data directory mounted on something with transparent filesystem-level compression (meaning, practically speaking, ZFS). This gives you the less-than-ideal choice of a not-mainline-linux filesystem (for your database's data directory, which is worth being nervous about) or running an OpenSolaris descendant, which is a big departure for plenty of people who have only ever run production dbs on Linux servers.
On a tangent: I think it's rather sad that many nerds aren't prepared to try a different UNIX when we're talking about the lifeblood of their business: data.
ZFS is the only filesystem it's reasonable to trust critical data to right now (I can't think of any other OSS self-validating Merkle-tree filesystem that has been hammered in production for nearly a decade...), yet somehow minor differences between Unices trump that.
I'm going to make a bold claim I know will stir the nest, but I feel confident in making given all the bad shit most filesystems miss: if you're not running your OSS RDBMS on ZFS right now, and you don't have compelling specialized needs to explain why not, you shouldn't be let near a production DB due to plain negligence and/or incompetence.
It depends on how you slice it. If your expertise in this system is less than in your others, it could cause real problems down the line. If a mistake were made in the setup, maintenance, or recovery of the system because it's different, it would be just as easy to say it was negligent to use it instead of something that was well understood by your team. Trade-offs like this are made all the time; to claim negligence is to ignore the complexities of the situation.
This. I'm more than willing to try different UNIXes, and I have. But when we're talking about running a business-critical database, it's a hard requirement that both myself and my teammates are able to maintain, keep patched, and generally competently administer the system. Trading an OS that my team has decades of combined experience running in production for one where we collectively have zero production experience is a non-starter.
A good DBMS includes a bunch of integrity-preserving measures to deal with data corruption. No reason not to run your DBMS on ZFS as far as I'm aware (edit: at least if the workload does not trigger excessive fragmentation), but I'd reserve the word 'negligence' for, say, not using an appropriate backup solution.
Something funny about this: where are the OSS RDBMSs which do not rely on a general purpose filesystem? Because in the commercial RDBMS space that is a thing.
I've used PostgreSQL with BTRFS + LZO compression on Linux. After I set it up, our large queries (2 minutes or longer runtime) saw a speedup of 5-10x and our TPS went up by around 25%.
BTRFS has been in the stable kernel for over two years, so it might be worth checking out!
Mechanical or flash storage? Compression could give an advantage on mechanical disks due to the reduced time spent seeking, but I'm curious how it performs on SSDs, for which seek time is reduced to a minimum if not zero.
TokuDB is a very modern engine that uses a data structure called "Fractal Trees". Imagine something like a B-tree with buffers in the non-leaf nodes. It also heavily compresses the row data.
The main point of TokuDB is that, thanks to the buffers, it doesn't have to write everything to disk right away, and with the very efficient compression more data fits in memory and less disk space is used for big-data applications. (Please someone correct me if I'm wrong.)
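To give an idea of how the compression is picked in practice, here is a minimal sketch of creating a compressed TokuDB table (table and column names are made up; the ROW_FORMAT values are the ones TokuDB documents, with zlib being the default):

    CREATE TABLE events (
      id         BIGINT NOT NULL AUTO_INCREMENT,
      payload    VARBINARY(255),
      created_at DATETIME NOT NULL,
      PRIMARY KEY (id),
      KEY (created_at)
    ) ENGINE=TokuDB ROW_FORMAT=TOKUDB_LZMA;  -- heaviest compression; TOKUDB_ZLIB is the default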
In our tests TokuDB was vital for our startup. With it, we can use cheap dedicated servers and our performance is amazing. We tried PostgreSQL, MongoDB 3.0 and MySQL 5.7, and they couldn't fit our data on a 2TB disk or were too slow in our tests.
The GP is slightly incorrect. The buffers are on disk; the benefit comes because the buffers are laid out in transaction order, so they can be written sequentially under any workload; whereas a standard B-tree is laid out in sorted order, so can only be written sequentially under sequential workloads. The benefit is similar to that of log-structured databases.
The log is always written; what's not written immediately is the changes to the tree (changes go into the buffers first and then trickle down when the buffers fill up).
This is the critical insight. TokuDB uses a write-ahead log which is synced according to the configuration, and can be made as immediate as full fsync on commit. This provides the strongest durability available on a single machine.
Where TokuDB gets its speed boost is by delaying the random reads associated with updating the indexing structure (the Fractal Tree). The buffers are written to disk on checkpoint, but because they're buffers, the potentially random writes are localized to a smaller number of nodes high in the tree, which minimizes the number of disk seeks required. Since sequential I/O is cheaper than random, the sequential writes to the write-ahead log are very fast, so even in very strict durability configurations, TokuDB can easily outperform databases which use random writes to update the indexing structures, such as the B-trees used by InnoDB and most other RDBMSes.
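Concretely, that durability/throughput trade-off is exposed as a couple of server variables, roughly like this (names as I recall them from Percona's TokuDB docs, so treat this as a sketch):

    SET GLOBAL tokudb_commit_sync = ON;           -- fsync the recovery log on every commit (strictest)
    -- or group commits and sync the log periodically instead:
    -- SET GLOBAL tokudb_commit_sync = OFF;
    -- SET GLOBAL tokudb_fsync_log_period = 1000; -- sync the log roughly every 1000 ms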
I don't have a good understanding of TokuDB, but I'm not sure the buffers are necessarily in memory. The problem the buffers solve is the cost of rebuilding the index when you insert or delete data -- buffers allow you to delay the rebuild so as to batch many updates together. Even if the buffers were in memory, you'd still have the old (inconsistent) index on disk (presumably the index is marked inconsistent until the writes in the buffers are applied). You just need to rebuild it to match the new data.
Hasn't delaying index updates until more operations accumulate been standard for all DBs for 20 or more years? Without it, write operations would be painfully slow.
It's fine for people who use RDBMSs for data processing, like loading daily log data into tables and running one or more massive queries to yield averages, etc. etc.
It's fine for regular loads, too, so long as they do regular saves to disk and have a high availability setup. Plenty of setups out there using in-memory, checkpointed DB's and RAM drives on reliable hardware without serious problems.
Tokutek and some of its staff were common visitors to the Boston MySQL meetup. If I remember correctly, they were using "fractal trees" instead of B-trees, trading CPU cycles for disk I/O. It was interesting tech.
Quote:"The idea behind FT(fractal Trees) indexes is to maintain a B tree in which each internal node of the tree contains a buffer. When a data record is inserted into the tree, instead of traversing the entire tree the way a B tree would, we simply insert the Eventually the root buffer will fill up with new data records. At that point the FT index copies the inserted records down a level of the tree. Eventually the newly inserted records will reach the leaves, at which point they are simply stored in a leaf node as a B tree would store them. The data records descending through the buffers of the tree can be thought of as messages that say “insert this record”. FT indexes can use other kinds of messages, such as messages that delete a record, or messages that update a record."
Are there any comparisons between fractal trees and the database cracking/adaptive indices advocated by [1]? It seems like they're going after the same use pattern but cracking is relatively simpler and even faster for inserts.
Cracking is kind of similar in that it delays the "sorting work" done in the indexing structure. However, cracking is fairly heuristic and therefore hard to analyze without an intimate understanding of the workload. Fractal Trees do pretty much the same thing under all workloads (modulo optimizations for things like sequential and Zipfian workloads), so they're easier to analyze and prove bounds for.
An interesting new development is http://getkudu.io/ which applies some ideas common with Fractal Trees and other write-optimized data structures (like LSM-trees) to column stores.
At Tokutek, we had some designs for how to implement column store-like data models on Fractal Trees (we called them "matrix stores" because they had some of the benefits of both row stores and column stores), but we didn't get around to implementing them.
My favourite: TokuDB has online DDL, which is a welcome change for the MySQL ecosystem.
I am pretty sure the only other engines that support online DDL are Sybase and DB2.
Online DDL generally means lock free DDL operations.
Not even Oracle, MSSQL, or PostgreSQL do online DDL. But at least they have transactional DDL, even if it isn't entirely lock-free all the time.
Compared to InnoDB/XtraDB, TokuDB lacks foreign key support. Foreign keys were worked into InnoDB itself years ago because the MySQL server layer doesn't support any kind of constraints. I am hoping for MariaDB to start implementing more modern SQL features at the server layer, which would then be available in all supported storage engines.
That's not really online, it still rebuilds the entire table, it just does it in the background and shows the results once it's done. This is pretty similar to how TokuDB hot indexing works, but TokuDB's hot column add/rename/delete is truly online, the results are visible immediately and the actual disk work is done in a lazy fashion.
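As an illustration, a hot column add is just a normal ALTER (table and column names here are made up); the statement returns almost immediately and the physical rewrite happens lazily:

    ALTER TABLE big_table ADD COLUMN notes VARCHAR(255) NULL;  -- returns quickly, work is deferred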
For our workload (discussion forum, custom engine) the move to TokuDB was so effective that we ended up completely eliminating a number of front-end caching stages, because rendering the page from scratch every time ended up being very nearly as fast as pulling from the cache.
(The hot path was a single well-optimised query that retrieved all content for the page. The query didn't need to change.)
To be more precise, the cache was still faster than building on the fly, but given the quantity of content we were dealing with, the cache was a resource hog. Were it not for TokuDB our next move probably would have been to deploy a dedicated cache server with huge amounts of RAM.
As it stands, this content is now being handled by a single (and quite unremarkable) MariaDB slave and assembled on the fly like any other web page.
For anyone dealing with gigabytes of text content, I highly recommend giving it a go. You don't even need to screw with your master, just change the table type on the slave.
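Something along these lines on the slave is all it takes (table name is made up; the default zlib compression already goes a long way):

    ALTER TABLE posts ENGINE=TokuDB;  -- rebuilds the table under the TokuDB engine on the slave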
It has very high insert speeds, and amazing compression (up to 500% in my tests). It's very powerful if you're inserting terabytes of data every day on commodity hardware.
This is sorta misleading. A quote from the article, "The changes in the MariaDB source code leading to those impressive performance improvements are part of porting MariaDB to Power8."
Pretty frustrating. I was pretty excited since MariaDB is a drop in replacement for MySQL (more or less). But you have to use it on a totally different architecture which can't be justified without a lot of deliberation.
From my reading of the article, a lot of the performance improvements come from getting rid of the mutex in front of their hash-based metadata cache by using a lock-free hashing algo. These changes originally came from Oracle's MySQL 5.7 work[1]. Looks like MariaDB merged the lock-free hashing stuff into their version and applied it to the table definition hash, which also suffered from lock contention.
These are changes that all architectures can benefit from.
It's also a 2-socket, 20-core, 8-threads-per-core = 160-hardware-thread system. And it's not general queries, but just "point selects" from 20 tables of 1 million rows combined.
That's far from typical hardware to deploy MariaDB/MySQL on, and far from a realistic workload.
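For context, a "point select" here is the simplest possible primary-key lookup, something like this sysbench-style query (table and column names assumed):

    SELECT c FROM sbtest1 WHERE id = 4242;  -- single-row lookup by primary key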
Yeah, I want to drool but utilizing that many threads in "practical" scenarios seems difficult. It would probably be great for messaging systems though!
Not just that, but database performance drops quickly with writes.
1M transactions per second with 100% reads might as well mean 10K transactions per second with 95% reads / 5% writes ("the usual OLTP workload" per common wisdom) and 1K transactions per second at 50% reads / 50% writes. Or even less. Or much more. It's difficult to even guess.
"The data set is 1 million rows in 20 tables. Fewer tables can be used, but below 4 tables the performance drops somewhat due to a hot spot in the table definition cache."
Since version 10.1, MariaDB has merged the "classic" server with the Galera server. Galera promises easy-to-do scaling (see [1], [2] for details), which is nice to have.
Please read the linked info and decide for yourself if this feature sounds tempting enough to you.
Isn't scaling usually done by sharding? Whereas in this case, Galera is just replication (master-master sync), and you can only scale read access.
The TokuDB implementation of Fractal Trees uses a single-process, multi-threaded model, which is incompatible with PostgreSQL's multi-process model. In theory, one could patch TokuDB to be suitable, but this is a ton of work.
Definitely no foreign keys according to another post in this thread. Which is not something I want to go back to, however much I'd like this level of compression.