Does the presence of an HDD or SSD across the three architectures make a difference, or is this purely a CPU problem? I see the testing made sure to fit all tables in memory with a large enough InnoDB buffer pool, so was storage even a factor?
These specific tests were read-only with the working set fitting in memory, so SSD vs HDD doesn’t matter; they were CPU-bound tests meant to highlight the performance improvements, and storage isn’t a factor here. If the working set didn’t fit in memory, faster storage would help make the workload more CPU bound, so Dynimizer would make a bigger impact with SSD than with HDD.
It looks like it's taking measurements from a live process, so probably not -march=native. But most likely the runtime profile-guided optimisation would do a similar job.
Yes, it’s definitely being used in production. We’re starting to collect production use cases and will provide some on our website soon. Here’s an example of a website with growing traffic using MariaDB + Dynimizer with WordPress; they found Dynimizer very helpful: https://www.cgmagonline.com
In terms of inaccurate reads or corrupted writes, that would be a bug if it ever happens; it would not be part of normal operation and would not be expected. That said, all software, including MySQL, gcc, and Linux, is full of bugs, and Dynimizer is of course not immune to that. However, it has been stress tested thoroughly with MySQL, MariaDB, and Percona Server, up to MySQL 5.7 and MariaDB 10.2.
Very, very cool. Section 7 of the manual[0] gives some hints on how this black magic works:
> 7. Workload Requirements
> To obtain benefit from the current version of Dynimizer, all of the following workload conditions must be met:
> A small number of CPU intensive processes - On a given OS host where the workload is running, the workload must be comprised of one or a few CPU intensive processes. Optimizing a large number of processes at once is not recommended.
> Long running programs - The processes being optimized have long lifetimes, and their workloads are long running in order to amortize the warmup time associated with optimization.
> x86-64 - Optimized processes must be 64-bit, derived from x86-64 executables and shared libraries, which must comply with the x86-64 ABI and ELF-64 formats. Most statically compiled applications on Linux meet this requirement.
> Dynamically Linked - Target processes must be dynamically linked to their shared libraries. Statically linked processes are not yet supported. Most Linux programs are dynamically linked.
> No self modifying code - The target application must not be running its own Just-In-Time compiler such as those found in Java virtual machines. This therefore excludes Java Applications.
> Front-end CPU stalls - The workload wastes a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions.
> User mode execution - Much of that wasted time is spent in user mode execution (as opposed to kernel mode), as Dynimizer only optimizes user mode machine code.
> Because of these requirements, Dynimizer takes a whitelist approach when determining if programs are allowed to be optimized, with MySQL and its variants being the currently supported optimization targets on that list for this early beta release. Other programs are not currently supported, and while they can be used with Dynimizer, they should be very thoroughly tested by the user or system administrator before being deployed in a production environment.
> Future versions of Dynimizer may eliminate many of these workload requirements, broadening the variety of applicable scenarios as well as further increasing the performance delivered in previously beneficial cases.
The real important bits: "Front-end CPU stalls - The workload wastes a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions".
My educated guess is that it relocates the hot path of the text segment to better pack into the instruction cache. Cool.
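If that guess is right, it's essentially a runtime version of the hot/cold code layout a compiler can do statically when you annotate the source. A toy sketch of the static equivalent using GCC's hot/cold attributes (my own example, nothing to do with Dynimizer's internals):

    #include <stdio.h>

    /* Rarely taken error path: GCC places "cold" functions in a separate
       .text.unlikely section, away from the hot code. */
    __attribute__((cold, noinline))
    static void report_error(int code) {
        fprintf(stderr, "error %d\n", code);
    }

    /* Hot path: kept small and contiguous so it stays resident in the
       instruction cache while the loop is running. */
    __attribute__((hot))
    static long accumulate(const long *v, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            if (__builtin_expect(v[i] < 0, 0)) {  /* hint: almost never taken */
                report_error(-1);
                continue;
            }
            sum += v[i];
        }
        return sum;
    }

    int main(void) {
        long data[4] = {1, 2, 3, 4};
        printf("%ld\n", accumulate(data, 4));
        return 0;
    }

A runtime optimizer can presumably make the same kind of layout decision from live perf samples instead of source annotations, which is why the front-end stall requirement matters so much.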
I wonder if similar techniques can be applied to PC games. Especially for older ones, considering they use fewer threads and certain CPU features were not available at the time.
Of course these projects were a major source of inspiration for Dynimizer. However they are not JIT compilers. They are more like virtual machines or binary translators. Today DynamoRIO and Mojo (which ended up as Intel PIN) are used for program introspection and analysis, not for application acceleration.
"Dynamic binary translation" is the term of art. Which of course VMWare and VirtualPC were doing 20 years ago in dynamically translating x86 ring-0 code to ring-3 code.
Dynimizer is translating x86-64 to faster x86-64, but the concept is the same.
DynamoRIO was actually talked about for application acceleration. There was at least a PoC that did dynamic function inlining.
Their product seems simpler to install and directed at specific software and workloads. It's really amazing to get 10% more TPS by just running a background process.
Maybe they should present themselves as "fire and forget TPS optimizer"
I would be very interested to hear a sampling of the war stories that came out of building this. I had a friend working on the IBM zPDT JIT at one point, and while I unfortunately can't remember many of the details at the moment, I remember boggling (in that sort of emergently satisfying way) at some of the 'oh shit' moments that came up.
Fair enough, this is valuable feedback. We have provided a more secure method on our home page and will provide package-manager downloads as well; however, the reality is that nothing is 100% secure. In the meantime, you can just download the script and inspect it (it’s pretty simple) and then do it all manually if you prefer.
I agree, it may be cliché, but I think the exact same thing whenever I see this kind of practice, too. Or that the developer has never had to manage a live system with users who know his phone number and his boss's phone number.
"Oh, this is just for a test mock up. Nobody is supposed to actually use this to install it for real."
Well, to experienced people it makes you look moderately stupid, and to inexperienced people it looks like an elegant solution. It's actively hostile to secure system planning.
It reminds me of the NPM left-pad debacle[0] and some of the criticism[1] that came up from that.
I’ve given up waiting for Node.js to become a reliable environment. Just recently the `is-even` package came to light and highlighted that things aren’t getting any better than when left-pad was a thing.
I can’t wait to see tc39’s response to the `is-even` shit show after they decided to just add leftpad to the stdlib.
The “best” part of it all is that apparently JS engines have an internal optimisation for `foo % 2 === 0`, because it’s such a common thing.
This clown was using a bitwise operation in `is-even` “because everyone already knows about `% 2 === 0`”, and thus was hurting performance (on top of whatever extra memory the module uses, the function call overhead, etc.).
Just pasting commands into a terminal is pretty insecure now too. There are proof-of-concepts showing that some control characters and other invisible characters will make it to the clipboard, and even someone pasting into a text editor won't see them.
Between that and delivering different responses to curl|sh vs a browser or regular curl [1] you’d think this kind of bullshittery would be abandoned, but no.
This looks like seriously impressive technology, and yields impressive results, but: I don't think I'd be comfortable with the idea of something rewriting my database in production. What I -would- probably be OK with is having Dynimizer analyze my workload in a staging/load-test environment, and then produce a new binary for me which I could then run through its paces.
Beyond actual errors being produced, I'm wondering what'd happen in weird scenarios, such as one where my primary gets heavily optimized for its write load and creeps up to, say, 80% CPU at peak. What happens then if my replica, which has been heavily optimized for its read load, gets promoted in a failure scenario and gets pegged at high CPU?
Final thought here is if this tech really is solid, when is AWS going to start shipping it with my VM?
This does not rewrite your database. It optimizes the live in-memory machine code of the mysqld (MySQL Server) process, and it must run on the same OS host as the process being optimized. So if you are using this on the master and not on the replica, the replica won’t be touched. Hope that makes sense.
I think the point was that profile guided optimization relies on the workload staying relatively fixed. If the workload suddenly changes (like promoting a read-only slave to be the writable master), the assumptions made by PGO may not be valid, and performance could be worse than if no modifications were made in the first place.
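For anyone who hasn't used static PGO, the classic gcc loop looks roughly like the sketch below (a generic toy example, not tied to MySQL). The point is that the profile is baked in at build time, so if production traffic later looks nothing like the training run, the layout and branch hints can be stale:

    /* Classic gcc PGO workflow (generic sketch):
     *
     *   gcc -O2 -fprofile-generate hot_loop.c -o hot_loop   (instrumented build)
     *   ./hot_loop < training_workload.txt                  (run a representative workload)
     *   gcc -O2 -fprofile-use hot_loop.c -o hot_loop        (rebuild using the recorded profile)
     *
     * The second build lays out code and predicts branches according to
     * whatever the training run did; a very different production workload
     * gets a stale layout.
     */
    #include <stdio.h>

    int main(void) {
        long sum = 0;
        int x;
        /* Branchy loop whose bias the profile run records. */
        while (scanf("%d", &x) == 1) {
            if (x > 0)
                sum += x;
            else
                sum -= x;
        }
        printf("%ld\n", sum);
        return 0;
    }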
I think you'd just have to measure scenarios before using it in production.
grogers explained my intent well, especially around the promotion of a replica to a primary workload (thank you). I think we're arguing semantics between optimizing machine code and rewriting mysqld. I want to emphasize that Dynimizer looks super awesome, and I do intend to try it, so no offense was intended. Well done!
Thanks! Sorry for terse tone... just responding quickly from a smartphone while travelling :-) Your concerns are definitely valid and we are working on improving the situation.
Dynimizer can detect a drastic workload change as you described and reoptimize in response. That’s the default operation which can be turned off. What has been observed is that if we optimized for say a write-heavy workload and then change it to read-only without reoptimizing (or vice versa), it will still show an improvement, just not as much. Hope that makes sense.
For a future update we will optionally cache optimizations to forgo the optimization/warmup period. You can then move those cached optimizations from test to production to reduce uncertainty. Coming soon. There are many advantages there over PGO, which is very difficult to use.
> ...It profiles applications using the Linux perf_events subsystem and interfaces with a target application's machine code through the Linux ptrace system call. When optimizing a program, it loads a code cache into the target program's address space...
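Those are all standard Linux primitives. Just to illustrate the attach-and-inspect half of that sentence, here's a stripped-down ptrace sketch (my own illustration, not Dynimizer's code; the pid and address are placeholders you'd get from perf sampling):

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <sys/types.h>

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid> <hex-addr>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);
        unsigned long addr = strtoul(argv[2], NULL, 16);

        /* Attach to the target; it is stopped while we poke around. */
        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
            perror("PTRACE_ATTACH");
            return 1;
        }
        waitpid(pid, NULL, 0);

        /* Read one word of the target's text segment (e.g. the entry of a
           hot function found via perf samples). A real optimizer would map
           a code cache into the target and patch jumps into it; this only reads. */
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        if (word == -1 && errno != 0)
            perror("PTRACE_PEEKTEXT");
        else
            printf("code at %#lx: %#lx\n", addr, (unsigned long)word);

        /* Resume the target. */
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return 0;
    }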
@davidyeager: In the legend of the graph in Dynimizer System Overhead in https://dynimize.com/product, both series are labelled as "Without Dynimizer".
Couple of questions. Since this seems to be a very general technology, why the emphasis on MySQL (and DBs in general)? Marketing? Also, I found Dynimize vs Dynimizer confusing - is that company name vs product name?
Yes we will correct the legend in that graph, thanks for reporting it.
Dynimize is the company, Dynimizer the product. We may ditch the name Dynimizer and just go with Dynimize to avoid confusion. Thoughts?
It is a general purpose approach to optimization, and MySQL is just a starting point. It was chosen first because it has a broad user base and is relatively easy to support compared to many other Linux programs: it has a single-process architecture and long process lifetimes, it's statically compiled, and OLTP workloads are known to spend much of their CPU time in front-end stalls, which are effectively targeted with profile-guided compiler optimizations. We've tried it on MongoDB and seen similar benefit, but that's not supported yet. Coming soon. Windows will probably require some driver development for effective sample-based profiling and will happen later on. We will improve the effectiveness of our other optimizations that don't target front-end CPU stalls and better support multiprocess workloads with short process lifetimes, which will allow us to target many other types of programs in the future.
It isn't any different than running your production in a container, a VM, or the cloud, all of which can significantly affect what's actually going on.
Works with VMs or on a VPS. We have done a lot of testing on KVM, Xen, and a bit on VMware. Still need to do a bit of work to properly support containers.
I wonder, do the authors have a reverse engineering background? It seems like a lot of concepts from reverse engineering were applied to the JIT compiler, which I find incredibly cool.
The main infographic shows an impressive improvement in TPS, but what about single query execution time? I often see the CPU maxed out by complex queries against large tables.
Heyo, workloads that fit in RAM are not that interesting. I hope they have identified their market ahead of time, for their sake, because I don't think it's valuable.
That was used to highlight the maximum improvement expected. When the working set doesn’t fit into RAM then you will get some combination of a smaller amount of performance improvement plus a reduction in CPU usage. The faster the storage, the greater the increase in tps that you’ll see. Note that replication is often CPU bound. We will be applying this to non-database workloads as well in the near future.
https://dynimize.com/blog/discussions/dynimizer-mysql-cross-...
It seems to me the only system with SSDs (Ivy Bridge) sees higher transactions per second improvement than the other systems.
https://dynimize.com/performanceSpeedup