Database Learning: Toward a Database That Becomes Smarter Every Time (arxiv.org)
100 points by kawera on Sept 9, 2017 | 17 comments



I recently wondered why we still don't have self-optimizing databases that create their own indexes and tweak their settings as needed. Are there such products out there, or is there no demand?


Good databases do self-optimize quite a bit but they select where and how they do that optimization carefully. There are two big constraints on practical dynamic optimization: the cost must be less than the benefit and it must be applied to the system incrementally.

These constraints have effects that are not always obvious. For example, architecting a database kernel so that it can be dynamically optimized slows down computational throughput in the general case! This means that the benefits of dynamic optimization have to be large enough to offset the performance tax of doing dynamic optimization at all. Also, because the performance tax is general, there is an implication that the optimization needs to be general across the set of runtime operations as well, i.e. you can't optimize one operation at the cost of slowing down others. Nonetheless, there are many areas, such as cache replacement algorithms, where dynamic optimization has big performance payoffs for the added complexity.
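To make the trade-off concrete, here is a minimal sketch (mine, in Python; no real engine is structured this way) of an adaptive cache that learns whether recency or frequency is the better eviction signal for the current workload. Note that the learning requires extra bookkeeping on every single access, which is exactly the performance tax described above:

    import collections

    class AdaptiveCache:
        """Toy adaptive cache: blends LRU and LFU eviction and learns,
        from "ghost" entries, which policy to lean toward. Illustrative
        only; no real database works exactly like this."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.data = {}
            self.recency = collections.OrderedDict()  # keys in LRU order
            self.frequency = collections.Counter()
            self.ghosts = {}       # evicted key -> policy that evicted it
                                   # (unbounded here; a real cache caps it)
            self.lru_weight = 0.5  # learned: 1.0 = pure LRU, 0.0 = pure LFU

        def get(self, key):
            if key in self.data:
                # Per-access bookkeeping: this is the "performance tax".
                self.recency.move_to_end(key)
                self.frequency[key] += 1
                return self.data[key]
            if key in self.ghosts:
                # We evicted this key and regretted it: shift weight away
                # from the policy that chose to evict it.
                delta = 0.05 if self.ghosts.pop(key) == "lru" else -0.05
                self.lru_weight = min(1.0, max(0.0, self.lru_weight - delta))
            return None

        def put(self, key, value):
            if key not in self.data and len(self.data) >= self.capacity:
                lru_victim = next(iter(self.recency))  # least recently used
                lfu_victim = min(self.data, key=self.frequency.__getitem__)
                use_lru = self.lru_weight >= 0.5
                victim = lru_victim if use_lru else lfu_victim
                self.ghosts[victim] = "lru" if use_lru else "lfu"
                del self.data[victim]
                del self.recency[victim]
                del self.frequency[victim]
            self.data[key] = value
            self.recency[key] = None
            self.recency.move_to_end(key)
            self.frequency[key] += 1

The per-access work only pays off if the improved hit rate saves more I/O than the bookkeeping costs, and the policy applies uniformly to all accesses rather than privileging one operation.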

Second, dynamic optimization needs to be applicable incrementally, otherwise it tends to have a "stop the world" effect on performance while it is being applied, which is bad. A good example of this is dynamically optimizing storage for the actual distribution of queries it sees. Dynamically adding a new secondary index imposes a huge background cost that is visible in performance, so databases don't do it. However, dynamically optimizing individual pages for queries is common, because it can be applied incrementally: a small one-time cost per page, distributed over time as pages are accessed. And it only applies to pages you actually touch; building a new index pulls a lot of cold storage through the cache. Page-level query optimization may be less effective overall than adding a secondary index, but the fact that it can be applied incrementally makes it a preferred form of dynamic optimization.
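A toy sketch of the pay-as-you-touch idea (again my own Python, with invented names, not modeled on any particular engine): each page records which column queries filter on and re-sorts itself for that column, but only when the page is being read anyway, so there is no stop-the-world step:

    import bisect

    class Page:
        """One storage page holding row dicts. Hypothetical design: the
        page reorganizes itself for the column queries actually filter
        on, paying the cost only when it is being read anyway."""

        def __init__(self, rows):
            self.rows = rows
            self.sorted_by = None    # column this page is organized on
            self.column_votes = {}   # column -> how often queries used it

        def scan(self, column, lo, hi):
            # Record what queries want from this page (tiny, constant cost).
            self.column_votes[column] = self.column_votes.get(column, 0) + 1
            # One-time reorganization, paid only on a page we are touching.
            if self.sorted_by != column and self.column_votes[column] >= 3:
                self.rows.sort(key=lambda r: r[column])
                self.sorted_by = column
            if self.sorted_by == column:
                # A real engine would keep the keys materialized.
                keys = [r[column] for r in self.rows]
                i = bisect.bisect_left(keys, lo)
                j = bisect.bisect_right(keys, hi)
                return self.rows[i:j]
            # Cold path: plain scan, no optimization cost at all.
            return [r for r in self.rows if lo <= r[column] <= hi]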

In short, dynamic optimization has been studied and tried for many decades. Sophisticated databases do a lot of it but the mechanisms are chosen carefully to minimize adverse side effects in real world conditions.


Great answer, thank you! Going further, would this mean one could build a DB where I could send off "optimize now" queries, thereby telling the DB that I accept stop-the-world effects for a limited time? Or even specify which effects I'm willing to accept?

My main issue is that most of the DB work I do as a non-DBA is so basic that it could be automated, but isn't, for reasons that weren't clear to me until your post.


CMU is doing a bunch of research on exactly what you mentioned. [1] is a general SQL database that aims to operate "autonomously" by predicting workloads (although AFAIK its current capabilities are mostly limited to adaptively moving records between row-oriented and column-oriented storage). I think [2] is based on the same research, but is a more concrete tool for tuning MySQL and Postgres configuration.

I think ultimately, it's a much less tractable problem than anyone expected.

[1] http://pelotondb.io/

[2] https://github.com/cmu-db/ottertune


Looking at the requirements doc for [2]'s front-end code, it seems to be MySQL-only. :(


MonetDB[0] is an open-source OLAP engine that does database cracking.

From the Wikipedia page:

"Cracking is a technique that shifts the cost of index maintenance from updates to query processing. The query pipeline optimizers are used to massage the query plans to crack and to propagate this information. The technique allows for improved access times and self-organized behavior."

[0] https://www.monetdb.org/content/column-store-features


Can anyone provide a bit more detail on database cracking? What does "massage the query plans to crack" mean?


Their paper "Self-organizing tuple reconstruction in column-stores" [0] provides more details on this.

[0] https://pdfs.semanticscholar.org/1d5c/bc071f918143dbedf67a51...
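For a flavor of the core idea without reading the paper, here is a toy version (my own simplification in Python; the real cracker index is a tree over positions and the partitioning happens in place on the column store). Each range query partitions exactly the piece of the column it has to scan anyway, so the column converges toward sorted order along the paths the workload actually uses:

    import bisect

    class CrackedColumn:
        """Toy database cracking: each range query physically partitions
        the piece of the column it touches, so the column self-organizes
        for the workload. Heavily simplified from the MonetDB papers."""

        def __init__(self, values):
            self.values = list(values)
            # Cracker index: (value, position) pairs, sorted, meaning
            # everything at values[position:] within its piece is >= value.
            self.index = []

        def _crack(self, v):
            i = bisect.bisect_left(self.index, (v, -1))
            if i < len(self.index) and self.index[i][0] == v:
                return  # already cracked on this value
            lo = self.index[i - 1][1] if i > 0 else 0
            hi = self.index[i][1] if i < len(self.index) else len(self.values)
            piece = self.values[lo:hi]  # the one piece that v falls into
            smaller = [x for x in piece if x < v]
            self.values[lo:hi] = smaller + [x for x in piece if x >= v]
            bisect.insort(self.index, (v, lo + len(smaller)))

        def _pos(self, v):
            i = bisect.bisect_left(self.index, (v, -1))
            return self.index[i][1]

        def range_query(self, low, high):
            """All values in [low, high); the partitioning cost is paid
            here, by the query, not by a background index build."""
            self._crack(low)
            self._crack(high)
            return self.values[self._pos(low):self._pos(high)]

As I read the quoted Wikipedia description, "massaging the query plans" means the optimizer rewrites scan operators into cracking operators like range_query above, so each scan both answers the query and updates the cracker index for later queries.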


I agree, the query optimizer is a good example of AI, but it could be made much better if it learned from real-world experience. Instead of relying on the first plan (created under very tight time constraints), it should observe which queries run most often and which take the longest; then, during slack periods, it could try out different plans and add missing indexes to find faster ones. Such an agent could also create materialized views automatically, and remove unused indexes. Of course there could be parameters, such as how much space to use for indexes or materialized views.
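The bookkeeping for this wouldn't need to be elaborate. A minimal sketch of the observation step (hypothetical Python, all names invented), which tells a slack-time agent where experimenting would pay off most:

    import collections
    import heapq

    class WorkloadObserver:
        """Track how often each query shape runs and how long it takes,
        so a slack-time agent knows where plan experiments, new indexes,
        or materialized views would pay off. Purely illustrative."""

        def __init__(self):
            # fingerprint (normalized query text) -> [runs, total_seconds]
            self.stats = collections.defaultdict(lambda: [0, 0.0])

        def record(self, fingerprint, seconds):
            entry = self.stats[fingerprint]
            entry[0] += 1
            entry[1] += seconds

        def experiment_candidates(self, k=5):
            # Most total time spent = most to gain from a better plan.
            return heapq.nlargest(k, self.stats.items(),
                                  key=lambda kv: kv[1][1])

A slack-time agent would pop candidates off this list, replay them with alternative plans or candidate indexes, and keep whatever wins.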


I actually worked on a query optimizer that did just such a thing: it ran multiple physical plans in a sort of race to see which finished first, and tried to learn valuable statistics from that process.

What it did first was explore multiple logical query plans within a time-, CPU-, and memory-constrained budget. Then it would choose the top 3 logical query plans, as well as one or two random logical query plans (whether they performed well or not). The top 3 plans would be turned into physical execution plans and run in parallel (assuming the underlying hardware could handle it). The first query to finish would be returned and the others cancelled/killed.

Later, when there was slack time on the system, it could take the other logical plans, convert them into physical plans, and test them to see how they performed. That generated more useful information to feed back.
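Not their code, but the racing step might look roughly like this in Python (run_plan and the plan objects are invented stand-ins for the engine's executor and physical plans):

    import concurrent.futures

    def race_plans(plans, run_plan):
        """Run candidate physical plans concurrently, return the first
        result, and cancel the rest. A sketch only: a real engine must
        be able to kill an execution that is already running, which
        futures alone don't give you."""
        with concurrent.futures.ThreadPoolExecutor(
                max_workers=len(plans)) as pool:
            futures = {pool.submit(run_plan, p): p for p in plans}
            done, pending = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
            for f in pending:
                # cancel() only stops not-yet-started work; this toy
                # still waits for running losers when the pool exits.
                f.cancel()
            winner = next(iter(done))
            return futures[winner], winner.result()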

Machine learning is a great fit for database engine query optimizers, even for unexpectedly costly things like data type conversions and the results of expression evaluations (think something like memoization).
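The memoization case is the simplest to picture. A toy example (mine, not from that system): low-cardinality columns repeat the same raw values, so caching a conversion pays the parse cost once per distinct value instead of once per row:

    import datetime
    import functools

    @functools.lru_cache(maxsize=65536)
    def to_date(raw):
        # An expensive per-row expression the engine would otherwise
        # redo for every row; the cache makes repeats nearly free.
        return datetime.datetime.strptime(raw, "%Y-%m-%d").date()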


How did it work out in the end? Are there any limitations in its use?


One observation was the extra load it created (obviously) while doing the learning work. In practice it became a feature you could turn on or off, and many seemed too fearful to use it in production. The unpredictability of performance during the early stages also caused fear. There was talk of running query plan explorations autonomously as soon as the system was turned on and had tables defined with relationships, but that path seemed to deviate too much from the goal of the project at the time, so it was shelved.

That said, optimizers don't always get things right the first time, so there were clear cases I saw in early concept stages where running multiple options made a huge difference. For example, testing two different join orders could result in one query that took under a minute and one that took over 20 minutes. So it definitely made an impact.

We found that using learning in other areas also had a meaningful impact, such as saving time on data type calculations.

Generally, I can say that for the project I worked on there were clear benefits, but unfortunately it didn't progress to the point of learning all the limitations.

I'd say it's still an area worth researching, based on my experience.


Not exactly the same thing, but PostgreSQL has a query optimiser for large join problems based on genetic algorithms:

https://www.postgresql.org/docs/9.6/static/geqo.html
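GEQO searches the space of join orders when a query joins many tables. Very roughly, the shape of such a search looks like this (a toy sketch; the population size, crossover, and cost function are all invented, and PostgreSQL's actual implementation differs in its crossover and selection details):

    import random

    def genetic_join_order(tables, cost, pop=30, generations=50):
        """Toy GEQO-style search: individuals are join orders
        (permutations of table names); `cost` is a stand-in for the
        planner's cost estimate of a given order."""
        population = [random.sample(tables, len(tables)) for _ in range(pop)]
        for _ in range(generations):
            population.sort(key=cost)
            survivors = population[: pop // 2]   # keep the cheapest half
            children = []
            while len(survivors) + len(children) < pop:
                a, b = random.sample(survivors, 2)
                # Crossover: prefix of one parent, remainder in the
                # other parent's order (keeps a valid permutation).
                cut = random.randrange(1, len(tables))
                child = a[:cut] + [t for t in b if t not in a[:cut]]
                if random.random() < 0.1:        # mutation: swap two tables
                    i, j = random.sample(range(len(tables)), 2)
                    child[i], child[j] = child[j], child[i]
                children.append(child)
            population = survivors + children
        return min(population, key=cost)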


Oracle extended its reliance on table statistics a bit in the direction of "learning" and calls it "Optimizer Adaptive Features". I'm not sure about the performance benefits, but developers don't like it when query plans change after a few weeks, even on old tables with up-to-date statistics.

Having to guess why the planner has a wrong cost for some operation is already annoying; trying to guess why ML/AI created a weird query plan would be even worse.


Same with tuning a server for best performance given its actual workload, e.g. tweaking all the tcp_* configs.


Parse did that and it sucked.


How about a database that uses behavioral pattern detection or behavior profiles to make data exfiltration through connected applications harder?
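One simple reading of that idea, sketched in Python (entirely hypothetical, no product implied): profile how many rows each application account normally reads per query, and flag reads that blow far past the account's historical pattern:

    import collections

    class ExfiltrationGuard:
        """Hypothetical behavior profile: learn each account's typical
        per-query result size and flag big deviations."""

        def __init__(self, factor=10.0):
            self.factor = factor
            self.history = collections.defaultdict(list)  # account -> row counts

        def check(self, account, rows_returned):
            seen = self.history[account]
            if len(seen) >= 20:
                typical = sorted(seen)[int(len(seen) * 0.95)]  # ~95th percentile
                if rows_returned > typical * self.factor:
                    return "suspicious"  # e.g. alert, throttle, require review
            seen.append(rows_returned)
            return "ok"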



