Making full table scan 10x faster in InnoDB

jjoe · on Oct 11, 2013

Yoshinori is the engineer who developed MHA, MySQL replication auto failover. It's essentially a perl script that ensures a master is always available and functional. This is so DBAs don't have to spend hours manually fixing a broken master/slave replication.

jey · on Oct 11, 2013

> When doing full table scan, InnoDB scans pages and rows by primary key order.

What the fuck?

chrisbolt · on Oct 11, 2013

Doesn't that make sense if data is clustered by the primary key, as InnoDB does?

jey · on Oct 11, 2013

Derp, yes. It's not an auxiliary index. However, it still seems weird for a full table scan to be implemented this way instead of whatever's convenient given the page layout. Why does it guarantee that the results will be in PK order?

tveita · on Oct 11, 2013

It does not guarantee it. A select without a order by clause is not guaranteed to be in a repeatable or consistent order, neither in SQL nor MySQL.

MySQL does have an implicit ordering on GROUP BY queries, but it is deprecated and should not be relied on.

chris_wot · on Oct 11, 2013

It always amazes me how many people don't know that there is no guaranteed repeatable order in a select statement without an order by clause.

ajanuary · on Oct 11, 2013

The problem is compounded by the fact that with the small data sets people tend to use to test during development, things are usually returned in insertion order.

chris_wot · on Oct 11, 2013

Oh, it can often be worse. People sometimes try to do tricky things like running totals by relying on implicit order in SQL queries. Often they are trying to squeeze as much performance out of the database server as possible (usually SQL Server) by avoiding cursors.

My favourite ever SQL Server hack was one done by a guy called Jeff Moden, who came up with a running total update that relied on a specific quirk of SQL Server clustered indexes. The amount of effort he put into it is quite remarkable! [1]

1. (Create a temp account to check it out, they're not spammers) http://www.sqlservercentral.com/articles/T-SQL/68467/

bickfordb · on Oct 11, 2013

Many MySQL applications expect results to be delivered in primary key order if no "order by" clause is supplied

jey · on Oct 11, 2013

That seems dumb. If you require a particular ordering, you should have to specify it and it might cost extra to put the results in that order.

chris_wot · on Oct 11, 2013

Then those MySQL apps are broken.

IgorPartola · on Oct 11, 2013

One issue might be that InnoDB by default uses a single idb file where all tables are stored. Further, this file also never shrinks even if you delete rows or drop indecies. So basically if page n contains your data page n + 1 might be useless to you, so InnoDB looks at pages by pk instead to only read pages it needs. (All this is my wild speculation.)

continuations · on Oct 11, 2013

InnoDB uses clustered index. PK order is how rows are ordered "naturally".

chris_wot · on Oct 11, 2013

I'd not be relying on that!

adwf · on Oct 11, 2013

I don't know the ins and outs of InnoDB, but some databases match the physical order to the primary key order. Therefore this would probably make sense in that case.

mikkelewis · on Oct 11, 2013

This is because InnoDB actually stores the data via the primary key index. I'm fairly certain this is the only way they could do it (sanely).

As a side note, because of this, secondary indices just have a pointer to a PK. So when looking up by a secondary index, it has to traverse two trees.

dunham · on Oct 11, 2013

Unless, of course, the secondary index has all of the columns you need to execute your query. (Sometimes it can help to add an extra column or two to your secondary index if you want to avoid the extra lookup.)

sqlite3 stores rows in an integer primary key index. If you do not specify an "integer primary key" column, it synthesizes one behind the scenes (essentially a rowid) and your primary key lookups end up going through two indexes.

est · on Oct 11, 2013

the shell command "ls" also does that. If you have huge quantity of files under a directory, use "ls -f"

rkalla · on Oct 10, 2013

Only because it was the first thing that popped into my head, you can use async file IO in Java 7+ with the AsynchronousFileChannel class[1]

Currently digging through Google to see if I can find how those async calls are implemented on Linux/Windows to see if you can expect similar speedups in performance as Yoshinori saw in this article.

[1] http://openjdk.java.net/projects/nio/javadoc/java/nio/channe...

b0b0b0b · on Oct 11, 2013

I would guess that InnoDB is trying to preserve some behavior of MySQL where full table scans are expected to proceed by primary key order.

The solution sounds fragile; I guess you can't query by extent?