Hacker News new | past | comments | ask | show | jobs | submit login
Making full table scan 10x faster in InnoDB (yoshinorimatsunobu.blogspot.com)
98 points by aritraghosh007 on Oct 10, 2013 | hide | past | favorite | 20 comments



Yoshinori is the engineer who developed MHA, MySQL replication auto failover. It's essentially a perl script that ensures a master is always available and functional. This is so DBAs don't have to spend hours manually fixing a broken master/slave replication.


> When doing full table scan, InnoDB scans pages and rows by primary key order.

What the fuck?


Doesn't that make sense if data is clustered by the primary key, as InnoDB does?


Derp, yes. It's not an auxiliary index. However, it still seems weird for a full table scan to be implemented this way instead of whatever's convenient given the page layout. Why does it guarantee that the results will be in PK order?


It does not guarantee it. A select without a order by clause is not guaranteed to be in a repeatable or consistent order, neither in SQL nor MySQL.

MySQL does have an implicit ordering on GROUP BY queries, but it is deprecated and should not be relied on.


It always amazes me how many people don't know that there is no guaranteed repeatable order in a select statement without an order by clause.


The problem is compounded by the fact that with the small data sets people tend to use to test during development, things are usually returned in insertion order.


Oh, it can often be worse. People sometimes try to do tricky things like running totals by relying on implicit order in SQL queries. Often they are trying to squeeze as much performance out of the database server as possible (usually SQL Server) by avoiding cursors.

My favourite ever SQL Server hack was one done by a guy called Jeff Moden, who came up with a running total update that relied on a specific quirk of SQL Server clustered indexes. The amount of effort he put into it is quite remarkable! [1]

1. (Create a temp account to check it out, they're not spammers) http://www.sqlservercentral.com/articles/T-SQL/68467/


Many MySQL applications expect results to be delivered in primary key order if no "order by" clause is supplied


That seems dumb. If you require a particular ordering, you should have to specify it and it might cost extra to put the results in that order.


Then those MySQL apps are broken.


One issue might be that InnoDB by default uses a single idb file where all tables are stored. Further, this file also never shrinks even if you delete rows or drop indecies. So basically if page n contains your data page n + 1 might be useless to you, so InnoDB looks at pages by pk instead to only read pages it needs. (All this is my wild speculation.)


InnoDB uses clustered index. PK order is how rows are ordered "naturally".


I'd not be relying on that!


I don't know the ins and outs of InnoDB, but some databases match the physical order to the primary key order. Therefore this would probably make sense in that case.


This is because InnoDB actually stores the data via the primary key index. I'm fairly certain this is the only way they could do it (sanely).

As a side note, because of this, secondary indices just have a pointer to a PK. So when looking up by a secondary index, it has to traverse two trees.


Unless, of course, the secondary index has all of the columns you need to execute your query. (Sometimes it can help to add an extra column or two to your secondary index if you want to avoid the extra lookup.)

sqlite3 stores rows in an integer primary key index. If you do not specify an "integer primary key" column, it synthesizes one behind the scenes (essentially a rowid) and your primary key lookups end up going through two indexes.


the shell command "ls" also does that. If you have huge quantity of files under a directory, use "ls -f"


Only because it was the first thing that popped into my head, you can use async file IO in Java 7+ with the AsynchronousFileChannel class[1]

Currently digging through Google to see if I can find how those async calls are implemented on Linux/Windows to see if you can expect similar speedups in performance as Yoshinori saw in this article.

[1] http://openjdk.java.net/projects/nio/javadoc/java/nio/channe...


I would guess that InnoDB is trying to preserve some behavior of MySQL where full table scans are expected to proceed by primary key order.

The solution sounds fragile; I guess you can't query by extent?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: