SELECT c,d FROM t WHERE b=X ORDER BY a DESC LIMIT 1
on MySQL and was surprised to find that MySQL was using index over a and in his opinion fully ignored the existence of the index over (b,c) to discover where b is equal to X? I admit I didn't understand his explanation why, especially not why it would maybe do the right thing with the LIMIT 2. I can't imagine that any serious SQL engine would ignore the possibility to use the index in such a case.
Can you please explain: Does MySQL use index over (b,c) or not where evaluating "where b=X" or not? As I've said, I'd expect any engine which has an index to use it.
Can you elaborate on that with an example? I can't imagine when running MAX() or MIN() could be better than a static digit, so an example would help me understand your comment!
SELECT c,d FROM t WHERE a=(SELECT MAX(a) WHERE b=X from t)
But that post got flagged because he also gave his opinion on the level of understanding the article writer has. The query you posted is actually the one by the article writer.
Right. MySQL's optimizer knows how many records each table has, but not much else. It doesn't do statistical tests on the tables. Depending on table contents, "WHERE b=X" can select any number of records, from zero to the whole database. Whether the optimal strategy is to do the "WHERE b=X" first, possibly getting a large number of hits, or simply making a sequential pass, is not decidable from the database size only.
An alternative would be for the optimizer to do an indexed SELECT on b for "WHERE b=X", and if the number of hits exceeded some threshold, abandon that effort as inefficient and start over using a sequential pass. But MySQL doesn't abandon strategies once selected, or keep such statistics as a hint for future queries. As someone pointed out, Oracle's flagship database product does do that. Anyone know about Postgres?
>>Unfortunately, optimizer doesn’t understand, that there’re humans who are building these systems, and humans have their own rationale and thinking. One of the ideas that a human would have is that if you have 100GB table, you better understand things like data locality and other sorts of efficiencies.
I have not used MySql and cannot comment for that but with my experience with Oracle, i believe its histogram generation process is actually trying to solve the same problem.It has a binning strategy for histogram generation that tries to help with data patterns.
It sounds like you have a lot to teach us about query writing, but gratuitously putting someone else down, even when they don't know as much as you do, is not ok here. It violates both important rules of HN comments: civility (it truly is gratuitously negative) and substance, because it impedes the absorption of real information in your comment.
In the future, please edit that stuff out and stick to the substance when commenting here.
So sad I can't see or reply to that comment anymore - I'm the author.
Sorry, HN, my personal blog rarely gets that wide circulation, I should've done better job at introducing the context.
This was just one of many systems that thousands of facebook engineers build and they did not think of some edge cases. We do know how to write queries or to operate our databases, in general.
SELECT c,d FROM t WHERE b=X ORDER BY a DESC LIMIT 1
on MySQL and was surprised to find that MySQL was using index over a and in his opinion fully ignored the existence of the index over (b,c) to discover where b is equal to X? I admit I didn't understand his explanation why, especially not why it would maybe do the right thing with the LIMIT 2. I can't imagine that any serious SQL engine would ignore the possibility to use the index in such a case.