It's surprising to me that "where id = 42 or id = 4711" behaves differently than "where id in (42, 4711)".
I would have guessed the database optimizes this away. It seems like an easy to implement and safe optimization. (Look at abstract syntax tree, find "or" expressions, find equality test expressions under there, check if two or more compare the same sub-expression against constants, then combine.)
Did I misunderstand what the article is saying PostgreSQL does, or does it actually not do this optimization for you?
> I would have guessed the database optimizes this away
And one thing about this type of "beneath the covers" optimization is that, over time, the tool ends up doing it for you and applying these sorts of optimizations can end up hurting more than helping - so by all means learn it, but keep an eye on it as versions progress and make sure you're not fighting the optimizer.
Kind of an issue for optimizers in any situation. You're relying on implicit behavior which may change.
Unless you decide to document it and guarantee the optimization will be applied if certain conditions are met. Though that has its own downsides because now the system is locked into providing that guarantee. (Real-world example of a guarantee: in C, the "&&" and "||" operators are guaranteed to be lazy. Writing "0 && (1/0)" should be safe; you won't divide by zero because the spec guarantees it's optimized away.)
Or I guess you could try other approaches like having the database detect that it could be optimized, but then tell you to rewrite your own code instead of rewriting it for you.
I think the simpler takeaway is that OR consistently represents a branch in the query planning, where the planner now always has to do two things and merge back instead of one. It makes sense for the OR operations to always preserve this inefficiency instead of leaving the user guessing when it will or won’t.
I have long since resigned myself to the fact that "declarative language" in practice only means that you don't have to manually write the for-loops anymore. The concept of declarative languages seems like it has so much potential, but I haven't seen any that have even come close to it.
I would have guessed the database optimizes this away. It seems like an easy to implement and safe optimization. (Look at abstract syntax tree, find "or" expressions, find equality test expressions under there, check if two or more compare the same sub-expression against constants, then combine.)
Did I misunderstand what the article is saying PostgreSQL does, or does it actually not do this optimization for you?