I'd read somewhere [1] that the built-in Python sort function has a lot of good / clever optimizations too, though maybe not of the same kinds that you describe, i.e. may not be at machine language level. Tim Peters did a lot of that, per what I read, though others may have too.
[1] Think I read it in the Python Cookbook, 2nd Edition, which is a very good book, BTW.
Different kinds of optimizations. Timsort tries to collect runs and falls back on insertion sort for them; it exploits the fact that much real-world data is already partially sorted to reduce the number of comparisons made. This optimization exploits the fact that MapReduce keys are always strings in contiguous areas of memory, and are often fairly large, to compare them really quickly.
[1] Think I read it in the Python Cookbook, 2nd Edition, which is a very good book, BTW.