Hacker News new | past | comments | ask | show | jobs | submit login
Libdivide, optimized integer division (by ridiculous_fish) (libdivide.com)
85 points by yan on Dec 30, 2010 | hide | past | favorite | 6 comments



There's a scheme described in this paper: http://gmplib.org/~tege/division-paper.pdf which will work in practice for 2^b bit divisors (e.g. 2^b = 32 or 64). It's trivial to implement and amazingly doing a precomputation plus division actually beats the chip. Normally these methods only beat the chip when doing the division part after some precomputation, not when you include the time for the precomputation too! This trick is used internally in the GMP library and its fork MPIR (and various other places).


Most compilers already do this. For the theory behind it, see Hank Warren's *Hacker's Delight", a book that ought to be in everyone's library.


Third sentence on the page:

Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime.


"Hacker's Delight" happens to be in O'Reilly's Safari, I'm happy to notice:

http://my.safaribooksonline.com/book/networking/security/020...

Thanks for the nudge to check; you've put another item on my reading list.


I wonder if there are a lot of edge cases (large numbers that aren't easily decomposed into shifts/additions) where the performance decreases a lot. If these optimizations are cheap, why aren't they used everywhere?


No, actually the performance is good for almost all divisors. I had the same thought you did, so I got the SVN and ran the benchmark. If we are try trust their timing, on a Core i7 the fast divisions were almost all around 4x the speed of the native. There were surprisingly small differences between the 2^n numbers and the harder ones --- the magic of long pipelines, I guess.

These optimizations aren't used much because they are specific to the divisor used. For a single division, it would be faster just to let the silicon do the work. They only help when you are repeatedly doing division by the same denominator, which is rarely a bottleneck. But when it is (repeated integer divisions but not hardcoded), this looks like a fine solution!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: