Hacker News new | past | comments | ask | show | jobs | submit login

Does a multiply/shift pair take a different amount of time based on the operands? Otherwise, I don't see how this would make a difference.



The compiler has to assume that your operands might be large enough that they would overflow after a straight multiply.


As acqq/I have mentioned below, there are ways of telling the compiler that the overflow can't multiply so it can freely optimize these away.


My code was two instructions: imul eax,ecx,67h sar eax,0Ah. The compiler generated code was mov eax,66666667h imul ecx sar edx,2 mov eax,edx shr eax,1Fh add eax,edx.


When input unsigned but before if < 99, output unsigned int:

clang 7

        mov     ecx, edi
        mov     eax, 3435973837
        imul    rax, rcx
        shr     rax, 35
gcc 8.2

        mov     eax, edi
        mov     edx, -858993459
        mul     edx
        mov     eax, edx
        shr     eax, 3
msvc 19

        mov     eax, -858993459  ; cccccccdH
        mul     edx
        shr     edx, 3
        mov     eax, edx
icc 19

        mov       eax, 1717986919                               
        imul      edx                                           
        sar       edx, 2                          
        mov       eax, edx
The same, but when input unsigned char:

clang 7

        movzx   eax, dil
        imul    eax, eax, 205
        shr     eax, 11
gcc 8.2

        movzx   edi, dil
        lea     eax, [rdi+rdi*4]
        lea     eax, [rdi+rax*8]
        lea     eax, [rax+rax*4]
        shr     ax, 11
        movzx   eax, al
msvc 19

        movzx   ecx, al
        mov     eax, -858993459  ; cccccccdH
        mul     ecx
        shr     edx, 3
        mov     eax, edx
icc 19

        mov       eax, 1717986919
        imul      edi            
        sar       edx, 2        
        mov       eax, edx

clang and gcc with -O3, msvc -O2


This was exactly what I was about to mention; I can reproduce your results in Clang but it seems GCC doesn't want to optimize this? Here's what Godbolt gives me: https://godbolt.org/z/aFTdur


Just write a function “mydiv10” which returns if x < 99 { x /10 and 0 otherwise and try again.


Like this:

https://godbolt.org/z/bS2yzB

But also note that when the input is unsigned short we get:

https://godbolt.org/z/sbDbMU

        movzx   eax, di
        imul    eax, eax, 52429
        shr     eax, 19


It's conceivable that multiplication time might be proportional to the number of 1 bits in the multiplier.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: