[Bug rtl-optimization/48077] New: [Code Improvement] Use multiplication by magic number for integer division by constant

Fri Mar 11 16:41:00 GMT 2011

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48077

           Summary: [Code Improvement] Use multiplication by magic number
                    for integer division by constant
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: wschmidt@gcc.gnu.org
                CC: bergner@gcc.gnu.org, pthaugen@gcc.gnu.org,
                    meissner@gcc.gnu.org, dje.gcc@gmail.com,
                    wschmidt@gcc.gnu.org
            Target: powerpc64-linux, others

We noticed a significant performance difference between gcc and xlc at all
optimization levels for a small application that repeatedly takes a value
modulo an integer constant.  Gcc expands division by a constant into a sequence
of shifts, adds, and subtracts.  Xlc converts division by a constant into
multiplication by a corresponding magic number.  See H.S. Warren's _Hacker's
Delight_ for details.  See http://www.hackersdelight.org/magic.htm for a magic
number calculator.

A small test case shows the difference:

-----------------------------------------
int *values;
int *hash_table;

int demonstrate (int i)
{
  return hash_table [values [i] % 1000];
}
-----------------------------------------

On powerpc64-linux, xlc -c -S -O3 -q32 test.c produces:

demonstrate:
    addis      r4,r0,values@ha
    rlwinm     r3,r3,2,0,29
    addis      r6,r0,hash_table@ha
    addis      r5,r0,4194
    addi       r0,r5,19923
    lwz        r4,values@l(r4)
    lwz        r5,hash_table@l(r6)
    lwzx       r3,r4,r3
    rlwinm     r4,r3,1,31,31
    mulhw      r0,r3,r0
    srawi      r0,r0,6
    add        r6,r0,r4
    rlwinm     r0,r6,10,0,21
    rlwinm     r4,r6,5,0,26
    subf       r0,r4,r0
    rlwinm     r6,r6,3,0,28
    add        r4,r0,r6
    subf       r0,r4,r3
    rlwinm     r3,r0,2,0,29
    lwzx       r3,r5,r3
    bclr       BO_ALWAYS,CR0_LT

Note the multiply by 274877907, built by:

    addis      r5,r0,4194
    addi       r0,r5,19923

If you plug 1000 into the magic number calculator at the website above, that's
the magic number produced.

By contrast, gcc -c -S -O3 -m32 test.c produces:

.L.demonstrate:
    addis 9,2,.LC0@toc@ha
    ld 9,.LC0@toc@l(9)
    sldi 3,3,2
    addis 11,2,.LC1@toc@ha
    ld 9,0(9)
    ld 10,.LC1@toc@l(11)
    lwzx 9,9,3
    extsw 0,9
    sldi 8,0,7
    sldi 11,0,9
    add 11,8,11
    subf 11,0,11
    sldi 11,11,3
    add 11,11,0
    sldi 8,11,2
    add 11,11,8
    sldi 11,11,2
    add 11,11,0
    sldi 11,11,3
    add 11,11,0
    sldi 8,11,3
    subf 11,11,8
    sldi 11,11,4
    add 0,11,0
    sldi 11,0,2
    subf 0,0,11
    srdi 0,0,32
    srawi 11,9,31
    srawi 0,0,6
    subf 0,11,0
    slwi 8,0,7
    slwi 11,0,2
    subf 11,11,8
    add 0,11,0
    ld 11,0(10)
    slwi 0,0,3
    subf 9,0,9
    extsw 9,9
    sldi 9,9,2
    lwax 3,11,9
    blr

Note the long string of dependencies in the gcc code; not only is the code
quite a bit larger, but there is very little ILP.

Consider converting integer division to multiplication by a magic number when
appropriate for the target machine.