[Bug rtl-optimization/48077] New: [Code Improvement] Use multiplication by magic number for integer division by constant
wschmidt at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Mar 11 16:41:00 GMT 2011
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48077
Summary: [Code Improvement] Use multiplication by magic number
for integer division by constant
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: wschmidt@gcc.gnu.org
CC: bergner@gcc.gnu.org, pthaugen@gcc.gnu.org,
meissner@gcc.gnu.org, dje.gcc@gmail.com,
wschmidt@gcc.gnu.org
Target: powerpc64-linux, others
We noticed a significant performance difference between gcc and xlc at all
optimization levels for a small application that repeatedly takes a value
modulo an integer constant. Gcc expands division by a constant into a sequence
of shifts, adds, and subtracts. Xlc converts division by a constant into
multiplication by a corresponding magic number. See H.S. Warren's _Hacker's
Delight_ for details. See http://www.hackersdelight.org/magic.htm for a magic
number calculator.
A small test case shows the difference:
-----------------------------------------
int *values;
int *hash_table;
int demonstrate (int i)
{
return hash_table [values [i] % 1000];
}
-----------------------------------------
On powerpc64-linux, xlc -c -S -O3 -q32 test.c produces:
demonstrate:
addis r4,r0,values@ha
rlwinm r3,r3,2,0,29
addis r6,r0,hash_table@ha
addis r5,r0,4194
addi r0,r5,19923
lwz r4,values@l(r4)
lwz r5,hash_table@l(r6)
lwzx r3,r4,r3
rlwinm r4,r3,1,31,31
mulhw r0,r3,r0
srawi r0,r0,6
add r6,r0,r4
rlwinm r0,r6,10,0,21
rlwinm r4,r6,5,0,26
subf r0,r4,r0
rlwinm r6,r6,3,0,28
add r4,r0,r6
subf r0,r4,r3
rlwinm r3,r0,2,0,29
lwzx r3,r5,r3
bclr BO_ALWAYS,CR0_LT
Note the multiply by 274877907, built by:
addis r5,r0,4194
addi r0,r5,19923
If you plug 1000 into the magic number calculator at the website above, that's
the magic number produced.
By contrast, gcc -c -S -O3 -m32 test.c produces:
.L.demonstrate:
addis 9,2,.LC0@toc@ha
ld 9,.LC0@toc@l(9)
sldi 3,3,2
addis 11,2,.LC1@toc@ha
ld 9,0(9)
ld 10,.LC1@toc@l(11)
lwzx 9,9,3
extsw 0,9
sldi 8,0,7
sldi 11,0,9
add 11,8,11
subf 11,0,11
sldi 11,11,3
add 11,11,0
sldi 8,11,2
add 11,11,8
sldi 11,11,2
add 11,11,0
sldi 11,11,3
add 11,11,0
sldi 8,11,3
subf 11,11,8
sldi 11,11,4
add 0,11,0
sldi 11,0,2
subf 0,0,11
srdi 0,0,32
srawi 11,9,31
srawi 0,0,6
subf 0,11,0
slwi 8,0,7
slwi 11,0,2
subf 11,11,8
add 0,11,0
ld 11,0(10)
slwi 0,0,3
subf 9,0,9
extsw 9,9
sldi 9,9,2
lwax 3,11,9
blr
Note the long string of dependencies in the gcc code; not only is the code
quite a bit larger, but there is very little ILP.
Consider converting integer division to multiplication by a magic number when
appropriate for the target machine.
More information about the Gcc-bugs
mailing list