Bug 48812 - optimizing integer power of 2
Summary: optimizing integer power of 2
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.4.5
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2011-04-28 20:24 UTC by Matthieu CASTET
Modified: 2021-07-26 18:39 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-07-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthieu CASTET 2011-04-28 20:24:44 UTC
gcc correctly optimize
int divu(uint a, uint b)
{
        return a / (1<<b);
}
to
divu:
        mov     r0, r0, lsr r1
        mov     pc, lr

but it fails to optimize
int divu3(uint a, uint b)
{
        return a / ((1U<<b) / 4);
}

gcc generate
(arm-linux-gnueabi-gcc -Os  p.c -march=armv4 -mno-thumb-interwork -S)
divu3:
        stmfd   sp!, {r3, lr}
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        bl      {{{__}}}aeabi_uidiv
        ldmfd   sp!, {r3, pc}
or
(gcc p.c -S -O3 -fomit-frame-pointer -mregparm=3)
divu3:
        pushl   %ebx
        movl    %edx, %ecx
        movl    $1, %ebx
        xorl    %edx, %edx
        sall    %cl, %ebx
        shrl    $2, %ebx
        divl    %ebx
        popl    %ebx
        ret

but ((1U<<b) / 4) is 0 or a power of 2. Div by 0 is undefined in C ( C99 6.5.5p5)

So why can we generate : 
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        mov     r0, r0, lsr r1

?

Note that gcc correctly optimize
int divu5(uint a, uint b)
{
        return a / ((1U<<b) * 4);
}
Comment 1 Richard Biener 2011-04-29 09:43:16 UTC
We do not exploit the fact that shifts bigger than the width of the type
are undefined (in fact we even try to preserve the fact that some CPUs
truncate the shift count when constant folding ...).

We also have to make sure the shift count does not get negative, which
we can't in this case.  Thus (1U<<(b-2)) is not equivalent to
(1U<<b) / 4.
Comment 2 Matthieu CASTET 2011-04-29 19:18:43 UTC
> We also have to make sure the shift count does not get negative, which
we can't in this case.  Thus (1U<<(b-2)) is not equivalent to
(1U<<b) / 4.

yes, but 
a / c is undefined for c = 0
if c = (1U<<b) / 4 is not 0, then (1U<<b) / 4 is equivalent to (1U<<(b-2))
Then a / ((1U<<b) / 4) is equivalent to a / (1U<<(b-2))
Then a / ((1U<<b) / 4) is equivalent to (a >> (b-2))

But I agree it is not trivial optimisation
Comment 3 Andrew Pinski 2011-07-23 22:53:44 UTC
Confirmed.