48812 – optimizing integer power of 2

Bug 48812 - optimizing integer power of 2

Summary: optimizing integer power of 2

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.4.5

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2011-04-28 20:24 UTC by Matthieu CASTET
Modified:	2021-07-26 18:39 UTC (History)
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-07-26 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Matthieu CASTET 2011-04-28 20:24:44 UTC

gcc correctly optimize
int divu(uint a, uint b)
{
        return a / (1<<b);
}
to
divu:
        mov     r0, r0, lsr r1
        mov     pc, lr

but it fails to optimize
int divu3(uint a, uint b)
{
        return a / ((1U<<b) / 4);
}

gcc generate
(arm-linux-gnueabi-gcc -Os  p.c -march=armv4 -mno-thumb-interwork -S)
divu3:
        stmfd   sp!, {r3, lr}
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        bl      {{{__}}}aeabi_uidiv
        ldmfd   sp!, {r3, pc}
or
(gcc p.c -S -O3 -fomit-frame-pointer -mregparm=3)
divu3:
        pushl   %ebx
        movl    %edx, %ecx
        movl    $1, %ebx
        xorl    %edx, %edx
        sall    %cl, %ebx
        shrl    $2, %ebx
        divl    %ebx
        popl    %ebx
        ret

but ((1U<<b) / 4) is 0 or a power of 2. Div by 0 is undefined in C ( C99 6.5.5p5)

So why can we generate : 
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        mov     r0, r0, lsr r1

?

Note that gcc correctly optimize
int divu5(uint a, uint b)
{
        return a / ((1U<<b) * 4);
}

Comment 1 Richard Biener 2011-04-29 09:43:16 UTC

We do not exploit the fact that shifts bigger than the width of the type
are undefined (in fact we even try to preserve the fact that some CPUs
truncate the shift count when constant folding ...).

We also have to make sure the shift count does not get negative, which
we can't in this case.  Thus (1U<<(b-2)) is not equivalent to
(1U<<b) / 4.

Comment 2 Matthieu CASTET 2011-04-29 19:18:43 UTC

> We also have to make sure the shift count does not get negative, which
we can't in this case.  Thus (1U<<(b-2)) is not equivalent to
(1U<<b) / 4.

yes, but 
a / c is undefined for c = 0
if c = (1U<<b) / 4 is not 0, then (1U<<b) / 4 is equivalent to (1U<<(b-2))
Then a / ((1U<<b) / 4) is equivalent to a / (1U<<(b-2))
Then a / ((1U<<b) / 4) is equivalent to (a >> (b-2))

But I agree it is not trivial optimisation

Comment 3 Andrew Pinski 2011-07-23 22:53:44 UTC

Confirmed.