Bug 11707 - [3.4 Regression] [new unroller] constants not propagated in unrolled loop iterations with a conditional
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.4.0
Importance: P1 normal
Target Milestone: 4.0.0
Assignee: Zdenek Dvorak
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization, patch
Depends on:
Blocks: 22366
 
Reported: 2003-07-29 13:34 UTC by Richard Biener
Modified: 2006-02-28 08:43 UTC
CC List: 3 users

See Also:
Host: i686-pc-linux-gnu
Target:
Build:
Known to work: 3.3 4.0.0
Known to fail: 3.4.0 3.4.3
Last reconfirmed: 2004-08-12 07:44:21


Attachments
Patch moving the jump bypassing after loop optimizer (428 bytes, patch)
2003-09-29 12:30 UTC, Zdenek Dvorak

Description Richard Biener 2003-07-29 13:34:36 UTC
The following testcase fails to cprop:

int foo2()
{
        unsigned int n = 5, i;
        int a = 1;
        for (i=0; i<2; ++i) {
                n /= 2;
                if (n)
                        a *= a;
        }
        return a;
}

If the if (n) check, which is always true here, is omitted, cprop works and
the result is computed at compile time.
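
For comparison, here is a minimal sketch of the testcase next to the variant without the check. The name foo2_nocheck is made up for illustration and does not appear in the original report. Both functions return 1, because n goes 5 -> 2 -> 1 and a stays 1; the only question is whether the compiler proves that at compile time.

```c
/* Testcase from the report: n is 2, then 1, so the check always passes. */
int foo2(void)
{
        unsigned int n = 5, i;
        int a = 1;
        for (i = 0; i < 2; ++i) {
                n /= 2;
                if (n)
                        a *= a;
        }
        return a;
}

/* Hypothetical variant with the always-true check removed. */
int foo2_nocheck(void)
{
        unsigned int n = 5, i;
        int a = 1;
        for (i = 0; i < 2; ++i) {
                n /= 2;
                a *= a;
        }
        return a;
}
```

Comparing the assembly of the two functions at -O2 -funroll-loops is one way to see whether the conditional is what blocks the propagation.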

The lack of this optimization hurts the code generated for the libstdc++
pow(T, int) implementation.
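
To show why an integer power routine has the same shape as the testcase, here is a generic square-and-multiply sketch in C. It is not the libstdc++ source; the name pow_int and the double type are assumptions made for illustration. With a constant exponent, each unrolled iteration contains a conditional on a value that is known at compile time, just like the if (n) above.

```c
/* Hypothetical square-and-multiply power; not the actual libstdc++ code. */
double pow_int(double x, unsigned int n)
{
        double y = 1.0;
        while (n) {
                if (n & 1)      /* known at compile time for a constant n */
                        y *= x;
                x *= x;
                n >>= 1;
        }
        return y;
}
```

For example, pow_int(2.0, 5) evaluates to 32.0. Ideally a call with constant arguments would fold down to that constant once the loop is unrolled.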
Comment 1 Andrew Pinski 2003-07-29 13:40:26 UTC
I can confirm this on the mainline (20030729) but with the rtlopt branch, gcc produces great code:
_foo2:
        li r3,1
        blr
(which is just return 1; for you non powerpc asm readers.)
Also with the tree-ssa branch (20030718) with -funroll-loops, gcc produces the same code as 
above.

So when either rtlopt or tree-ssa branch is rolled into gcc, this will be fixed.
Comment 2 Richard Biener 2003-07-29 13:45:01 UTC
Subject: Re:  constants not propagated in unrolled
 loop iterations with a conditional

On 29 Jul 2003, pinskia at physics dot uc dot edu wrote:

> I can confirm this on the mainline (20030729) but with the rtlopt branch, gcc produces great code:
> _foo2:
>         li r3,1
>         blr
> (which is just return 1; for you non powerpc asm readers.)
> Also with the tree-ssa branch (20030718) with -funroll-loops, gcc produces the same code as
> above.
>
> So when either rtlopt or tree-ssa branch is rolled into gcc, this will be fixed.

This probably means it's not going to be fixed for 3.4? Can you assign the
PR to Zdenek Dvorak? Maybe there are a few hunks from rtlopt that fix
this.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

Comment 3 Richard Biener 2003-07-29 14:07:44 UTC
This is actually a regression from 2.95.3 (-O2 -funroll-loops):

gcc2_compiled.:
.text
        .align 16
.globl foo2
        .type    foo2,@function
foo2:
        pushl %ebp
        movl %esp,%ebp
        movl $1,%eax
        movl %ebp,%esp
        popl %ebp
        ret

and a regression from 3.3, which is itself a regression from 2.95.3:

        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
        pushl   %ebp
        movl    $1, %eax        #  a
        movl    %esp, %ebp
        popl    %ebp
        imull   %eax, %eax      #  a,  a
        ret

and of course it is worst for gcc 3.4:

        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
        pushl   %ebp    #
        movl    $5, %eax        #, n
        shrl    %eax    # tmp62 
        testl   %eax, %eax      # n
        movl    %esp, %ebp      #, 
        movl    $1, %edx        #, a
        je      .L11    #,      
        imull   %edx, %edx      # a, a
.L11:
        shrl    %eax    # n     
        testl   %eax, %eax      # n
        je      .L4     #,      
        imull   %edx, %edx      # a, a
.L4:
        popl    %ebp    #
        movl    %edx, %eax      # a, <result>
        ret

With gcc3.4 and -fold-unroll-loops we get back to 2.95.3 code:

        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
        pushl   %ebp    #
        movl    $1, %eax        #, <result>
        movl    %esp, %ebp      #,
        popl    %ebp    #
        ret

So can this be targeted at 3.3 and marked as regression? Thanks.
Comment 4 Zdenek Dvorak 2003-09-29 12:30:43 UTC
Created attachment 4860 [details]
Patch moving the jump bypassing after loop optimizer
Comment 5 Zdenek Dvorak 2003-09-29 12:31:20 UTC
This is just a pass ordering problem. Copy propagation in jump bypassing
is run between the old and the new loop unroller, so it acts on an already
unrolled loop with -fold-unroll-loops, but cannot do anything useful
with -funroll-loops.

On rtlopt-branch I have moved the unroller before jump bypassing; this is why
it does not reproduce there.

On tree-ssa, constant propagation is able to detect that a*=a is a no-op in
this case. Cool.

Anyway, the attached patch fixes the problem.  I believe moving jump bypassing
after unroller should not harm anything.
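
To illustrate what the propagation needs to see, here is a hand-unrolled sketch of the testcase at the C level. It is an illustration only: foo2_unrolled is a made-up name, the real passes work on RTL, and the actual unroller output will look different. Once the loop body is duplicated like this, every condition tests a known constant, which a propagation pass that runs after unrolling can fold. A pass that runs before unrolling only sees the rolled loop, where n is not a constant.

```c
/* Hypothetical hand-unrolled form of foo2 (two iterations peeled out). */
int foo2_unrolled(void)
{
        unsigned int n = 5;
        int a = 1;

        n /= 2;                 /* n == 2 */
        if (n)                  /* constant condition, always taken */
                a *= a;         /* 1 * 1 == 1 */

        n /= 2;                 /* n == 1 */
        if (n)                  /* constant condition, always taken */
                a *= a;         /* still 1 */

        return a;
}
```

With every value known, the whole body reduces to return 1, matching the 2.95.3 output shown in comment 3.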
Comment 6 Richard Biener 2003-09-29 14:14:15 UTC
Subject: Re:  [3.4 Regression] [new unroller] constants
 not propagated in unrolled loop iterations with a conditional

On 29 Sep 2003, rakdver at gcc dot gnu dot org wrote:

> Anyway, the attached patch fixes the problem.  I believe moving jump bypassing
> after unroller should not harm anything.

This may be the fix for the 3.4 regression. What about the 3.3 regression,
i.e. the unnecessary imull produced on x86? This is a regression relative to
2.95.3, as noted in comment #3.

Maybe someone could look at what is preventing constant folding from being
done here. Roger?

Thanks,

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

Comment 7 Richard Biener 2003-09-29 16:01:26 UTC
Subject: Re:  [3.4 Regression] [new unroller] constants
 not propagated in unrolled loop iterations with a conditional

On 29 Sep 2003, rakdver at gcc dot gnu dot org wrote:

> ------- Additional Comments From rakdver at gcc dot gnu dot org  2003-09-29 12:31 -------
> This is just a pass ordering problem. Copy propagation in jump bypassing
> is being run between old and new loop unroller; so it acts on already unrolled
> loop with -fold-unroll-loops, but is not able to do anything senseful
> with -funroll-loops.
>
> On rtlopt-branch I have moved unroller before jump bypassing, this is why it
> does not reproduce there.
>
> On tree-ssa constant propagation is able to detect that a*=a is no-op in this
> case. Cool.
>
> Anyway, the attached patch fixes the problem.  I believe moving jump bypassing
> after unroller should not harm anything.

The attached patch restores the 3.3 behavior, which is still a regression
from 2.95.3 (unnecessary imull).

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

Comment 8 Andrew Pinski 2003-12-05 03:33:46 UTC
Zdenek: did you ask for approval of this patch?
Comment 9 Zdenek Dvorak 2003-12-05 21:47:27 UTC
Subject: Re:  [3.4 Regression] [new unroller] constants not propagated in unrolled loop iterations with a conditional

IIRC, no.

Zdenek

> ------- Additional Comments From pinskia at gcc dot gnu dot org  2003-12-05 03:33 -------
> Zdenek: did you ask for approval of this patch?
Comment 10 Andrew Pinski 2004-01-04 08:54:04 UTC
pessimizes-code is not a critical bug; reducing severity.
Comment 11 Mark Mitchell 2004-03-09 23:27:16 UTC
This patch is too risky for 3.4.0, so I will postpone it until 3.4.1.  However,
if someone would please test the patch on the 3.4 branch and confirm in this
audit trail that the patch works, bootstraps, etc., then we can include it for
3.4.1.
Comment 12 Steven Bosscher 2004-03-21 18:24:04 UTC
Zdenek, will you post this patch for inclusion on mainline? 
Comment 13 Mark Mitchell 2004-06-18 23:41:11 UTC
Nobody has updated the PR to indicate that the patch has been tested; postponing
until GCC 3.4.2.
Comment 14 Steven Bosscher 2004-08-12 07:33:25 UTC
Mainline produces this:  
  
        .file   "t.c"  
        .text  
        .p2align 2,,3  
.globl foo2  
        .type   foo2, @function  
foo2:  
        movl    $1, %eax  
        ret  
        .size   foo2, .-foo2  
        .ident  "GCC: (GNU) 3.5.0 20040811 (experimental)"  
        .section        .note.GNU-stack,"",@progbits  
 
which means 3.5.0 does not fail. 
Comment 15 Mark Mitchell 2004-08-29 19:02:45 UTC
Postponed until GCC 3.4.3.
Comment 16 Mark Mitchell 2004-10-30 19:34:19 UTC
Postponed until GCC 3.4.4.
Comment 17 Mark Mitchell 2004-11-01 00:45:44 UTC
Postponed until GCC 3.4.4.
Comment 18 Richard Biener 2005-01-12 11:05:27 UTC
I can re-confirm that the patch moves 3.4 to the state of 3.3, i.e. with an
extra imull compared to 2.95 and 4.0.  The patch has bootstrapped with checking
enabled and -funroll-loops on ia64; testing is in progress.  I'll formally submit
the patch shortly.

For the imull regression I'll file a separate bug with a possibly reduced testcase.
Comment 19 Richard Biener 2005-02-10 12:46:46 UTC
Patch at
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00656.html
pinged.  Or WONTFIX - it's up to Mark.
Comment 20 Steven Bosscher 2005-02-10 12:48:18 UTC
In reply to comment #13 - I have tested the patch on i686, amd64, ppc, and 
ia64. 
Comment 21 Steven Bosscher 2006-02-27 13:55:48 UTC
With the old loop optimizer gone, moving jump bypassing after loop2 is quite a reasonable thing to do.  Richi, now you have the chance to get your patch in after all, if it's still useful.

Another good thing about it: maybe this makes another RTL jump threading pass (the one after loop2) even less useful...
Comment 22 Richard Biener 2006-02-27 14:03:40 UTC
I don't know - the original testcase was fixed by tree-ssa merge, so, apart from a general pass ordering overhaul there's nothing to be done for this particular bug.
Comment 23 Gabriel Dos Reis 2006-02-28 08:43:16 UTC
Fixed in 4.0 and higher.
Won't fix for 3.4.x