Bug 36905 - [4.3/4.4/4.5 Regression] IV-opts needs a little help with a[i+1]
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.4.0
Importance: P2 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 39201
Reported: 2008-07-23 02:09 UTC by Drea Pinski
Modified: 2010-03-02 18:15 UTC
CC List: 4 users

See Also:
Host:
Target: powerpc-linux
Build:
Known to work: 3.4.0
Known to fail: 4.1.0 4.4.0 4.3.0
Last reconfirmed: 2008-09-20 15:23:43


Description Drea Pinski 2008-07-23 02:09:34 UTC
Take:
void fred(unsigned short in, unsigned short *out1)
{
    __SIZE_TYPE__ i;
    for (i=0;i<100;i++)
        out1[i+1] = in;
}

For PPC we currently generate:
.L2:
        addi 9,9,1
        slwi 0,9,1
        sthx 3,4,0
        bdnz .L2

But if we change the code ever so slightly to:
void fred(unsigned short in, unsigned short *out1)
{
    __SIZE_TYPE__ i;
    out1 ++;
    for (i=0;i<100;i++)
      out1[i] = in;
}
--- CUT ---
we get great code:
.L2:
        sthu 3,2(4)
        bdnz .L2
Even without the update form (sthu), we still get:
.L2:
        sth 3,2(4)
        addi 4,4,2
        bdnz .L2

Even if we use the variable out1 afterwards (by a return), we still get the better code in the second case.
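For reference, a minimal sketch showing that the two source forms above store to exactly the same locations, so the difference in generated code comes purely from how the address is expressed. The names fred_index, fred_bump and fred_variants_agree are made up for this illustration, and size_t stands in for __SIZE_TYPE__.

```c
#include <stddef.h>
#include <string.h>

#define N 100

/* First form from the report: the +1 lives inside the subscript. */
static void fred_index(unsigned short in, unsigned short *out1)
{
    size_t i;
    for (i = 0; i < N; i++)
        out1[i + 1] = in;
}

/* Second form: the pointer is bumped once, before the loop. */
static void fred_bump(unsigned short in, unsigned short *out1)
{
    size_t i;
    out1++;
    for (i = 0; i < N; i++)
        out1[i] = in;
}

/* Hypothetical helper: returns 1 if both forms leave identical buffers. */
int fred_variants_agree(unsigned short in)
{
    unsigned short a[N + 1], b[N + 1];

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fred_index(in, a);   /* writes a[1] .. a[N] */
    fred_bump(in, b);    /* writes b[1] .. b[N] */
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling fred_variants_agree with any value should return 1; element 0 is left untouched by both forms.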
Comment 1 Drea Pinski 2008-07-23 02:16:05 UTC
In fact this is one case where the old loop.c gets it correct :(.
4.1.1 with IV-opts off:
.L2:
        sth 3,0(9)
        addi 9,9,2
        bdnz .L2

Which means I can declare this as a regression from 3.4.x.
Comment 2 Richard Biener 2008-08-04 18:38:53 UTC
I believe we have a dup for this issue - it looks very familiar.
Comment 3 Richard Biener 2008-09-20 15:23:43 UTC
Confirmed anyway.  Time for an IVOPTs meta-bug ...
Comment 4 Paolo Bonzini 2009-02-05 08:26:54 UTC
We miss that out1_5 + 2 could be hoisted out of the loop and used as a base object:

use 0
  address
  in statement *D.1259_6 = in_7(D);

  at position *D.1259_6
  type short unsigned int *
  base out1_5(D) + 2
  step 2
  base object (void *) out1_5(D)
  related candidates


I also tried disabling DOM and PRE/FRE so that we get code that is supposedly easier to optimize, but then IVopts produces

  D.1281_16 = out1_5(D) + ivtmp.17_13;
  MEM[base: D.1281_16, offset: 2]{*D.1259} = in_7(D);

and again does not recognize that it can be reassociated and moved out of the loop.
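A rough source-level sketch of the reassociation described above, not the actual IVOPTs transformation: fred_unhoisted mirrors the shape in the dump (base recomputed from out1 and the byte-offset IV on every iteration, constant 2 left on the access), while fred_hoisted computes out1 + 2 bytes once and steps a single pointer. Both function names are invented for this example, and it assumes sizeof(unsigned short) == 2 as on the target in this report.

```c
#include <stddef.h>

/* Shape from the dump: D.1281 = out1 + ivtmp; MEM[base: D.1281, offset: 2]. */
void fred_unhoisted(unsigned short in, unsigned short *out1)
{
    size_t ivtmp;                                /* byte offset, step 2 */

    for (ivtmp = 0; ivtmp < 200; ivtmp += 2) {
        char *base = (char *)out1 + ivtmp;       /* recomputed every iteration */
        *(unsigned short *)(base + 2) = in;      /* constant offset on the store */
    }
}

/* Reassociated form: (out1 + 2) is loop invariant, so hoist it and step it. */
void fred_hoisted(unsigned short in, unsigned short *out1)
{
    unsigned short *p = out1 + 1;                /* out1 + 2 bytes, computed once */
    unsigned short *end = p + 100;

    for (; p != end; p++)
        *p = in;
}
```

Both functions write elements 1 through 100 of the buffer; the second form needs only one pointer increment per iteration, which is what the sth/addi (or sthu) loop in the description corresponds to.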
Comment 5 Joseph S. Myers 2009-03-31 20:55:55 UTC
Closing 4.2 branch.
Comment 6 Richard Biener 2009-08-04 12:29:19 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 7 Alexander Monakov 2010-02-16 17:43:58 UTC
(In reply to comment #6)

Looks like this has been fixed.  We do generate good code:

fred:
        li 0,100
        mtctr 0
.L2:
        sthu 3,2(4)
        bdnz .L2
        blr
        .size   fred, .-fred
        .ident  "GCC: (GNU) 4.5.0 20100215 (experimental)"

And ivopts dump is quite sane:

<bb 2>:
  ivtmp.13_17 = (unsigned int) out1_5(D);

<bb 3>:
  # i_13 = PHI <i_3(4), 0(2)>
  # ivtmp.13_15 = PHI <ivtmp.13_16(4), ivtmp.13_17(2)>
  i_3 = i_13 + 1;
  ivtmp.13_16 = ivtmp.13_15 + 2;
  D.2031_18 = (void *) ivtmp.13_16;
  MEM[base: D.2031_18] = in_7(D);
  if (i_3 != 100)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;
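For readers less used to GIMPLE, a hand translation of the dump above into C, offered only as an approximation. The function name fred_ivopts_shape is made up, uintptr_t replaces the 32-bit unsigned int of the dump, and the integer-to-pointer round trip relies on implementation-defined behaviour that holds on common flat-address targets. It also assumes sizeof(unsigned short) == 2.

```c
#include <stdint.h>

void fred_ivopts_shape(unsigned short in, unsigned short *out1)
{
    uintptr_t ivtmp = (uintptr_t)out1;       /* ivtmp.13_17 = (unsigned int) out1_5(D) */
    unsigned int i = 0;                      /* i_13 starts at 0 */

    do {
        i = i + 1;                           /* i_3 = i_13 + 1 */
        ivtmp = ivtmp + 2;                   /* ivtmp.13_16 = ivtmp.13_15 + 2 */
        *(unsigned short *)ivtmp = in;       /* MEM[base: D.2031_18] = in_7(D) */
    } while (i != 100);                      /* if (i_3 != 100) goto <bb 4> */
}
```

The address is incremented before the store, which matches the pre-increment behaviour of sthu 3,2(4) in the assembly above.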
Comment 8 Drea Pinski 2010-03-02 18:15:34 UTC
Fixed on the trunk.