This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug c/38856] New: loop iv detection failure, SSA autoincrement


I apologize if this is a well-disguised feature, but I have to consider it a
performance regression/bug.

In the following trivial example:
void
VecADD(
    long long *In1,
    long long *In2,
    long long *Out,
    unsigned int samples
){
  int i;
  for (i = 0; i < samples; i++) {
    Out[i] = In1[i] + In2[i];
  }
}

there is an implicit imprecision in the way C is used: the type of 'samples' is
unsigned, while the type of 'i' is signed.
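
To make the mismatch concrete: under C's usual arithmetic conversions, the
comparison 'i < samples' converts the signed 'i' to unsigned int before
comparing. A minimal sketch follows; the helper names less_mixed and
less_explicit are hypothetical and only serve as illustration:

```c
/* Mixed-sign comparison, as written in the loop condition: the int
   operand is implicitly converted to unsigned int before the compare. */
int less_mixed(int i, unsigned int samples)
{
    return i < samples;
}

/* The same comparison with the implicit conversion spelled out. */
int less_explicit(int i, unsigned int samples)
{
    return (unsigned int)i < samples;
}
```

Both helpers should agree for every input, including negative 'i', which
becomes a large unsigned value and therefore does not compare less than a
small 'samples'.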

The problem at the high level: induction variable analysis fails for this
loop, which prevents further tree-level loop optimizations (including
autoincrement) from functioning properly. In my port, performance is off by
50% for this loop. GCC 3.4.6 handled this situation fine.

What I believe to be the problem at the lowest level is a non-minimal (or
overly restrictive) SSA representation right before the iv detection:

VecADD (In1, In2, Out, samples)
{
  int i;
  long long int D.1857;
  long long int D.1856;
  long long int * D.1855;
  long long int D.1854;
  long long int * D.1853;
  long long int * D.1852;
  unsigned int D.1851;
  unsigned int i.0;

<bb 2>:

<bb 6>:
  # i_10 = PHI <0(2)>
  i.0_5 = (unsigned int) i_10;
  if (i.0_5 < samples_4(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 3>:
  # i.0_9 = PHI <i.0_3(4), i.0_5(6)>
  # i_14 = PHI <i_1(4), i_10(6)>
  D.1851_6 = i.0_9 * 8;
  D.1852_8 = Out_7(D) + D.1851_6;
  D.1853_12 = In1_11(D) + D.1851_6;
  D.1854_13 = *D.1853_12;
  D.1855_17 = In2_16(D) + D.1851_6;
  D.1856_18 = *D.1855_17;
  D.1857_19 = D.1854_13 + D.1856_18;
  *D.1852_8 = D.1857_19;
  i_20 = i_14 + 1;

<bb 4>:
  # i_1 = PHI <i_20(3)>
  i.0_3 = (unsigned int) i_1;
  if (i.0_3 < samples_4(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;
}
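
As a reading aid, the dump above can be written back as C roughly as follows.
This is a hand-made sketch, not compiler output; the function name
VecADD_lowered and the variable name iu are hypothetical. It shows the signed
counter and its unsigned copy being carried around the loop side by side:

```c
/* Hand-written C reading of the dump above (assumed correspondence):
   i  plays the role of i_10 / i_14 / i_20 / i_1,
   iu plays the role of i.0_5 / i.0_9 / i.0_3. */
void VecADD_lowered(long long *In1, long long *In2, long long *Out,
                    unsigned int samples)
{
    int i = 0;                          /* bb 6: i_10 */
    unsigned int iu = (unsigned int)i;  /* bb 6: i.0_5 */

    if (iu < samples) {                 /* bb 6: loop entry test */
        do {
            /* bb 3: addresses are computed from the unsigned copy */
            Out[iu] = In1[iu] + In2[iu];
            i = i + 1;                  /* bb 3: i_20 = i_14 + 1 (signed add) */
            iu = (unsigned int)i;       /* bb 4: i.0_3 = (unsigned int) i_1 */
        } while (iu < samples);         /* bb 4: loop exit test */
    }
}
```

As in the original source, the signed increment is only well defined while
'i' stays below INT_MAX, so the sketch is meant for small 'samples' values.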

The two PHI nodes at the beginning of BB3 break the iv detection. The same
example with the types of 'i' and 'samples' matching is analyzed perfectly
fine, with the SSA at the same point looking like this:

VecADD (In1, In2, Out, samples)
{
  int i;
  long long int D.1857;
  long long int D.1856;
  long long int * D.1855;
  long long int D.1854;
  long long int * D.1853;
  long long int * D.1852;
  unsigned int D.1851;
  unsigned int i.0;

<bb 2>:

<bb 6>:
  # i_9 = PHI <0(2)>
  if (i_9 < samples_3(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 3>:
  # i_13 = PHI <i_1(4), i_9(6)>
  i.0_4 = (unsigned int) i_13;
  D.1851_5 = i.0_4 * 8;
  D.1852_7 = Out_6(D) + D.1851_5;
  D.1853_11 = In1_10(D) + D.1851_5;
  D.1854_12 = *D.1853_11;
  D.1855_16 = In2_15(D) + D.1851_5;
  D.1856_17 = *D.1855_16;
  D.1857_18 = D.1854_12 + D.1856_17;
  *D.1852_7 = D.1857_18;
  i_19 = i_13 + 1;

<bb 4>:
  # i_1 = PHI <i_19(3)>
  if (i_1 < samples_3(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;
}
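
For reference, a source with matching types could look like the sketch below.
The first variant (both 'i' and 'samples' as int) is inferred from the dump
above, where the exit test compares 'i' with 'samples' without a conversion.
The second variant, which keeps the original prototype and makes the counter
unsigned, is an assumption on my part; it is not claimed to produce the same
dump. The names VecADD_int and VecADD_unsigned are hypothetical:

```c
/* Variant inferred from the second dump: counter and bound are both int,
   so the loop exit test needs no conversion. */
void VecADD_int(long long *In1, long long *In2, long long *Out, int samples)
{
    int i;
    for (i = 0; i < samples; i++) {
        Out[i] = In1[i] + In2[i];
    }
}

/* Assumed alternative: keep the unsigned bound and make the counter
   unsigned as well, so both operands of the comparison share one type. */
void VecADD_unsigned(long long *In1, long long *In2, long long *Out,
                     unsigned int samples)
{
    unsigned int i;
    for (i = 0; i < samples; i++) {
        Out[i] = In1[i] + In2[i];
    }
}
```

Both variants compute the same element-wise sums as the original VecADD for
any 'samples' that fits in an int.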

On one hand, I understand that the danger of signed/unsigned overflow at the
increment can force this kind of conservatism. On the other hand, at the high
level this situation was handled fine by GCC 3.4.6 and is handled with no
issues by another SSA-based compiler. If there is a way to relax this strict
interpretation of the C rules in GCC 4.3.2, I would gladly learn about it, but
my brief flag-mining exercise yielded no results. Thank you.


-- 
           Summary: loop iv detection failure, SSA autoincrement
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sergei_lus at yahoo dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38856

