38856 – loop iv detection failure

Bug 38856 - loop iv detection failure

Summary: loop iv detection failure

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.3.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2009-01-15 18:24 UTC by Sergei Larin
Modified:	2021-11-28 04:31 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.1.1, 4.3.0, 4.4.0
Last reconfirmed:	2009-01-15 21:17:17

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sergei Larin 2009-01-15 18:24:13 UTC

I apologize if it is a well disguised feature, but I am forced to consider this being a performance regression/bug. 

In the following trivial example:
void
VecADD(
    long long *In1,
    long long *In2,
    long long *Out,
    unsigned int samples
){
  int i;
  for (i = 0; i < samples; i++) {
    Out[i] = In1[i] + In2[i];
  }
}

there is an implicit imprecision in the way C is used - type of 'samples' is unsigned, while type of 'i' is signed. 

The problem on the high level - induction variable analysis fails for this loop, which impairs further tree level loop optimizations from functioning properly (including autoincrement). In my port performance is off by 50% for this loop. GCC 3.4.6 was able to handle this situation fine. 

What I believe to be the problem at the lowest level is a non-minimal (or overly restrictive) SSA representation right before the iv detection:

VecADD (In1, In2, Out, samples)
{
  int i;
  long long int D.1857;
  long long int D.1856;
  long long int * D.1855;
  long long int D.1854;
  long long int * D.1853;
  long long int * D.1852;
  unsigned int D.1851;
  unsigned int i.0;

<bb 2>:

<bb 6>:
  # i_10 = PHI <0(2)>
  i.0_5 = (unsigned int) i_10;
  if (i.0_5 < samples_4(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 3>:
  # i.0_9 = PHI <i.0_3(4), i.0_5(6)>
  # i_14 = PHI <i_1(4), i_10(6)>
  D.1851_6 = i.0_9 * 8;
  D.1852_8 = Out_7(D) + D.1851_6;
  D.1853_12 = In1_11(D) + D.1851_6;
  D.1854_13 = *D.1853_12;
  D.1855_17 = In2_16(D) + D.1851_6;
  D.1856_18 = *D.1855_17;
  D.1857_19 = D.1854_13 + D.1856_18;
  *D.1852_8 = D.1857_19;
  i_20 = i_14 + 1;

<bb 4>:
  # i_1 = PHI <i_20(3)>
  i.0_3 = (unsigned int) i_1;
  if (i.0_3 < samples_4(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;
}

The two PHI nodes in the beginning of BB3 break the iv detection. Same example when types of ‘i’ and ‘samples’ would match will be analyzed perfectly fine with the SSA at the same point looking like this:

VecADD (In1, In2, Out, samples)
{
  int i;
  long long int D.1857;
  long long int D.1856;
  long long int * D.1855;
  long long int D.1854;
  long long int * D.1853;
  long long int * D.1852;
  unsigned int D.1851;
  unsigned int i.0;

<bb 2>:

<bb 6>:
  # i_9 = PHI <0(2)>
  if (i_9 < samples_3(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 3>:
  # i_13 = PHI <i_1(4), i_9(6)>
  i.0_4 = (unsigned int) i_13;
  D.1851_5 = i.0_4 * 8;
  D.1852_7 = Out_6(D) + D.1851_5;
  D.1853_11 = In1_10(D) + D.1851_5;
  D.1854_12 = *D.1853_11;
  D.1855_16 = In2_15(D) + D.1851_5;
  D.1856_17 = *D.1855_16;
  D.1857_18 = D.1854_12 + D.1856_17;
  *D.1852_7 = D.1857_18;
  i_19 = i_13 + 1;

<bb 4>:
  # i_1 = PHI <i_19(3)>
  if (i_1 < samples_3(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;
}

On one hand I seem to understand that a danger of signed/unsigned overflow at increment can force this kind of conservatism, but on the high level this situation was handled fine by gcc 3.4.6 and is handled with no issues by another SSA based compiler. If there is a way to relax this strict interpretation of C rules by GCC 4.3.2, I would gladly learn about it, but my brief flag mining exercise yielded no results. Thank you.

Comment 1 Andrew Pinski 2009-01-15 18:41:51 UTC

Well first off, iv-opts on the tree level should be the place where autoincrement is helped out.  See PR 31849 which I think this is a duplicate of.

Comment 2 Sergei Larin 2009-01-15 21:03:32 UTC

Andrew, thank you for the prompt reply. 

I have seen PR 31849 and used the patch suggested. Without it autoincrement would not work at all for either case. But the patch seems to deal with the case when iv _were_ detected and generally alters cost computation to take that addressing mode into account, while in this case I see that ivs are _not_ detected in the first place.  This seems to be a more fundamental issue - should SSA be made "minimal" at that point or should iv detection be improved to "see" trough “aliased” PHI node – I lack better words to describe it.

Comment 3 Andrew Pinski 2009-01-15 21:13:04 UTC

This bug is a bit funny, autoincrement is not the issue, just the detecting of the induction variable's limits correctly.  For an example I noticed that if I compile this on a LP64 target, the induction variable is found correctly.

Comment 4 Andrew Pinski 2009-01-15 21:17:17 UTC

Here is a good testcase:
void
VecADD(
    int *In1,
    int *In2,
    int *Out,
    unsigned int samples
){
  int i;
  for (i = 0; i < samples; i++) {
    Out[i] = In1[i];
  }
}

This testcase should be vectorized with -maltivec -O3 on PowerPC but is not while it is with -m64.

Comment 5 Richard Biener 2009-01-16 09:57:24 UTC

I think this boils down to the usual POINTER_PLUS fallout.

Comment 6 pinskia@gmail.com 2009-01-16 11:46:22 UTC

Subject: Re:  loop iv detection failure



Sent from my iPhone

On Jan 16, 2009, at 1:57 AM, "rguenth at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #5 from rguenth at gcc dot gnu dot org  2009-01-16  
> 09:57 -------
> I think this boils down to the usual POINTER_PLUS fallout.

It failed in 4.1 also so nope :).

>
>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38856
>

Comment 7 Andrew Pinski 2009-01-16 18:53:45 UTC

Here is a testcase where IV can be found but we end up with two copies of the same register in the loop:
int f(unsigned long s, unsigned long *c)
{
  long j;
  for(j = 0;j < s; j++)
    {
      c[j]=j;
    }
}

Which is basically the same issue.

Comment 8 Sergei Larin 2010-12-01 20:48:47 UTC

Hello, 

  This has not been touched in a while, nevertheless the issue still exist with 4.5.1

Comment 9 Andrew Pinski 2021-11-28 04:31:48 UTC

I think the original issue has been fixed since GCC 4.5.0 or so.