Bug 43182 - GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]
Summary: GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2010-02-25 23:37 UTC by Changpeng Fang
Modified: 2021-07-26 07:45 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-07-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Changpeng Fang 2010-02-25 23:37:34 UTC
gcc 4.5 can not vectorize this simple loop:

void foo(int a[], int n) {
 int i;
 for(i=1; i< n; i++)
  a[i] = a[0];
}

"gcc -O3 -fdump-tree-vect-all -c foo.c" shows:
foo.c:3: note: not vectorized: unhandled data-ref 
foo.c:3: note: bad data references.
foo.c:1: note: vectorized 0 loops in function.

It seems gcc gets confused at a[0] and gives up vectorization. There
is no dependence in this loop, and we should teach gcc to handle a[0]
to vectorize it.
Comment 1 Andrew Pinski 2010-02-25 23:45:01 UTC
Actually a[0] should be load hoisted from the loop as it not changed from inside the loop at all.
Comment 2 Andrew Pinski 2010-02-25 23:50:10 UTC
So currently inside LIM (which does load motion in general):
  D.2724_7 = a_6(D) + D.2723_5;
  D.2725_8 = *a_6(D);
  *D.2724_7 = D.2725_8;

But LIM/alias oracle does not know that D.2723_5 has a range of [4, n_3*4] which means D.2724_7 can never equal a_6 so we don't pull out the load from a_6.
Comment 3 Andrew Pinski 2010-02-25 23:54:15 UTC
Related to PR 29751 but that only does a simple method and does not handle this case as we need range info.
Comment 4 Changpeng Fang 2010-02-26 18:53:42 UTC
Here is another similar case but more general. We know that a(j) and a(i)
never access the same memory location. intel ifort can vectorize this triangular
loop:

      do 10 j = 1,n
         do 20 i = j+1, n
            a(i) = a(i) - aa(i,j) * a(j)
  20     continue
  10  continue
Comment 5 Andrew Pinski 2010-02-26 18:55:33 UTC
(In reply to comment #4)
> Here is another similar case but more general.

Actually it is a totally different case.  Please file a new bug with that case; though there might already be a bug about that one.
Comment 6 Changpeng Fang 2010-02-26 19:06:41 UTC
> 
> Actually it is a totally different case.  Please file a new bug with that case;
> though there might already be a bug about that one.
> 

I could not see the difference even though j is not a compile-time constant. (it
is an invariant to the innermost loop). I can say:

GCC does not pull out a[j] from loop that changes a[i] for i:[j+1,n]
Comment 7 Andrew Pinski 2021-07-26 07:45:34 UTC
So even though we can vectorize this loop these days, the non-vectorized loop still has the load each iteration.
at -O2:
.L3:
        movl    (%ecx), %edx
        addl    $4, %eax
        movl    %edx, -4(%eax)
        cmpl    %ebx, %eax
        jne     .L3