Bug 32131 - knowing that stride==1 when using allocated arrays and escaping allocatable arrays
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 4.3.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Duplicates: 33753
Depends on:
Blocks:
 
Reported: 2007-05-28 19:56 UTC by Thomas Koenig
Modified: 2009-09-13 12:54 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-08-08 17:29:48


Description Thomas Koenig 2007-05-28 19:56:53 UTC
Look at this:

$ cat allocate-loop.f90
program main
  implicit none
  real, allocatable, dimension(:) :: a, b, c
  real, dimension(10) :: d, e, f
  real :: s
  allocate (a(10), b(10), c(10))
  call random_number(a)
  call random_number(b)
  c = a+b
  s = sum(c)
  print *,s
  call random_number(d)
  call random_number(e)
  f = d+e
  s = sum(f)
  print *,f
end program main
$ gfortran -march=athlon-xp -O3 -ftree-vectorize -ftree-vectorizer-verbose=4 allocate-loop.f90

allocate-loop.f90:15: note: not vectorized: unsupported use in stmt.
allocate-loop.f90:14: note: LOOP VECTORIZED.
allocate-loop.f90:10: note: not vectorized: unsupported use in stmt.
allocate-loop.f90:9: note: not vectorized: unhandled data-ref
allocate-loop.f90:1: note: vectorized 1 loops in function.

The loop at line 9 (c=a+b) could be vectorized if the stride was
known to be 1 (which isn't made known to the middle end in this case).
Comment 1 Andrew Pinski 2007-05-28 20:56:12 UTC
This is an aliasing issue.  The reason we don't optimize this is that we think a/b escape, although in Fortran that is not the case.

I think there is another bug about this case somewhere too.
Comment 2 Janne Blomqvist 2007-05-29 04:30:13 UTC
PR31738 is another "missed vectorization in Fortran", though I don't think it's really that related to this one.
Comment 3 Andrew Pinski 2007-05-29 04:52:20 UTC
(In reply to comment #2)
> PR31738 is another "missed vectorization in Fortran", though I don't think it's
> really that related to this one.

The vectorizer issue just exposes the real issue and maybe why GCC is slow in some cases (fixing the aliasing issue here will speed up fortran code even without the vectorizer help).  The issue is a.stride is reloaded after the call to random_number as we think random_number can change the stride.  If we change the code into:
program main
  implicit none
  real, allocatable, dimension(:) :: a, b, c
  real, dimension(10) :: d, e, f
  real :: s
  integer :: i
  allocate (a(10), b(10), c(10))
  do i = 1, 10
    call random_number(a(i))
    call random_number(b(i))
  enddo
  c = a+b
  s = sum(c)
  print *,s
  call random_number(d)
  call random_number(e)
  f = d+e
  s = sum(f)
  print *,f
end program main

We get:
t.f90:18: note: not vectorized: unsupported use in stmt.
t.f90:17: note: LOOP VECTORIZED.
t.f90:13: note: not vectorized: unsupported use in stmt.
t.f90:12: note: Alignment of access forced using peeling.
t.f90:12: note: Vectorizing an unaligned access.
t.f90:12: note: Vectorizing an unaligned access.
t.f90:12: note: LOOP VECTORIZED.
t.f90:8: note: not vectorized: unhandled data-ref
t.f90:1: note: vectorized 2 loops in function.

Which is correct and shows the aliasing issue, we think we change a's stride
which we don't (and cannot in this case).
Comment 4 Tobias Burnus 2007-05-29 06:51:57 UTC
Hmm, I rebuilt and it works for me.
Comment 5 tkoenig@alice-dsl.net 2007-05-29 17:47:29 UTC
Subject: Re: knowing that stride==1 when using allocated arrays and escaping allocatable arrays

On Tue, 2007-05-29 at 04:52 +0000, pinskia at gcc dot gnu dot org wrote:

> we think we change a's stride
> which we don't (and cannot in this case).

I can see two ways to address this issue (both of them worth pursuing):

a) For allocatable arrays, we can always assume stride=1.

b) We can tell the middle-end that our random number generator doesn't
   modify the array descriptor (similar to PR 20165).  Once we've fixed
   PR 20165, this should be easy, but I don't see anybody working on it.

So, maybe looking at a) is better.

Personally, I like the allocatable array feature of Fortran 95 very
much.  It'd be a pity if this carried a big performance overhead.
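
Option a) rests on the fact that a whole allocatable array is always contiguous, so its first-dimension stride is 1, whereas a pointer or an assumed-shape dummy may refer to a strided section. A minimal sketch of that difference, assuming a compiler that provides the Fortran 2008 intrinsic is_contiguous (which postdates this discussion); the program and variable names are made up for illustration:

```fortran
program contiguity_demo
  implicit none
  real, allocatable, target :: a(:)
  real, pointer :: p(:)

  allocate (a(10))
  call random_number(a)

  ! p refers to every second element of a, so its stride is 2.
  p => a(1:10:2)

  ! A whole allocatable array is always contiguous: prints T.
  print *, 'allocatable array contiguous:   ', is_contiguous(a)
  ! A pointer to a strided section is not contiguous: prints F.
  print *, 'pointer to section contiguous:  ', is_contiguous(p)
end program contiguity_demo
```

The first query is true for any allocated allocatable array, which is what would let the front end replace the descriptor's stride with a constant.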


Comment 6 Janne Blomqvist 2007-05-29 17:51:10 UTC
Reopening. This vectorizes only partly, with -ffast-math to boot. We should be able to vectorize it without doing any "unsafe" math.

gfortran -O2 -ffast-math -march=native -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=6 allocate-loop.f90

allocate-loop.f90:15: note: LOOP VECTORIZED.
allocate-loop.f90:14: note: LOOP VECTORIZED.
allocate-loop.f90:10: note: Vectorizing an unaligned access.
allocate-loop.f90:10: note: LOOP VECTORIZED.
allocate-loop.f90:9: note: not vectorized: data ref analysis failed D.1424_40 = (*D.1371_14)[D.1423_39]
allocate-loop.f90:1: note: vectorized 3 loops in function.

and without -ffast-math:

gfortran -O2 -march=native -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=6 allocate-loop.f90

allocate-loop.f90:15: note: not vectorized: unsupported use in stmt.
allocate-loop.f90:14: note: LOOP VECTORIZED.
allocate-loop.f90:10: note: not vectorized: unsupported use in stmt.
allocate-loop.f90:9: note: not vectorized: data ref analysis failed D.1424_40 = (*D.1371_14)[D.1423_39]
allocate-loop.f90:1: note: vectorized 1 loops in function.
Comment 7 Janne Blomqvist 2007-06-27 15:56:52 UTC
(In reply to comment #5)
> I can see two ways to address this issue (both of them worth pursuing):
> 
> a) For allocatable arrays, we can always assume stride=1.

But this helps only locally in the procedure where the array is declared. If you call another procedure with an explicit interface, that procedure cannot assume that stride==1. I wonder, would it make sense to generate code like

if (stride ==1) then
  some array operation, simplified for the case stride==1
else
  general case
end if

Then at least the stride==1 case could be vectorized, and presumably that is also the overwhelmingly common case. Of course it would imply some code bloat. Or is this something the middle-end could do for us?
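
The versioning idea above can be sketched at source level as follows. This is only an illustration, not what the compiler generates: the descriptor stride is not directly visible in Fortran source, so the Fortran 2008 intrinsic is_contiguous stands in for the stride==1 test, and the module and procedure names are hypothetical.

```fortran
module versioned_add
  implicit none
contains

  ! Assumed-shape dummies: the callee cannot know the stride at compile time.
  subroutine add_arrays(a, b, c)
    real, intent(in)  :: a(:), b(:)
    real, intent(out) :: c(:)
    integer :: i

    if (is_contiguous(a) .and. is_contiguous(b) .and. is_contiguous(c)) then
      ! Fast version: explicit-shape dummies are known to have unit stride.
      call add_unit_stride(size(c), a, b, c)
    else
      ! General version: works for any stride.
      do i = 1, size(c)
        c(i) = a(i) + b(i)
      end do
    end if
  end subroutine add_arrays

  subroutine add_unit_stride(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in)    :: a(n), b(n)
    real, intent(out)   :: c(n)
    integer :: i

    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine add_unit_stride

end module versioned_add

program demo
  use versioned_add
  implicit none
  real, allocatable :: a(:), b(:), c(:)
  real :: big(20), d(10)

  allocate (a(10), b(10), c(10))
  call random_number(a)
  call random_number(b)
  call random_number(big)

  call add_arrays(a, b, c)             ! takes the unit-stride branch
  call add_arrays(big(1:20:2), b, d)   ! takes the general branch

  ! Both checks are expected to be true.
  print *, all(c == a + b), all(d == big(1:20:2) + b)
end program demo
```

The cost is the code bloat mentioned above: every such loop exists twice, in exchange for a unit-stride copy that the vectorizer can handle.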

Of course, with IPA this problem could be solved by looking at all the callers.. :)

> b) We can tell the middle-end that our random number generator doesn't
>    modify the array descriptor (similar to PR 20165).  Once we've fixed
>    PR 20165, this should be easy, but I don't see anybody working on it.

Another question, do we at the moment tell the middle-end anything about Fortran aliasing rules? E.g. that after the call to random_number (or any other procedure) the a->data is not reachable via some other variable? Or is this another manifestation of the pointer escaping thing from PR 20165? But I would assume some support exists for C99 restrict, which is similar?
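
For reference, the Fortran rule that resembles C99 restrict is that a dummy argument which is modified in a procedure must not overlap any other dummy argument. A small sketch of code that relies on this rule; the names are hypothetical, and whether or how gfortran passes this information to the middle-end is exactly the open question here:

```fortran
module alias_demo
  implicit none
contains

  subroutine axpy(n, alpha, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: alpha
    real, intent(in)    :: x(n)
    real, intent(inout) :: y(n)
    integer :: i

    ! Because y is modified, the standard requires that x and y do not
    ! overlap, so a compiler may keep x(i) in registers across stores to y.
    do i = 1, n
      y(i) = y(i) + alpha * x(i)
    end do
  end subroutine axpy

end module alias_demo

program main
  use alias_demo
  implicit none
  real :: x(8), y(8), expected(8)

  call random_number(x)
  call random_number(y)
  expected = y + 2.0 * x

  call axpy(8, 2.0, x, y)

  ! Compare with a small tolerance; expected to print T.
  print *, all(abs(y - expected) <= 1.0e-6)
end program main
```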
Comment 8 Francois-Xavier Coudert 2007-08-09 23:06:59 UTC
How on earth did that PR get assigned to me?
Comment 9 Janne Blomqvist 2007-10-12 20:17:38 UTC
*** Bug 33753 has been marked as a duplicate of this bug. ***
Comment 10 Tobias Burnus 2009-07-02 14:44:05 UTC
Michael Matz fixed that for allocatable arrays, but the patch needs to be extended to nonallocatable arrays, cf.
http://gcc.gnu.org/ml/fortran/2009-07/msg00004.html
Comment 11 Michael Matz 2009-07-02 15:31:41 UTC
Subject: Bug 32131

Author: matz
Date: Thu Jul  2 15:31:28 2009
New Revision: 149178

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149178
Log:
fortran/
        PR fortran/32131
        * trans-array.c (gfc_conv_descriptor_stride_get): Return
        constant one for strides in the first dimension of ALLOCATABLE
        arrays.

testsuite/
        PR fortran/32131
        * gfortran.dg/pr32921.f: Adjust.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-array.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/pr32921.f

Comment 12 Tobias Burnus 2009-07-03 11:33:13 UTC
> Michael Matz fixed that for allocatable arrays, but the patch needs to be
> extended to nonallocatable arrays, cf.
> http://gcc.gnu.org/ml/fortran/2009-07/msg00004.html

Actually, it already works there. What remains is to do the same optimization for CONTIGUOUS arrays, but this F2008 feature is not implemented yet. Thus I am closing this PR.

For the contiguous attribute, see PR 40632.

Comment 13 Thomas Koenig 2009-09-13 12:54:05 UTC
Actually closing.