Bug 19049 - not vectorizing a fortran loop
Summary: not vectorizing a fortran loop
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.0.0
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Reported: 2004-12-17 04:02 UTC by Andrew Pinski
Modified: 2015-10-22 13:34 UTC
CC List: 2 users

See Also:
Known to work: 6.0
Known to fail: 4.6.0
Last reconfirmed: 2013-02-01 00:00:00

patch which fixes the non-vectorization problem (750 bytes, patch)
2004-12-17 04:27 UTC, Andrew Pinski

Description Andrew Pinski 2004-12-17 04:02:55 UTC
While running a benchmark that exercises loop vectorization, I noticed that none of the loops were being 
vectorized because we don't merge two BBs together.
      subroutine s111 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c     linear dependence testing
c     no dependence - vectorizable
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, cs1d
      do 1 nl = 1,2*ntimes
      do 10 i = 2,n,2
         a(i) = a(i-1) + b(i)
  10  continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
  1   continue
      end

I don't know why we don't merge the blocks for the loop with label 10.
I wonder if it is related to PR 19038.
Comment 1 Andrew Pinski 2004-12-17 04:05:43 UTC
If anyone wants the full benchmark, I can attach it or send it to them.
Comment 2 Andrew Pinski 2004-12-17 04:15:01 UTC
The problem is that tree_can_merge_blocks_p returns false because BB a has a user label; what should 
happen instead is that the user label is simply moved.
Comment 3 Andrew Pinski 2004-12-17 04:27:00 UTC
Created attachment 7768 [details]
patch which fixes the non-vectorization problem

This patch fixes the merging of the BB and the vectorizer can see the loop now.
Comment 4 Andrew Pinski 2005-04-02 00:16:58 UTC
pr19049.f:10: note: not vectorized: can't determine dependence between: (*a_38)[D.722_49] and 
pr19049.f:10: note: bad data dependence.
Comment 5 Ira Rosen 2005-04-25 09:58:44 UTC
The vectorizer fails to determine the dependence between (*a_38)[D.719_49] and 
(*a_38)[D.718_51] because it fails to determine that both data-refs have 
the same base, *a_38. This is already fixed in the autovect branch, and I am 
working on a patch to bring the changes in data-ref analysis to mainline.
Comment 6 Ira Rosen 2005-07-26 07:07:20 UTC
The data dependence issue was solved by this patch (committed):
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01195.html
However, this loop is still not vectorizable because of noncontiguous access.
Comment 7 Ira Rosen 2006-09-19 07:29:09 UTC
Even though vectorization of strided accesses is already implemented in the autovect branch (and will be committed to the mainline 4.3), this case contains
a store with a gap (store to a[i] without a store to a[i-1]), and such stores are not supported (the current implementation supports only loads with gaps).

Note, however, that adding a store to a[i-1] will create a data dependence in the loop.
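
For readers following along, the access pattern described above can be sketched in C. This is only an illustrative rendering of the Fortran inner loop (Fortran a(i) becomes a[i-1] in C); the function names s111_scalar and s111_from_snapshot are made up for this sketch and do not appear in GCC or the benchmark. The second function reads from a copy of a taken before the loop; if both produce the same result, no iteration reads a value written by another iteration, which is the "no dependence" property the benchmark comment claims.

```c
#include <assert.h>
#include <stddef.h>

/* C rendering of "do 10 i = 2,n,2": only every second element of a is
   stored (a store with a gap), and the loads from a touch only the
   elements in between. */
void s111_scalar(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 2; i <= n; i += 2)
        a[i - 1] = a[i - 2] + b[i - 1];
}

/* Same computation, but every load of a comes from a_before, a snapshot
   of a taken before the loop (hypothetical helper for illustration). */
void s111_from_snapshot(float *a, const float *a_before,
                        const float *b, size_t n)
{
    assert(a != NULL && a_before != NULL && b != NULL);
    for (size_t i = 2; i <= n; i += 2)
        a[i - 1] = a_before[i - 2] + b[i - 1];
}
```

Calling both functions on identical inputs should leave identical contents in a, since the stored elements and the loaded elements of a never overlap.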


Comment 8 Thomas Koenig 2010-11-09 20:07:30 UTC
Still working on this?

$ gfortran -S -O3 -ftree-vectorizer-verbose=8 vect.f

vect.f:9: note: not vectorized: inner-loop count not invariant.
vect.f:10: note: Detected single element interleaving *a_107(D)[D.1623_106] step 8
vect.f:10: note: Detected single element interleaving *b_111(D)[D.1620_110] step 8
vect.f:10: note: not vectorized: complicated access pattern.
vect.f:1: note: vectorized 0 loops in function.
Comment 9 Ira Rosen 2010-11-10 06:59:44 UTC
This is still not implemented, and at the moment I am not planning to work on it.

Comment 10 Andrew Pinski 2014-12-01 05:33:30 UTC
We now get (at least on aarch64):
t.f90:11:0: note: === vect_pattern_recog ===
t.f90:11:0: note: === vect_analyze_data_ref_accesses ===
t.f90:11:0: note: Detected single element interleaving *a_23(D)[_22] step 8
t.f90:11:0: note: Data access with gaps requires scalar epilogue loop
t.f90:11:0: note: not consecutive access *a_23(D)[_25] = _28;

t.f90:11:0: note: not vectorized: complicated access pattern.
t.f90:11:0: note: bad data access.
Comment 11 Richard Biener 2015-10-22 13:33:49 UTC
Author: rguenth
Date: Thu Oct 22 13:33:17 2015
New Revision: 229172

URL: https://gcc.gnu.org/viewcvs?rev=229172&root=gcc&view=rev
2015-10-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/19049
	PR tree-optimization/65962
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Fall back
	to strided accesses if single-element interleaving doesn't work.

	* gcc.dg/vect/vect-strided-store-pr65962.c: New testcase.
	* gcc.dg/vect/vect-63.c: Adjust.
	* gcc.dg/vect/vect-70.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
	* gfortran.dg/vect/pr19049.f90: Likewise.
	* gfortran.dg/vect/vect-8.f90: Likewise.
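
As a rough illustration of what treating the store as a strided access means for this loop, here is a hand-written C sketch. It is not the code GCC generates and not part of the commit; the vectorization factor VF and the function names are assumptions chosen for the example. Each step of the main loop handles VF iterations of the original stride-2 loop: elements are loaded one by one at a stride, combined with a single vector-style add, and stored back one by one at the same stride, with a scalar epilogue for the leftover iterations.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Scalar reference: C rendering of "do 10 i = 2,n,2". */
void s111_scalar_ref(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 2; i <= n; i += 2)
        a[i - 1] = a[i - 2] + b[i - 1];
}

/* Hand-written sketch of the same loop with strided loads and stores. */
void s111_strided_sketch(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 2;  /* Fortran-style 1-based loop index */

    /* Main loop: VF original iterations per step. */
    for (; i + 2 * (VF - 1) <= n; i += 2 * VF) {
        float va[VF], vb[VF], vr[VF];
        for (int l = 0; l < VF; l++) {       /* element-wise strided loads */
            va[l] = a[i + 2 * l - 2];
            vb[l] = b[i + 2 * l - 1];
        }
        for (int l = 0; l < VF; l++)         /* stands in for one vector add */
            vr[l] = va[l] + vb[l];
        for (int l = 0; l < VF; l++)         /* element-wise strided stores */
            a[i + 2 * l - 1] = vr[l];
    }

    /* Scalar epilogue for the remaining iterations. */
    for (; i <= n; i += 2)
        a[i - 1] = a[i - 2] + b[i - 1];
}
```

Both functions should produce the same array contents for any n, because the loop carries no dependence between iterations.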

Comment 12 Richard Biener 2015-10-22 13:34:00 UTC