From a benchmark that exercises loop vectorization, I noticed that none of the loops were being
vectorized because we don't merge two BBs together.
      subroutine s111 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c     linear dependence testing
c     no dependence - vectorizable
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, cs1d
      do 1 nl = 1,2*ntimes
      do 10 i = 2,n,2
         a(i) = a(i-1) + b(i)
   10 continue
    1 continue
      end
I don't know why we don't merge the loop with the label 10.
I wonder if it is related to PR 19038.
If anyone wants the full benchmark, I can attach it or send it to them.
The problem is that tree_can_merge_blocks_p returns false because BB a has a user label. What should
happen instead is that the user label is simply moved.
Created attachment 7768
Patch which fixes the non-vectorization problem
This patch fixes the merging of the BB and the vectorizer can see the loop now.
pr19049.f:10: note: not vectorized: can't determine dependence between: (*a_38)[D.722_49] and
pr19049.f:10: note: bad data dependence.
The vectorizer fails to determine dependence between: (*a_38)[D.719_49] and
(*a_38)[D.718_51], since it fails to determine that both of the data-refs have
the same base, *a_38. This is already fixed in the autovect branch, and I am
working on a patch to bring the changes in data-refs analysis to mainline.
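Once both references are known to share the base a, the subroutine's own comment ("no dependence - vectorizable") can be checked by hand. The following is a minimal sketch of the classic GCD dependence test, written in C; it is not the data-refs analysis GCC actually performs, and the function names are hypothetical. With 0-based iteration k, the store a(i) touches element 2k+2 and the load a(i-1) touches element 2k+1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of two integers (result is non-negative). */
static int gcd_int(int x, int y)
{
    x = abs(x);
    y = abs(y);
    while (y != 0) {
        int t = x % y;
        x = y;
        y = t;
    }
    return x;
}

/* GCD test for two accesses base[c1*k + d1] and base[c2*k' + d2]:
   an element can be shared only if gcd(c1, c2) divides (d2 - d1).
   Returns true if a dependence MAY exist, false if it cannot.
   Hypothetical helper for illustration only. */
bool gcd_test_may_depend(int c1, int d1, int c2, int d2)
{
    assert(c1 != 0 || c2 != 0);   /* at least one access must be strided */
    int g = gcd_int(c1, c2);
    return (d2 - d1) % g == 0;
}
```

For the loop in this report, the call gcd_test_may_depend(2, 2, 2, 1) asks whether 2 divides -1, so the store and the load can never touch the same element.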
The data dependence issue was solved by this patch (committed):
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01195.html
However, this loop is still not vectorizable because of noncontiguous access.
Even though vectorization of strided accesses is already implemented in the autovect branch (and will be committed to the mainline 4.3), this case contains
a store with a gap (store to a[i] without a store to a[i-1]), and such stores are not supported (the current implementation supports only loads with gaps).
Note, however, that adding a store to a[i-1] will create a data dependence in the loop.
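To make the "store with a gap" concrete, here is a rough C analogue of the Fortran loop. It is only an illustration: the function name is made up, and Fortran's 1-based a(i) is mapped to the 0-based a[i-1]. Every second element of a is written, and the elements in between are only read.

```c
#include <assert.h>
#include <stddef.h>

/* C analogue of "do 10 i = 2,n,2 / a(i) = a(i-1) + b(i)".
   Stores go to a[1], a[3], a[5], ...; a[0], a[2], a[4], ... are
   never stored to, which is the gap in the store group.
   Hypothetical name, for illustration only. */
void s111_like(float *a, const float *b, int n)
{
    assert(a != NULL && b != NULL);
    for (int i = 1; i < n; i += 2)
        a[i] = a[i - 1] + b[i];
}
```

Calling it on a = {1, 0, 2, 0} and b = {10, 10, 10, 10} with n = 4 changes only a[1] and a[3], to 11 and 12.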
Still working on this?
$ gfortran -S -O3 -ftree-vectorizer-verbose=8 vect.f
vect.f:9: note: not vectorized: inner-loop count not invariant.
vect.f:10: note: Detected single element interleaving *a_107(D)[D.1623_106] step 8
vect.f:10: note: Detected single element interleaving *b_111(D)[D.1620_110] step 8
vect.f:10: note: not vectorized: complicated access pattern.
vect.f:1: note: vectorized 0 loops in function.
This is still not implemented, and at the moment I am not planning to implement it.
We now get (at least on aarch64):
t.f90:11:0: note: === vect_pattern_recog ===
t.f90:11:0: note: === vect_analyze_data_ref_accesses ===
t.f90:11:0: note: Detected single element interleaving *a_23(D)[_22] step 8
t.f90:11:0: note: Data access with gaps requires scalar epilogue loop
t.f90:11:0: note: not consecutive access *a_23(D)[_25] = _28;
t.f90:11:0: note: not vectorized: complicated access pattern.
t.f90:11:0: note: bad data access.
Date: Thu Oct 22 13:33:17 2015
New Revision: 229172
2015-10-22 Richard Biener <firstname.lastname@example.org>
* tree-vect-data-refs.c (vect_analyze_group_access_1): Fall back
to strided accesses if single-element interleaving doesn't work.
* gcc.dg/vect/vect-strided-store-pr65962.c: New testcase.
* gcc.dg/vect/vect-63.c: Adjust.
* gcc.dg/vect/vect-70.c: Likewise.
* gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
* gfortran.dg/vect/pr19049.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.
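As a rough picture of what falling back to strided accesses means for this loop, the sketch below processes four strided iterations per trip, loading and storing one element per lane. This is a hand-written C illustration under assumed names (VF, STEP, s111_strided), not the code GCC generates. Gathering all loads before the stores is safe here because the loads touch only even 0-based indices and the stores only odd ones.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4, STEP = 2 };   /* assumed vector factor and loop step */

/* Hand-vectorized analogue of a[i] = a[i-1] + b[i] for i = 1, 3, 5, ...
   Each "vector" is filled and drained element by element at stride
   STEP, so the gap elements are never stored to. */
void s111_strided(float *a, const float *b, int n)
{
    assert(a != NULL && b != NULL);
    int i = 1;

    /* Main loop: VF strided iterations per trip. */
    for (; i + (VF - 1) * STEP < n; i += VF * STEP) {
        float va[VF], vb[VF], vr[VF];
        for (int l = 0; l < VF; l++) {      /* strided loads */
            va[l] = a[i - 1 + l * STEP];
            vb[l] = b[i + l * STEP];
        }
        for (int l = 0; l < VF; l++)        /* the vector add */
            vr[l] = va[l] + vb[l];
        for (int l = 0; l < VF; l++)        /* strided stores */
            a[i + l * STEP] = vr[l];
    }

    /* Scalar epilogue for the remaining iterations. */
    for (; i < n; i += STEP)
        a[i] = a[i - 1] + b[i];
}
```

The result should match the plain scalar loop element for element, including the untouched gap elements.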