In a benchmark that deals with vectorizing loops, I noticed that none of the loops were being vectorized because we don't merge two BBs together:

```fortran
      subroutine s111 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c
c     linear dependence testing
c     no dependence - vectorizable
c
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, cs1d
      do 1 nl = 1,2*ntimes
      do 10 i = 2,n,2
      a(i) = a(i-1) + b(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
    1 continue
      return
      end
```

I don't know why we don't merge the blocks of the loop labeled 10. I wonder if it is related to PR 19038.
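For reference, here is a hypothetical C rendering of the inner loop labeled 10 (the name `s111_inner` is ours, and `float` assumes the default 4-byte Fortran `real`). With 0-based indices, each iteration stores an odd element of `a` and loads the even element just before it, so no iteration reads a value that another iteration writes.

```c
/* Hypothetical C rendering of loop 10 in s111, 0-based:
   Fortran a(i), i = 2,4,...,n  becomes  a[j], j = 1,3,...,n-1.
   Stores go to odd elements of a, loads come from even elements of a. */
void s111_inner(int n, float *a, const float *b)
{
    for (int j = 1; j < n; j += 2)
        a[j] = a[j - 1] + b[j];
}
```

Calling it with n = 8, a = {1, 0, 2, 0, 3, 0, 4, 0} and b = {0, 10, 0, 20, 0, 30, 0, 40} leaves a = {1, 11, 2, 22, 3, 33, 4, 44}.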
If anyone wants the full benchmark, I can attach it or send it to them.
The problem is that tree_can_merge_blocks_p returns false because BB a has a user label; what should happen instead is that the user label is simply moved.
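A toy model of the suggested change, assuming a much simplified CFG: `struct toy_bb`, `toy_can_merge_old`, `toy_can_merge_new` and `toy_merge` are hypothetical names, not GCC's API, and the real tree_can_merge_blocks_p checks many more conditions. The toy also assumes the user label sits on the block being absorbed (here `b`), which is presumably where label 10 on the CONTINUE ends up.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy basic block: only the fields this sketch needs (not GCC's basic_block). */
struct toy_bb {
    int n_preds;        /* incoming edges */
    int n_succs;        /* outgoing edges */
    int n_user_labels;  /* user (source-level) labels at the start of the block */
};

/* Behaviour described in the report: a user label on B blocks the merge. */
bool toy_can_merge_old(const struct toy_bb *a, const struct toy_bb *b)
{
    return a->n_succs == 1 && b->n_preds == 1 && b->n_user_labels == 0;
}

/* Suggested behaviour: user labels never block the merge. */
bool toy_can_merge_new(const struct toy_bb *a, const struct toy_bb *b)
{
    return a->n_succs == 1 && b->n_preds == 1;
}

/* Merge B into A, moving B's user labels into A instead of dropping them. */
void toy_merge(struct toy_bb *a, struct toy_bb *b)
{
    assert(toy_can_merge_new(a, b));
    a->n_user_labels += b->n_user_labels;
    a->n_succs = b->n_succs;   /* A inherits B's outgoing edges */
    b->n_user_labels = 0;      /* B is now empty and unreachable */
    b->n_preds = 0;
    b->n_succs = 0;
}
```

With A = {1 pred, 1 succ, 0 labels} and B = {1 pred, 1 succ, 1 label}, the old predicate refuses the merge while the new one allows it, and after `toy_merge` the label lives in A.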
Created attachment 7768: patch which fixes the non-vectorizer problem

This patch fixes the merging of the BBs, and the vectorizer can now see the loop.
```
pr19049.f:10: note: not vectorized: can't determine dependence between: (*a_38)[D.722_49] and (*a_38)[D.721_51]
pr19049.f:10: note: bad data dependence.
```
The vectorizer fails to determine the dependence between (*a_38)[D.719_49] and (*a_38)[D.718_51] because it fails to determine that both data-refs have the same base, *a_38. This is already fixed in the autovect branch, and I am working on a patch to bring the data-refs analysis changes to mainline.
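Once both references are known to share the base *a_38, independence follows from their index expressions. As an illustration only (GCC's data-dependence analysis is more general and is not implemented this way), here is the classic GCD test applied to the 0-based C form of the loop: the store is to a[2k+1] and the load is from a[2k], and since gcd(2, 2) = 2 does not divide the offset difference 1, no iteration can read an element that another iteration writes. `gcd_test_may_depend` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of |x| and |y|; gcd(0, 0) is 0. */
static int gcd(int x, int y)
{
    x = abs(x);
    y = abs(y);
    while (y != 0) {
        int t = x % y;
        x = y;
        y = t;
    }
    return x;
}

/* GCD test for two references to the SAME base: a write to index
   c1*k1 + d1 and a read from index c2*k2 + d2 can touch the same element
   only if c1*k1 - c2*k2 == d2 - d1 has an integer solution, which requires
   gcd(c1, c2) to divide d2 - d1. Returns false when independence is proven;
   true only means "may depend" (loop bounds are ignored). */
bool gcd_test_may_depend(int c1, int d1, int c2, int d2)
{
    int g = gcd(c1, c2);
    if (g == 0)                 /* both indices are loop-invariant */
        return d1 == d2;
    return (d2 - d1) % g == 0;
}
```

`gcd_test_may_depend(2, 1, 2, 0)` returns false (independent), whereas a unit-stride pair such as a[k+1] / a[k], `gcd_test_may_depend(1, 1, 1, 0)`, returns true.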
The data dependence issue was solved by this committed patch: http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01195.html

However, this loop is still not vectorizable because of non-contiguous access.
Even though vectorization of strided accesses is already implemented in the autovect branch (and will be committed to mainline for 4.3), this case contains a store with a gap (a store to a[i] without a store to a[i-1]), and such stores are not supported; the current implementation supports only loads with gaps. Note, however, that adding a store to a[i-1] would create a data dependence in the loop.
Ira
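To see why loads with gaps are easy and stores with gaps are not, here is a scalar emulation of one vector step, assuming a hypothetical vectorization factor `VF` of 4 and wide accesses of 2*VF contiguous elements; the helper names are ours, not the vectorizer's. The wide load can simply discard the unused lanes, but the wide store must put something into the gap lanes (here a `filler` value), which clobbers elements such as a[i-1] that the scalar loop never stores to. Filling the gap lanes with previously loaded values instead amounts to the extra store to a[i-1] mentioned above.

```c
#include <string.h>

#define VF 4   /* hypothetical vectorization factor */

/* "Load with a gap": one wide load of 2*VF contiguous elements, then keep
   every other lane (offset 0 or 1). Unused lanes are just discarded. */
void load_with_gap(const float *base, int offset, float out[VF])
{
    float wide[2 * VF];
    memcpy(wide, base, sizeof wide);      /* one contiguous wide load */
    for (int l = 0; l < VF; l++)
        out[l] = wide[2 * l + offset];
}

/* Naive "store with a gap": build 2*VF lanes and write them with one wide
   store. The gap lanes have no value of their own, so `filler` overwrites
   memory that the scalar loop never writes. */
void naive_store_with_gap(float *base, int offset, const float in[VF],
                          float filler)
{
    float wide[2 * VF];
    for (int k = 0; k < 2 * VF; k++)
        wide[k] = filler;
    for (int l = 0; l < VF; l++)
        wide[2 * l + offset] = in[l];
    memcpy(base, wide, sizeof wide);      /* one contiguous wide store */
}
```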
Still working on this?

```
$ gfortran -S -O3 -ftree-vectorizer-verbose=8 vect.f
vect.f:9: note: not vectorized: inner-loop count not invariant.
vect.f:10: note: Detected single element interleaving *a_107(D)[D.1623_106] step 8
vect.f:10: note: Detected single element interleaving *b_111(D)[D.1620_110] step 8
vect.f:10: note: not vectorized: complicated access pattern.
vect.f:1: note: vectorized 0 loops in function.
```
This is still not implemented, and at the moment I am not planning to implement it.
Ira
We now get (at least on aarch64):

```
t.f90:11:0: note: === vect_pattern_recog ===
t.f90:11:0: note: === vect_analyze_data_ref_accesses ===
t.f90:11:0: note: Detected single element interleaving *a_23(D)[_22] step 8
t.f90:11:0: note: Data access with gaps requires scalar epilogue loop
t.f90:11:0: note: not consecutive access *a_23(D)[_25] = _28;
t.f90:11:0: note: not vectorized: complicated access pattern.
t.f90:11:0: note: bad data access.
```
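One way to read the "requires scalar epilogue loop" note, as a toy model rather than a description of GCC's code: for a single-element group with stride `s`, wide contiguous loads of s*VF elements per VF iterations read s-1 elements past the last element the scalar loop touches, unless some iterations are left to a scalar epilogue. `scalar_last_index` and `wide_last_index` are hypothetical helpers.

```c
#include <assert.h>

/* Toy model: the scalar loop touches base[stride*k], k = 0 .. niters-1. */
int scalar_last_index(int stride, int niters)
{
    assert(stride > 0 && niters > 0);
    return stride * (niters - 1);
}

/* Each vector iteration does one wide contiguous load of stride*vf elements
   starting at base[stride*k]; at least `epilogue_iters` iterations are left
   to a scalar epilogue. Returns the last index the wide loads read, or -1
   if no vector iteration runs. */
int wide_last_index(int stride, int niters, int vf, int epilogue_iters)
{
    assert(stride > 0 && vf > 0 && epilogue_iters >= 0);
    int vec_iters = (niters - epilogue_iters) / vf * vf;
    if (vec_iters <= 0)
        return -1;
    return stride * vec_iters - 1;
}
```

With stride 2 (the "step 8" of a 4-byte `real`), 8 iterations and VF = 4, the scalar loop's last access is index 14, full vectorization would load up to index 15, and reserving at least one iteration for the epilogue keeps the wide loads at or below index 7 (the remaining four iterations run as scalar code).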
```
Author: rguenth
Date: Thu Oct 22 13:33:17 2015
New Revision: 229172

URL: https://gcc.gnu.org/viewcvs?rev=229172&root=gcc&view=rev
Log:
2015-10-22  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/19049
    PR tree-optimization/65962
    * tree-vect-data-refs.c (vect_analyze_group_access_1): Fall back
    to strided accesses if single-element interleaving doesn't work.

    * gcc.dg/vect/vect-strided-store-pr65962.c: New testcase.
    * gcc.dg/vect/vect-63.c: Adjust.
    * gcc.dg/vect/vect-70.c: Likewise.
    * gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
    * gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
    * gfortran.dg/vect/pr19049.f90: Likewise.
    * gfortran.dg/vect/vect-8.f90: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-63.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-70.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
    trunk/gcc/testsuite/gfortran.dg/vect/pr19049.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-8.f90
    trunk/gcc/tree-vect-data-refs.c
```
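The commit makes vect_analyze_group_access_1 fall back to strided accesses when single-element interleaving does not work. Below is a scalar emulation of that strategy for this loop, assuming a hypothetical `VF` of 4 (`s111_strided` is our name, and the real vectorizer emits vector instructions rather than lane loops): operands are gathered lane by lane, added as one VF-wide operation, and written back with VF separate element stores, so no gap element is ever written.

```c
#define VF 4   /* hypothetical vectorization factor */

/* Loop 10 of s111 (0-based: a[j] = a[j-1] + b[j], j = 1,3,...) processed
   VF iterations at a time with strided loads and element-wise stores. */
void s111_strided(int n, float *a, const float *b)
{
    int j = 1;
    for (; j + 2 * (VF - 1) < n; j += 2 * VF) {
        float va[VF], vb[VF], vr[VF];
        for (int l = 0; l < VF; l++) {     /* strided (gathered) loads */
            va[l] = a[j + 2 * l - 1];
            vb[l] = b[j + 2 * l];
        }
        for (int l = 0; l < VF; l++)       /* the VF-wide add */
            vr[l] = va[l] + vb[l];
        for (int l = 0; l < VF; l++)       /* strided store, one element each */
            a[j + 2 * l] = vr[l];
    }
    for (; j < n; j += 2)                  /* scalar epilogue for leftovers */
        a[j] = a[j - 1] + b[j];
}
```

Because the loop writes only odd elements of `a` and reads only even ones, gathering all loads before the stores gives the same result as the scalar loop for any n.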
Fixed.