Created attachment 29861 [details] Vectorization details dump for the test case vect_analyze_group_access() in tree-vect-data-refs.c contains a test for load-store dependencies: if (GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (next)) || GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (prev))) Currently this always returns false because this field has not yet been set in the vinfo. This began with r196872, where the code to analyze accesses was moved ahead of the code to analyze dependences. I put together a test demonstrating that it's possible for us to generate incorrect code as a result: subroutine test(a,b,c,d,e,f) integer k real*4, intent(out) :: a(1000) real*4, intent(out) :: b(1000) real*4, intent(in) :: c(1000) real*4, intent(inout) :: d(2000) real*4, intent(out) :: e(1000) real*4, intent(out) :: f(1000) do k = 1,1000 a(k) = 3.0 * d(2*k) e(k) = 3.3 * d(2*k+1) d(2*k) = 2.0 * c(k) d(2*k+1) = 2.3 * c(k) b(k) = d(2*k) - 5.5; f(k) = d(2*k+1) + 5.5; enddo return end I'm attaching a detailed dump of the vectorization pass that shows that the values of d(2*k) and d(2*k+1) used to compute b(k) and f(k) are the ones loaded prior to the stores to those locations. To reproduce on powerpc64-unknown-linux-gnu: $ gfortran -O3 -ffast-math -mcpu=power7 -fno-vect-cost-model interl-lsl-2.f
Mine.
/* { dg-do run } */ extern void abort (void); void __attribute__((noinline,noclone)) foo (double *b, double *d, double *f) { int i; for (i = 0; i < 1024; i++) { d[2*i] = 2. * d[2*i]; d[2*i+1] = 4. * d[2*i+1]; b[i] = d[2*i] - 1.; f[i] = d[2*i+1] + 2.; } } int main() { double b[1024], d[2*1024], f[1024]; int i; for (i = 0; i < 2*1024; i++) d[i] = 1.; foo (b, d, f); for (i = 0; i < 1024; i+= 2) { if (d[2*i] != 2.) abort (); if (d[2*i+1] != 4.) abort (); } for (i = 0; i < 1024; i++) { if (b[i] != 1.) abort (); if (f[i] != 6.) abort (); } return 0; } /* { dg-final { cleanup-tree-dump "vect" } } */
Author: rguenth Date: Mon Apr 15 14:08:41 2013 New Revision: 197972 URL: http://gcc.gnu.org/viewcvs?rev=197972&root=gcc&view=rev Log: 2013-04-15 Richard Biener <rguenther@suse.de> PR tree-optimization/56933 * tree-vectorizer.h (struct _stmt_vec_info): Remove read_write_dep member. (GROUP_READ_WRITE_DEPENDENCE): Remove. (STMT_VINFO_GROUP_READ_WRITE_DEPENDENCE): Likewise. * tree-vect-data-refs.c (vect_analyze_group_access): Move dependence check ... vect_analyze_data_ref_dependence (vect_analyze_data_ref_dependence): ... here. * tree-vect-stmts.c (new_stmt_vec_info): Do not initialize GROUP_READ_WRITE_DEPENDENCE. * gcc.dg/vect/pr56933.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/vect/pr56933.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-data-refs.c trunk/gcc/tree-vect-stmts.c trunk/gcc/tree-vectorizer.h
The test case gcc.dg/vect/pr56933.c fails on a pentium II, because of invalid instruction. A fairly obvious fix (which works for me) would be: --- gcc/testsuite/gcc.dg/vect/pr56933.c.jj 2013-04-15 16:08:41.000000000 +0200 +++ gcc/testsuite/gcc.dg/vect/pr56933.c 2013-08-22 21:12:19.000000000 +0200 @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target sse2_runtime } */ extern void abort (void); void __attribute__((noinline,noclone))
(In reply to Bernd Edlinger from comment #4) > The test case gcc.dg/vect/pr56933.c fails on a pentium II, > because of invalid instruction. > > A fairly obvious fix (which works for me) would be: There is nothing obvious on it (maybe obviously broken). The vect.exp tests aren't i?86/x86_64 only, so you definitely can't add unguarded requirements for specific CPUs, it would cause the test not to be run on ppcs/arm/etc. The bug in this test is that it specifies { dg-do run }, either it shouldn't (then the default will be run if CPU supports the -msse2 flags in this case, or compile otherwise), or it should use the tree-vect.h + check_vect () framework like many other tests. > --- gcc/testsuite/gcc.dg/vect/pr56933.c.jj 2013-04-15 > 16:08:41.000000000 +0200 > +++ gcc/testsuite/gcc.dg/vect/pr56933.c 2013-08-22 21:12:19.000000000 +0200 > @@ -1,4 +1,5 @@ > /* { dg-do run } */ > +/* { dg-require-effective-target sse2_runtime } */ > > extern void abort (void); > void __attribute__((noinline,noclone))
(In reply to Jakub Jelinek from comment #5) > (In reply to Bernd Edlinger from comment #4) > The test case > gcc.dg/vect/pr56933.c fails on a pentium II, > because of invalid > instruction. > > A fairly obvious fix (which works for me) would be: There > is nothing obvious on it (maybe obviously broken). The vect.exp tests aren't > i?86/x86_64 only, so you definitely can't add unguarded requirements for > specific CPUs, it would cause the test not to be run on ppcs/arm/etc. The > bug in this test is that it specifies { dg-do run }, either it shouldn't > (then the default will be run if CPU supports the -msse2 flags in this case, > or compile otherwise), or it should use the tree-vect.h + check_vect () > framework like many other tests. Oh I thought that it is only for x86... The problem is that the option -msse2 is somehow used to compile this and therfore the following statement is generated. movapd .LC4, %xmm0 I do not see how this test case can test anything unless it scans the assembler output for this instruction. Maybe there is something special in the -fdump-tree-xxx that should be looked for?
Created attachment 30712 [details] fixed test case Looking deeper into the matter it seems like this an example where vectorisation is not supposed to happen. But this is still vectorized and causes the crash here: for (i = 0; i < 2*1024; i++) d[i] = 1.; So this is my proposed fix for the test case: - added check_vect() to prevent invalid instruction as Jakub suggested - addeded scan-tree-dump to detect the READ_WRITE depencency to make sure that the vectorization in foo() did not happen for the right reason.
(In reply to Bernd Edlinger from comment #7) > Created attachment 30712 [details] > fixed test case > > Looking deeper into the matter it seems like this an example > where vectorisation is not supposed to happen. > But this is still vectorized and causes the crash here: > > for (i = 0; i < 2*1024; i++) > d[i] = 1.; That loop is initialization only and can be vectorized. I have committed a fix (hopefully), so you may want to check again. > > So this is my proposed fix for the test case: > - added check_vect() to prevent invalid instruction as Jakub suggested > - addeded scan-tree-dump to detect the READ_WRITE depencency > to make sure that the vectorization in foo() did not happen > for the right reason.