Bug 56933 - [4.9 Regression] Vectorizer missing read-write dependency for interleaved accesses
Summary: [4.9 Regression] Vectorizer missing read-write dependency for interleaved acc...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: 4.9.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-04-12 12:40 UTC by Bill Schmidt
Modified: 2013-08-29 07:31 UTC (History)
4 users (show)

See Also:
Host: powerpc*-*-*
Target: powerpc*-*-*
Build: powerpc*-*-*
Known to work:
Known to fail:
Last reconfirmed: 2013-04-15 00:00:00


Attachments
Vectorization details dump for the test case (6.74 KB, text/plain)
2013-04-12 12:40 UTC, Bill Schmidt
Details
fixed test case (393 bytes, patch)
2013-08-28 15:09 UTC, Bernd Edlinger
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Bill Schmidt 2013-04-12 12:40:04 UTC
Created attachment 29861 [details]
Vectorization details dump for the test case

vect_analyze_group_access() in tree-vect-data-refs.c contains a test for load-store dependencies:

              if (GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (next))
                  || GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (prev)))

Currently this always returns false because this field has not yet been set in the vinfo.  This began with r196872, where the code to analyze accesses was moved ahead of the code to analyze dependences.

I put together a test demonstrating that it's possible for us to generate incorrect code as a result:

       subroutine test(a,b,c,d,e,f)
       integer k
       real*4, intent(out) :: a(1000)
       real*4, intent(out) :: b(1000)
       real*4, intent(in) :: c(1000)
       real*4, intent(inout) :: d(2000)
       real*4, intent(out) :: e(1000)
       real*4, intent(out) :: f(1000)

       do k = 1,1000
          a(k) = 3.0 * d(2*k)
          e(k) = 3.3 * d(2*k+1)
          d(2*k) = 2.0 * c(k)
          d(2*k+1) = 2.3 * c(k)
          b(k) = d(2*k) - 5.5;
          f(k) = d(2*k+1) + 5.5;
       enddo

       return
       end

I'm attaching a detailed dump of the vectorization pass that shows that the values of d(2*k) and d(2*k+1) used to compute b(k) and f(k) are the ones loaded prior to the stores to those locations.

To reproduce on powerpc64-unknown-linux-gnu:

$ gfortran -O3 -ffast-math -mcpu=power7 -fno-vect-cost-model interl-lsl-2.f
Comment 1 Richard Biener 2013-04-15 10:29:19 UTC
Mine.
Comment 2 Richard Biener 2013-04-15 11:18:43 UTC
/* { dg-do run } */

extern void abort (void);
void __attribute__((noinline,noclone))
foo (double *b, double *d, double *f)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      d[2*i] = 2. * d[2*i];
      d[2*i+1] = 4. * d[2*i+1];
      b[i] = d[2*i] - 1.;
      f[i] = d[2*i+1] + 2.;
    }
}
int main()
{
  double b[1024], d[2*1024], f[1024];
  int i;
  for (i = 0; i < 2*1024; i++)
    d[i] = 1.;
  foo (b, d, f);
  for (i = 0; i < 1024; i+= 2)
    {
      if (d[2*i] != 2.)
        abort ();
      if (d[2*i+1] != 4.)
        abort ();
    }
  for (i = 0; i < 1024; i++)
    {
      if (b[i] != 1.)
        abort ();
      if (f[i] != 6.)
        abort ();
    }
  return 0;
}

/* { dg-final { cleanup-tree-dump "vect" } } */
Comment 3 Richard Biener 2013-04-15 14:09:14 UTC
Author: rguenth
Date: Mon Apr 15 14:08:41 2013
New Revision: 197972

URL: http://gcc.gnu.org/viewcvs?rev=197972&root=gcc&view=rev
Log:
2013-04-15  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/56933
	* tree-vectorizer.h (struct _stmt_vec_info): Remove read_write_dep
	member.
	(GROUP_READ_WRITE_DEPENDENCE): Remove.
	(STMT_VINFO_GROUP_READ_WRITE_DEPENDENCE): Likewise.
	* tree-vect-data-refs.c (vect_analyze_group_access): Move
	dependence check ...
	vect_analyze_data_ref_dependence (vect_analyze_data_ref_dependence):
	... here.
	* tree-vect-stmts.c (new_stmt_vec_info): Do not initialize
	GROUP_READ_WRITE_DEPENDENCE.

	* gcc.dg/vect/pr56933.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr56933.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
Comment 4 Bernd Edlinger 2013-08-27 17:03:32 UTC
The test case gcc.dg/vect/pr56933.c fails on a pentium II,
because of invalid instruction.

A fairly obvious fix (which works for me) would be:

--- gcc/testsuite/gcc.dg/vect/pr56933.c.jj      2013-04-15 16:08:41.000000000 +0200
+++ gcc/testsuite/gcc.dg/vect/pr56933.c 2013-08-22 21:12:19.000000000 +0200
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target sse2_runtime } */

 extern void abort (void);
 void __attribute__((noinline,noclone))
Comment 5 Jakub Jelinek 2013-08-27 17:17:03 UTC
(In reply to Bernd Edlinger from comment #4)
> The test case gcc.dg/vect/pr56933.c fails on a pentium II,
> because of invalid instruction.
> 
> A fairly obvious fix (which works for me) would be:

There is nothing obvious on it (maybe obviously broken).
The vect.exp tests aren't i?86/x86_64 only, so you definitely can't add
unguarded requirements for specific CPUs, it would cause the test not to be run on ppcs/arm/etc.

The bug in this test is that it specifies { dg-do run }, either it shouldn't (then the default will be run if CPU supports the -msse2 flags in this case, or
compile otherwise), or it should use the tree-vect.h + check_vect () framework like many other tests.

> --- gcc/testsuite/gcc.dg/vect/pr56933.c.jj      2013-04-15
> 16:08:41.000000000 +0200
> +++ gcc/testsuite/gcc.dg/vect/pr56933.c 2013-08-22 21:12:19.000000000 +0200
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-require-effective-target sse2_runtime } */
> 
>  extern void abort (void);
>  void __attribute__((noinline,noclone))
Comment 6 Bernd Edlinger 2013-08-27 17:49:11 UTC
(In reply to Jakub Jelinek from comment #5)
> (In reply to Bernd Edlinger from comment #4)
> The test case
> gcc.dg/vect/pr56933.c fails on a pentium II,
> because of invalid
> instruction.
> 
> A fairly obvious fix (which works for me) would be:

There
> is nothing obvious on it (maybe obviously broken).
The vect.exp tests aren't
> i?86/x86_64 only, so you definitely can't add
unguarded requirements for
> specific CPUs, it would cause the test not to be run on ppcs/arm/etc.

The
> bug in this test is that it specifies { dg-do run }, either it shouldn't
> (then the default will be run if CPU supports the -msse2 flags in this case,
> or
compile otherwise), or it should use the tree-vect.h + check_vect ()
> framework like many other tests.

Oh I thought that it is only for x86...

The problem is that the option -msse2 is somehow used to compile this
and therfore the following statement is generated.

       movapd  .LC4, %xmm0

I do not see how this test case can test anything unless it scans
the assembler output for this instruction. Maybe there is something
special in the -fdump-tree-xxx that should be looked for?
Comment 7 Bernd Edlinger 2013-08-28 15:09:55 UTC
Created attachment 30712 [details]
fixed test case

Looking deeper into the matter it seems like this an example
where vectorisation is not supposed to happen.
But this is still vectorized and causes the crash here:

   for (i = 0; i < 2*1024; i++)
     d[i] = 1.;


So this is my proposed fix for the test case:
- added check_vect() to prevent invalid instruction as Jakub suggested
- addeded scan-tree-dump to detect the READ_WRITE depencency
  to make sure that the vectorization in foo() did not happen
  for the right reason.
Comment 8 Richard Biener 2013-08-29 07:31:48 UTC
(In reply to Bernd Edlinger from comment #7)
> Created attachment 30712 [details]
> fixed test case
> 
> Looking deeper into the matter it seems like this an example
> where vectorisation is not supposed to happen.
> But this is still vectorized and causes the crash here:
> 
>    for (i = 0; i < 2*1024; i++)
>      d[i] = 1.;

That loop is initialization only and can be vectorized.  I have committed
a fix (hopefully), so you may want to check again.

> 
> So this is my proposed fix for the test case:
> - added check_vect() to prevent invalid instruction as Jakub suggested
> - addeded scan-tree-dump to detect the READ_WRITE depencency
>   to make sure that the vectorization in foo() did not happen
>   for the right reason.