Bug 25413 - wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
Summary: wrong alignment or incorrect address computation in vectorized code on Pentiu...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.0.3
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: monitored, wrong-code
: 19716 33958 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-12-14 15:24 UTC by David Monniaux
Modified: 2007-12-28 01:01 UTC (History)
8 users (show)

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build: i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2006-01-30 22:24:29


Attachments
preprocessed C source code (assumes Linux glibc) exhibiting the code generation bug (695 bytes, text/plain)
2005-12-14 15:26 UTC, David Monniaux
Details
zlib testcase (576 bytes, text/plain)
2007-07-24 19:37 UTC, Ryan Hill
Details
gcc-PR25413-gdb.log (7.34 KB, text/plain)
2007-07-24 19:40 UTC, Ryan Hill
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Monniaux 2005-12-14 15:24:30 UTC
(The same phenomenon happens in 4.2 subversion alpha.)

Incorrect code is generated when using -ftree-vectorize on the SSE2 Pentium 4 target. (The same code works on the AMD64 64-bit target.)

See attached source code.

$ gcc -O2 -Wall -march=pentium4 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=1 gcc_plantage2.c -o ./gcc_plantage2
$ ./gcc_plantage2
will segfault (the same program works perfectly without -ftree-vectorize).

The segfault appears at the second movapd instruction:

	movapd	.LC0, %xmm0
.L17:
	movapd	%xmm0, (%eax)
	addl	$1, %edx
	addl	$16, %eax
	cmpl	%edx, %ebx
	ja	.L17

%eax is at this point equal to 4 modulo 16, which results in a segmentation fault, since movapd assumes 16-byte alignment.
Comment 1 David Monniaux 2005-12-14 15:26:35 UTC
Created attachment 10486 [details]
preprocessed C source code (assumes Linux glibc) exhibiting the code generation bug

Compile this program with -O2 -march=pentium4 -ftree-vectorize on a Pentium 4 and run it. It will segfault. It works perfectly without vectorization.
Comment 2 Dorit Naishlos 2005-12-15 12:41:53 UTC
The problem is that the vectorizer applies loop-peeling in order to align the data reference *(m->c+i), and peeling only works correctly if the data is naturally aligned (aligned on it's type size). This is what the vectorizer currently blindly assumes, but on the Pentium4 doubles are not necessarily 64bit aligned.
 
Accidentally Devang and I discussed this issue last week, and Devang actually committed a patch to apple-ppc branch that works around the problem ( http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108214). Devang's patch however will not fix this PR - the patch he committed disables vectorization if the vectorizer was able to compute the misalignment, and discovered that it doesn't evenly divide by the type size. In this testcase the misalignment is unknown at compile time. 

To fix this problem we need to disable loop-peeling in the vectorizer if we can't prove that the data is naturally aligned. Alternatively, if we can't prove either way we can peel the loop but control the number of iterations it will execute using a runtime test (i.e. have the prolog loop iterate the entire loop-count if at runtime we discover that the data is not naturally aligned). 
Comment 3 Dorit Naishlos 2005-12-15 12:50:21 UTC
related discussion: http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html
Comment 4 Andrew Pinski 2006-01-30 22:24:29 UTC
Confirmed.
Comment 5 Volker Reichelt 2007-04-02 21:52:33 UTC
Any news on this one?
The bug makes tree vectorization on pentium 4 totally useless. :-(

Btw, here's a smaller code snippet for testing. Just compile it with
  gcc -O -msse2 -ftree-vectorize
on a pentium 4 and see the resulting executable segfault:

================================
struct
{
  char c;
  double d[2];
} a;

int main()
{
  int i;
  for ( i=0; i<2; ++i )
    a.d[i]=0;
  return 0;
}
================================
Comment 6 Dorit Naishlos 2007-04-03 20:22:02 UTC
So I see Devang had sent a patch for this over a year ago:
http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00167.html
I don't know what ever happened to it.
Maybe you want to give it a try? (you may need to implement the new target hook for Pentium4). If you have problems applying the patch (it is a bit old) - I could try to help update the patch (not before next week though).
Comment 7 dorit 2007-07-01 09:59:58 UTC
I'm testing the following patch (seems to fix the two testcases in this PR on Pentium4. still need to bootstrap etc, and check the powerpc bits)

Index: gcc/targhooks.c
===================================================================
*** gcc/targhooks.c	(revision 126162)
--- gcc/targhooks.c	(working copy)
*************** tree default_mangle_decl_assembler_name
*** 634,637 ****
--- 634,653 ----
     return id;
  }
  
+ bool
+ default_builtin_vector_alignment_reachable (tree type, bool is_packed)
+ {
+   if (is_packed)
+     return false;
+ 
+   /* Assuming that types whose size is > pointer-size are not guaranteed to be
+      naturally aligned.  */
+   if (tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE)) > 0)
+     return false;
+ 
+   /* Assuming that types whose size is <= pointer-size
+      are naturally aligned.  */
+   return true;
+ }
+ 
  #include "gt-targhooks.h"
Index: gcc/targhooks.h
===================================================================
*** gcc/targhooks.h	(revision 126162)
--- gcc/targhooks.h	(working copy)
*************** extern tree default_builtin_vectorized_c
*** 62,67 ****
--- 62,69 ----
  
  extern tree default_builtin_reciprocal (enum built_in_function, bool, bool);
  
+ extern bool default_builtin_vector_alignment_reachable (tree, bool);
+
  /* These are here, and not in hooks.[ch], because not all users of
     hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */

Index: gcc/tree.h
===================================================================
*** gcc/tree.h	(revision 126162)
--- gcc/tree.h	(working copy)
*************** extern tree get_inner_reference (tree, H
*** 4327,4332 ****
--- 4327,4338 ----
  				 tree *, enum machine_mode *, int *, int *,
  				 bool);
  
+ /* Given an expression EXP that may be a COMPONENT_REF or an ARRAY_REF,
+    look for whether EXP or any nested component-refs within EXP is marked
+    as PACKED.  */
+ 
+ extern bool contains_packed_reference (tree exp);
+
  /* Return 1 if T is an expression that get_inner_reference handles.  */
  
  extern int handled_component_p (tree);
Index: gcc/target.h
===================================================================
*** gcc/target.h	(revision 126162)
--- gcc/target.h	(working copy)
*************** struct gcc_target
*** 413,418 ****
--- 413,422 ----
         element-by-element products for the odd elements.  */
      tree (* builtin_mul_widen_even) (tree);
      tree (* builtin_mul_widen_odd) (tree);
+ 
+     /* Return true if vector alignment is reachable (by peeling N
+        interations) for the given type.  */
+     bool (* vector_alignment_reachable) (tree, bool);
    } vectorize;
  
    /* The initial value of target_flags.  */
Index: gcc/testsuite/gcc.dg/vect/vect-align-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-align-1.c	(revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-align-1.c	(revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+ 
+ #include <stdlib.h>
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ /* Compile time known misalignment. Cannot use loop peeling to align
+    the store.  */
+ 
+ #define N 16
+
+ struct foo {
+   char x;
+   int y[N];
+ } __attribute__((packed));
+
+ int
+ main1 (struct foo * __restrict__ p)
+ {
+   int i;
+   int x[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+ 
+   for (i = 0; i < N; i++)
+     {
+       p->y[i] = x[i];
+     }
+ 
+   /* check results:  */
+   for (i = 0; i < N; i++)
+     {
+       if (p->y[i] != x[i])
+ 	abort ();
+     }
+   return 0;
+ }
+
+ 
+ int main (void)
+ {
+   int i;
+   struct foo *p = malloc (2*sizeof (struct foo));
+   check_vect ();
+   
+   main1 (p);
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr25413a.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/pr25413a.c	(revision 0)
--- gcc/testsuite/gcc.dg/vect/pr25413a.c	(revision 0)
***************
*** 0 ****
--- 1,129 ----
+ /* { dg-require-effective-target vect_double } */
+ 
+ #include <stdarg.h>
+ #include "tree-vect.h"
+ 
+ #define N 8
+
+ typedef unsigned int size_t;
+ 
+ extern void *malloc (size_t __size) __attribute__ ((__nothrow__)) __attribute__ ((__malloc__));
+ 
+ typedef double num_t;
+ static const num_t num__infty = ((num_t)1.0)/((num_t)0.0);
+ 
+ struct oct_tt;
+ typedef struct oct_tt oct_t;
+ 
+ typedef unsigned int var_t;
+ typedef enum {
+   OCT_EMPTY = 0,
+   OCT_NORMAL = 1,
+   OCT_CLOSED = 2
+ } oct_state;
+ 
+ struct oct_tt {
+   var_t n;
+ 
+   int ref;
+ 
+   oct_state state;
+   struct oct_tt* closed;
+ 
+   num_t* c;
+ };
+ 
+ void* octfapg_mm_malloc (size_t t);
+ oct_t* octfapg_alloc (var_t n);
+ oct_t* octfapg_full_copy (oct_t* m);
+ 
+ struct mmalloc_tt;
+ typedef struct mmalloc_tt mmalloc_t;
+ 
+ struct mmalloc_tt
+ {
+   int id;
+ 
+   int nb_alloc;
+   int nb_realloc;
+   int nb_free;
+ 
+   size_t rem;
+   size_t max;
+   size_t tot;
+ 
+ };
+ 
+ typedef struct
+ {
+   size_t size;
+ 
+   mmalloc_t* mm;
+   int id;
+ 
+   double dummy;
+ 
+ } mmheader_t;
+
+ void*
+ octfapg_mm_malloc (size_t t)
+ {
+   char* m = (char*)malloc(t+sizeof(mmheader_t));
+   return m+sizeof(mmheader_t);
+ }
+ 
+ oct_t* octfapg_empty (var_t n);
+ 
+ oct_t*
+ octfapg_empty (const var_t n)
+ {
+   oct_t* m;
+   /*octfapg_timing_enter("oct_empty",3);*/
+   m = ((oct_t*) octfapg_mm_malloc (sizeof(oct_t)));
+   m->n = n;
+   m->ref = 1;
+   m->state = OCT_EMPTY;
+   m->closed = (oct_t*)((void *)0);
+   m->c = (num_t*)((void *)0);
+   /*octfapg_timing_exit("oct_empty",3);*/
+   return m;
+ }
+
+ oct_t*
+ octfapg_alloc (const var_t n)
+ {
+   size_t nn = (2*(size_t)(n)*((size_t)(n)+1));
+   oct_t* m;
+   m = octfapg_empty(n);
+   m->c = ((num_t*) octfapg_mm_malloc (sizeof(num_t)*(nn)));
+   ;
+   m->state = OCT_NORMAL;
+   m->closed = (oct_t*)((void *)0);
+   return m;
+ }
+ 
+ oct_t*
+ octfapg_universe (const var_t n)
+ {
+   oct_t* m;
+   size_t i, nn = (2*(size_t)(n)*((size_t)(n)+1));
+   m = octfapg_alloc(n);
+   for (i=0;i<nn;i++) *(m->c+i) = num__infty;
+   for (i=0;i<2*n;i++) *(m->c+((size_t)(i)+(((size_t)(i)+1)*((size_t)(i)+1))/2)) = (num_t)(0);
+   m->state = OCT_CLOSED;
+   return m;
+ }
+ 
+ int main (void)
+ { 
+   int i;
+   check_vect ();
+
+   oct_t *p = octfapg_universe(10);
+   return 0;
+ } 
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-align-2.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-align-2.c	(revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-align-2.c	(revision 0)
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-do run } */
+
+ #include <stdlib.h>
+ #include <stdarg.h>
+ #include "tree-vect.h"
+ 
+ /* Compile time unknown misalignment. Cannot use loop peeling to align
+    the store.  */
+ 
+ #define N 17
+ 
+ struct foo {
+   char x0;
+   int y[N][N];
+ } __attribute__ ((packed));
+ 
+ struct foo f2;
+ int z[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+ 
+ void fbar(struct foo *fp)
+ {
+   int i,j;
+    for (i=0; i<N; i++)
+       for (j=0; j<N; j++)
+         f2.y[i][j] = z[i];
+
+    for (i=0; i<N; i++)
+       for (j=0; j<N; j++)
+ 	if (f2.y[i][j] != z[i])
+ 	  abort ();
+ }
+
+ int main (void)
+ {
+   struct foo  *fp = (struct foo *) malloc (2*sizeof (struct foo));
+ 
+   fbar(fp);
+   return 0;
+ }
+ 
+ 
+ /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr31699.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/pr31699.c	(revision 126162)
--- gcc/testsuite/gcc.dg/vect/pr31699.c	(working copy)
*************** int main()
*** 31,35 ****
  }
  
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_intfloat_cvt } } } */
! /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_no_align } } } */
  /* { dg-final { cleanup-tree-dump "vect" } } */
--- 31,36 ----
  }
  
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_intfloat_cvt } } } */
! /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */
! /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */
  /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr25413.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/pr25413.c	(revision 0)
--- gcc/testsuite/gcc.dg/vect/pr25413.c	(revision 0)
***************
*** 0 ****
--- 1,37 ----
+ /* { dg-require-effective-target vect_double } */
+ 
+ #include <stdarg.h>
+ #include "tree-vect.h"
+ 
+ #define N 8
+ 
+ struct
+ {
+   char c;
+   double d[N];
+ } a;
+ 
+ int main1()
+ {
+   int i;
+   for ( i=0; i<N; ++i )
+     a.d[i]=1;
+   return 0;
+ }
+ 
+ int main (void)
+ { 
+   int i;
+   check_vect ();
+   
+   main1 ();
+   for (i=0; i<N; i++)
+     if (a.d[i] != 1)
+       abort ();
+   return 0;
+ } 
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/expr.c
===================================================================
*** gcc/expr.c	(revision 126162)
--- gcc/expr.c	(working copy)
*************** get_inner_reference (tree exp, HOST_WIDE
*** 5924,5929 ****
--- 5924,5966 ----
    return exp;
  }
  
+ bool
+ contains_packed_reference (tree exp)
+ {
+   bool packed_p = false;
+
+   while (1)
+     {
+       switch (TREE_CODE (exp))
+ 	{
+ 	case COMPONENT_REF:
+ 	  {
+ 	    tree field = TREE_OPERAND (exp, 1);
+ 	    packed_p = DECL_PACKED (field) 
+ 		       || TYPE_PACKED (TREE_TYPE (field)) /* CHECKME */
+ 		       || TYPE_PACKED (TREE_TYPE (exp));
+ 	    if (packed_p)
+ 	      goto done;
+ 	  }
+ 	  break;
+ 
+ 	case BIT_FIELD_REF:
+ 	case ARRAY_REF:
+ 	case ARRAY_RANGE_REF:
+ 	case REALPART_EXPR:
+ 	case IMAGPART_EXPR:
+ 	case VIEW_CONVERT_EXPR:
+ 	  break;
+ 
+ 	default:
+ 	  goto done;
+ 	}
+       exp = TREE_OPERAND (exp, 0);
+     }
+  done:
+   return packed_p;
+ }
+ 
  /* Return a tree of sizetype representing the size, in bytes, of the element
     of EXP, an ARRAY_REF.  */
  
Index: gcc/tree-vect-analyze.c
===================================================================
*** gcc/tree-vect-analyze.c	(revision 126162)
--- gcc/tree-vect-analyze.c	(working copy)
*************** Software Foundation, 51 Franklin Street,
*** 25,30 ****
--- 25,31 ----
  #include "tm.h"
  #include "ggc.h"
  #include "tree.h"
+ #include "target.h"
  #include "basic-block.h"
  #include "diagnostic.h"
  #include "tree-flow.h"
*************** vect_verify_datarefs_alignment (loop_vec
*** 1379,1384 ****
--- 1380,1449 ----
  }

  
+ static bool
+ vector_alignment_reachable_p (struct data_reference *dr)
+ {
+   tree stmt = DR_STMT (dr);
+   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ 
+   if (DR_GROUP_FIRST_DR (stmt_info))
+     {
+       /* For interleaved access we peel only if number of iterations in
+ 	 the prolog loop ({VF - misalignment}), is a multiple of the
+ 	 number of the interleaved accesses.  */
+       int elem_size, mis_in_elements;
+       int nelements = TYPE_VECTOR_SUBPARTS (vectype);
+ 
+       /* FORNOW: handle only known alignment.  */
+       if (!known_alignment_for_access_p (dr))
+ 	return false;
+ 
+       elem_size = UNITS_PER_SIMD_WORD / nelements;
+       mis_in_elements = DR_MISALIGNMENT (dr) / elem_size;
+ 
+       if ((nelements - mis_in_elements) % DR_GROUP_SIZE (stmt_info))
+ 	return false;
+     }
+ 
+   /* If misalignment is known at the compiler time then apply peeling
+      only if natural alignment is reachable through peeling.  */
+   if (known_alignment_for_access_p (dr) && !aligned_access_p (dr))
+     {
+       HOST_WIDE_INT elmsize = 
+ 		int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
+       if (DR_MISALIGNMENT (dr) % elmsize)
+ 	{
+ 	  if (vect_print_dump_info (REPORT_DETAILS))
+ 	    {
+ 	      fprintf (vect_dump, "data size =" HOST_WIDE_INT_PRINT_DEC, elmsize);
+ 	      fprintf (vect_dump, ". misalignment = %d. ", DR_MISALIGNMENT (dr));
+ 	      fprintf (vect_dump, "data size does not divide the misalignment.\n");
+ 	    }
+ 	  return false;
+ 	}
+     }
+ 
+   if (!known_alignment_for_access_p (dr))
+     {
+       tree type = (TREE_TYPE (DR_REF (dr)));
+       tree ba = DR_BASE_OBJECT (dr);
+       bool is_packed = false;
+ 
+       if (ba)
+ 	is_packed = contains_packed_reference (ba);
+
+       if (vect_print_dump_info (REPORT_DETAILS))
+ 	fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
+       if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
+ 	return true;
+       else
+ 	return false;
+     }
+ 
+   return true;
+ }
+ 
  /* Function vect_enhance_data_refs_alignment
  
     This pass will use loop versioning and loop peeling in order to enhance
*************** vect_enhance_data_refs_alignment (loop_v
*** 1540,1572 ****
  
        if (!DR_IS_READ (dr) && !aligned_access_p (dr))
          {
! 	  if (DR_GROUP_FIRST_DR (stmt_info))
! 	    {
! 	      /* For interleaved access we peel only if number of iterations in
! 		 the prolog loop ({VF - misalignment}), is a multiple of the
! 		 number of the interleaved accesses.  */
! 	      int elem_size, mis_in_elements;
! 	      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
! 	      int nelements = TYPE_VECTOR_SUBPARTS (vectype);
! 
! 	      /* FORNOW: handle only known alignment.  */
! 	      if (!known_alignment_for_access_p (dr))
! 		{
! 		  do_peeling = false;
! 		  break;
! 		}
! 
! 	      elem_size = UNITS_PER_SIMD_WORD / nelements;
! 	      mis_in_elements = DR_MISALIGNMENT (dr) / elem_size;
! 
! 	      if ((nelements - mis_in_elements) % DR_GROUP_SIZE (stmt_info))
! 		{
! 		  do_peeling = false;
! 		  break;
! 		}
! 	    }
! 	  dr0 = dr;
! 	  do_peeling = true;
  	  break;
  	}
      }
--- 1605,1615 ----
  
        if (!DR_IS_READ (dr) && !aligned_access_p (dr))
          {
! 	  do_peeling = vector_alignment_reachable_p (dr);
! 	  if (do_peeling)
! 	    dr0 = dr;
! 	  if (!do_peeling && vect_print_dump_info (REPORT_DETAILS))
!             fprintf (vect_dump, "vector alignment may not be reachable");
  	  break;
  	}
      }
Index: gcc/target-def.h
===================================================================
*** gcc/target-def.h	(revision 126162)
--- gcc/target-def.h	(working copy)
*************** Foundation, 51 Franklin Street, Fifth Fl
*** 356,361 ****
--- 356,364 ----
    default_builtin_vectorized_conversion
  #define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN 0
  #define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD 0
+ #define TARGET_VECTOR_ALIGNMENT_REACHABLE \
+   default_builtin_vector_alignment_reachable
+ 
  
  #define TARGET_VECTORIZE                                                \
    {									\
*************** Foundation, 51 Franklin Street, Fifth Fl
*** 363,369 ****
      TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION,			\
      TARGET_VECTORIZE_BUILTIN_CONVERSION,				\
      TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN,                            \
!     TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD				\
    }
  
  #define TARGET_DEFAULT_TARGET_FLAGS 0
--- 366,373 ----
      TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION,			\
      TARGET_VECTORIZE_BUILTIN_CONVERSION,				\
      TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN,                            \
!     TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD,				\
!     TARGET_VECTOR_ALIGNMENT_REACHABLE					\
    }
  
  #define TARGET_DEFAULT_TARGET_FLAGS 0
Index: gcc/config/rs6000/rs6000.c
===================================================================
*** gcc/config/rs6000/rs6000.c	(revision 126162)
--- gcc/config/rs6000/rs6000.c	(working copy)
*************** static tree rs6000_builtin_mul_widen_odd
*** 717,722 ****
--- 717,723 ----
  static tree rs6000_builtin_conversion (enum tree_code, tree);
  
  static void def_builtin (int, const char *, tree, int);
+ static bool rs6000_vector_alignment_reachable (tree, bool);
  static void rs6000_init_builtins (void);
  static rtx rs6000_expand_unop_builtin (enum insn_code, tree, rtx);
  static rtx rs6000_expand_binop_builtin (enum insn_code, tree, rtx);
*************** static const char alt_reg_names[][8] =
*** 984,989 ****
--- 985,993 ----
  #undef TARGET_VECTORIZE_BUILTIN_CONVERSION
  #define TARGET_VECTORIZE_BUILTIN_CONVERSION rs6000_builtin_conversion
  
+ #undef TARGET_VECTOR_ALIGNMENT_REACHABLE
+ #define TARGET_VECTOR_ALIGNMENT_REACHABLE rs6000_vector_alignment_reachable
+ 
  #undef TARGET_INIT_BUILTINS
  #define TARGET_INIT_BUILTINS rs6000_init_builtins
  
*************** rs6000_builtin_mul_widen_odd (tree type)
*** 1806,1811 ****
--- 1810,1844 ----
      }
  }
  
+ 
+ /* Return true iff, data reference of TYPE can reach vector alignment (16)
+    after applying N number of iterations.  This routine does not determine
+    how may iterations are required to reach desired alignment.  */
+
+ static bool
+ rs6000_vector_alignment_reachable (tree type, bool is_packed)
+ {
+   if (is_packed)
+     return false;
+
+   if (TARGET_MACHO)
+     {
+       if (TARGET_32BIT)
+ 	{
+ 	  if (rs6000_alignment_flags == MASK_ALIGN_NATURAL)
+ 	    return true;
+ 
+ 	  if (rs6000_alignment_flags ==  MASK_ALIGN_POWER)
+ 	    return true;
+ 	}
+       return false;
+     }
+ 
+   /* Assuming that all other types are naturally aligned. CHECKME!  */
+   return true;
+ }
+ 
+
  /* Handle generic options of the form -mfoo=yes/no.
     NAME is the option name.
     VALUE is the option value.
Comment 8 patchapp@dberlin.org 2007-07-02 08:30:12 UTC
Subject: Bug number PR25413

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00082.html
Comment 9 dorit 2007-07-12 14:42:19 UTC
Subject: Bug 25413

Author: dorit
Date: Thu Jul 12 14:42:08 2007
New Revision: 126591

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126591
Log:
2007-07-12  Dorit Nuzman  <dorit@il.ibm.com>
            Devang Patel  <dpatel@apple.com>

        PR tree-optimization/25413
        * targhooks.c (default_builtin_vector_alignment_reachable): New.
        * targhooks.h (default_builtin_vector_alignment_reachable): New.
        * tree.h (contains_packed_reference): New.
        * expr.c (contains_packed_reference): New.
        * tree-vect-analyze.c (vector_alignment_reachable_p): New.
        (vect_enhance_data_refs_alignment): Call
        vector_alignment_reachable_p.
        * target.h (vector_alignment_reachable): New builtin.
        * target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New.
        * config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New.
        (TARGET_VECTOR_ALIGNMENT_REACHABLE): Define.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr25413.c
    trunk/gcc/testsuite/gcc.dg/vect/pr25413a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-align-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-align-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/expr.c
    trunk/gcc/target-def.h
    trunk/gcc/target.h
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/pr31699.c
    trunk/gcc/tree-vect-analyze.c
    trunk/gcc/tree.h

Comment 10 Ryan Hill 2007-07-13 00:13:26 UTC
any chance of a 4.2 backport?
Comment 11 dorit 2007-07-16 08:02:19 UTC
(In reply to comment #10)
> any chance of a 4.2 backport?

sure (as soon as 4.2 gets out of freeze. unfortunately we missed the 4.2.1 release).

Comment 12 Volker Reichelt 2007-07-17 11:05:39 UTC
*** Bug 19716 has been marked as a duplicate of this bug. ***
Comment 13 dorit 2007-07-24 09:05:45 UTC
David, can you confirm that this PR can now be closed?
Comment 14 Ryan Hill 2007-07-24 19:37:09 UTC
Created attachment 13965 [details]
zlib testcase

> A fix for PR25413 was committed to mainline.
> Ryan, could you please check if it solves the zlib miscompilation?
> Andrew, would you plase check if it solves the libgcc miscompilation that you
> are seeing?

Unfortunately it doesn't.  I'm still getting unaligned movdqa instructions with both mainline and the 4.2 patch.  Both testcases on this bug now work however, so maybe the problem lies elsewhere.

I'm attaching a (badly) reduced testcase from inftrees.c in zlib which i believe shows the bug.  Compile with -O -msse2 -ftree-vectorize.  On i686 i'm getting this:

inftrees.lo:     file format elf32-i386

Disassembly of section .text:

00000000 <inflate_table>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 24                sub    $0x24,%esp
   7:   8b 5d 0c                mov    0xc(%ebp),%ebx
   a:   8b 4d 10                mov    0x10(%ebp),%ecx
   d:   8d 45 d8                lea    -0x28(%ebp),%eax
  10:   66 0f ef c0             pxor   %xmm0,%xmm0
  14:   66 0f 7f 00             movdqa %xmm0,(%eax)
  18:   66 0f 7f 40 10          movdqa %xmm0,0x10(%eax)
  1d:   85 c9                   test   %ecx,%ecx
  1f:   74 16                   je     37 <inflate_table+0x37>
  21:   ba 00 00 00 00          mov    $0x0,%edx
  26:   0f b7 04 53             movzwl (%ebx,%edx,2),%eax
  2a:   66 83 44 45 d8 01       addw   $0x1,-0x28(%ebp,%eax,2)
  30:   83 c2 01                add    $0x1,%edx
  33:   39 ca                   cmp    %ecx,%edx
  35:   75 ef                   jne    26 <inflate_table+0x26>
  37:   b8 0f 00 00 00          mov    $0xf,%eax
  3c:   8d 55 d8                lea    -0x28(%ebp),%edx
  3f:   66 83 3c 42 00          cmpw   $0x0,(%edx,%eax,2)
  44:   75 05                   jne    4b <inflate_table+0x4b>
  46:   83 e8 01                sub    $0x1,%eax
  49:   75 f4                   jne    3f <inflate_table+0x3f>
  4b:   83 c4 24                add    $0x24,%esp
  4e:   5b                      pop    %ebx
  4f:   5d                      pop    %ebp
  50:   c3                      ret


And without the vectorizer:

inftrees.lo:     file format elf32-i386

Disassembly of section .text:

00000000 <inflate_table>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 20                sub    $0x20,%esp
   7:   8b 5d 0c                mov    0xc(%ebp),%ebx
   a:   8b 4d 10                mov    0x10(%ebp),%ecx
   d:   b8 00 00 00 00          mov    $0x0,%eax
  12:   8d 55 dc                lea    -0x24(%ebp),%edx
  15:   66 c7 04 42 00 00       movw   $0x0,(%edx,%eax,2)
  1b:   83 c0 01                add    $0x1,%eax
  1e:   83 f8 10                cmp    $0x10,%eax
  21:   75 f2                   jne    15 <inflate_table+0x15>
  23:   85 c9                   test   %ecx,%ecx
  25:   74 16                   je     3d <inflate_table+0x3d>
  27:   ba 00 00 00 00          mov    $0x0,%edx
  2c:   0f b7 04 53             movzwl (%ebx,%edx,2),%eax
  30:   66 83 44 45 dc 01       addw   $0x1,-0x24(%ebp,%eax,2)
  36:   83 c2 01                add    $0x1,%edx
  39:   39 ca                   cmp    %ecx,%edx
  3b:   75 ef                   jne    2c <inflate_table+0x2c>
  3d:   b8 0f 00 00 00          mov    $0xf,%eax
  42:   8d 55 dc                lea    -0x24(%ebp),%edx
  45:   66 83 3c 42 00          cmpw   $0x0,(%edx,%eax,2)
  4a:   75 05                   jne    51 <inflate_table+0x51>
  4c:   83 e8 01                sub    $0x1,%eax
  4f:   75 f4                   jne    45 <inflate_table+0x45>
  51:   83 c4 20                add    $0x20,%esp
  54:   5b                      pop    %ebx
  55:   5d                      pop    %ebp
  56:   c3                      ret
Comment 15 Ryan Hill 2007-07-24 19:40:35 UTC
Created attachment 13966 [details]
gcc-PR25413-gdb.log
Comment 16 David Monniaux 2007-07-24 21:21:30 UTC
(In reply to comment #13)
> David, can you confirm that this PR can now be closed?

I'm no seeing the bug any longer when compiling/testing the octagon library. This does not imply, though, that it no longer occurs on other examples.
Comment 17 dorit 2007-07-25 08:40:07 UTC
This looks like an unrelated problem - the vectorizer does not perform loop peeling here so it's not an issue of natural alignment. Lets open a separate PR for this one, unless there's already one open. In the meantime, would you please try this patch?:

Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c   (revision 126902)
--- tree-vectorizer.c   (working copy)
*************** vect_can_force_dr_alignment_p (tree decl
*** 1527,1533 ****
         PREFERRED_STACK_BOUNDARY is honored by all translation units.
         However, until someone implements forced stack alignment, SSE
         isn't really usable without this.  */
!     return (alignment <= PREFERRED_STACK_BOUNDARY);
  }


--- 1527,1533 ----
         PREFERRED_STACK_BOUNDARY is honored by all translation units.
         However, until someone implements forced stack alignment, SSE
         isn't really usable without this.  */
!     return (alignment <= STACK_BOUNDARY);
  }




Comment 18 dorit 2007-07-25 08:51:25 UTC
Subject: Bug 25413

Author: dorit
Date: Wed Jul 25 08:51:12 2007
New Revision: 126904

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126904
Log:
2007-07-25  Dorit Nuzman  <dorit@il.ibm.com>
            Devang Patel  <dpatel@apple.com>

        PR tree-optimization/25413
        * targhooks.c (default_builtin_vector_alignment_reachable): New.
        * targhooks.h (default_builtin_vector_alignment_reachable): New.
        * tree.h (contains_packed_reference): New.
        * expr.c (contains_packed_reference): New.
        * tree-vect-analyze.c (vector_alignment_reachable_p): New.
        (vect_enhance_data_refs_alignment): Call
        vector_alignment_reachable_p.
        * target.h (vector_alignment_reachable): New builtin.
        * target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New.
        * config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New.
        (TARGET_VECTOR_ALIGNMENT_REACHABLE): Define.

2007-07-25  Dorit Nuzman  <dorit@il.ibm.com>
            Devang Patel  <dpatel@apple.com>
            Uros Bizjak  <ubizjak@gmail.com>

        PR tree-optimization/25413
        * lib/target-supports.exp (check_effective_target_vect_aligned_arrays):
        New procedure to check if arrays are naturally aligned to the vector
        alignment boundary.
        * gcc.dg/vect/vect-align-1.c: New.
        * gcc.dg/vect/vect-align-2.c: New.
        * gcc.dg/vect/pr25413.c: New.
        * gcc.dg/vect/pr25413a.c: New.


Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413a.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-1.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-2.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_2-branch/gcc/expr.c
    branches/gcc-4_2-branch/gcc/target-def.h
    branches/gcc-4_2-branch/gcc/target.h
    branches/gcc-4_2-branch/gcc/targhooks.c
    branches/gcc-4_2-branch/gcc/targhooks.h
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_2-branch/gcc/testsuite/lib/target-supports.exp
    branches/gcc-4_2-branch/gcc/tree-vect-analyze.c
    branches/gcc-4_2-branch/gcc/tree.h

Comment 19 dorit 2007-07-25 08:52:17 UTC
problem fixed.
Comment 20 pinskia@gmail.com 2007-07-25 09:12:41 UTC
Subject: Re:  wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

On 25 Jul 2007 08:40:09 -0000, dorit at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote
> In the meantime, would you please try this patch?:

Of course after my patch for PR 16660, the patch here should be
changed to just return true always.

Thanks,
Andrew Pinski
Comment 21 dorit 2007-07-25 11:11:38 UTC
> Of course after my patch for PR 16660, the patch here should be
> changed to just return true always.

In this case, Ryan, could you please also try to see if Andrew's patch (http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00177.html) fixes the problem?
Comment 22 Ryan Hill 2007-07-25 20:24:29 UTC
(In reply to comment #17)
> Lets open a separate PR for this one

This is now PR 32893.
Comment 23 Volker Reichelt 2007-12-28 01:01:45 UTC
*** Bug 33958 has been marked as a duplicate of this bug. ***