(The same phenomenon happens in 4.2 subversion alpha.) Incorrect code is generated when using -ftree-vectorize on the SSE2 Pentium 4 target. (The same code works on the AMD64 64-bit target.) See attached source code. $ gcc -O2 -Wall -march=pentium4 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=1 gcc_plantage2.c -o ./gcc_plantage2 $ ./gcc_plantage2 will segfault (the same program works perfectly without -ftree-vectorize). The segfault appears at the second movapd instruction: movapd .LC0, %xmm0 .L17: movapd %xmm0, (%eax) addl $1, %edx addl $16, %eax cmpl %edx, %ebx ja .L17 %eax is at this point equal to 4 modulo 16, which results in a segmentation fault, since movapd assumes 16-byte alignment.
Created attachment 10486 [details] preprocessed C source code (assumes Linux glibc) exhibiting the code generation bug Compile this program with -O2 -march=pentium4 -ftree-vectorize on a Pentium 4 and run it. It will segfault. It works perfectly without vectorization.
The problem is that the vectorizer applies loop-peeling in order to align the data reference *(m->c+i), and peeling only works correctly if the data is naturally aligned (aligned on it's type size). This is what the vectorizer currently blindly assumes, but on the Pentium4 doubles are not necessarily 64bit aligned. Accidentally Devang and I discussed this issue last week, and Devang actually committed a patch to apple-ppc branch that works around the problem ( http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108214). Devang's patch however will not fix this PR - the patch he committed disables vectorization if the vectorizer was able to compute the misalignment, and discovered that it doesn't evenly divide by the type size. In this testcase the misalignment is unknown at compile time. To fix this problem we need to disable loop-peeling in the vectorizer if we can't prove that the data is naturally aligned. Alternatively, if we can't prove either way we can peel the loop but control the number of iterations it will execute using a runtime test (i.e. have the prolog loop iterate the entire loop-count if at runtime we discover that the data is not naturally aligned).
related discussion: http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html
Confirmed.
Any news on this one? The bug makes tree vectorization on pentium 4 totally useless. :-( Btw, here's a smaller code snippet for testing. Just compile it with gcc -O -msse2 -ftree-vectorize on a pentium 4 and see the resulting executable segfault: ================================ struct { char c; double d[2]; } a; int main() { int i; for ( i=0; i<2; ++i ) a.d[i]=0; return 0; } ================================
So I see Devang had sent a patch for this over a year ago: http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00167.html I don't know what ever happened to it. Maybe you want to give it a try? (you may need to implement the new target hook for Pentium4). If you have problems applying the patch (it is a bit old) - I could try to help update the patch (not before next week though).
I'm testing the following patch (seems to fix the two testcases in this PR on Pentium4. still need to bootstrap etc, and check the powerpc bits) Index: gcc/targhooks.c =================================================================== *** gcc/targhooks.c (revision 126162) --- gcc/targhooks.c (working copy) *************** tree default_mangle_decl_assembler_name *** 634,637 **** --- 634,653 ---- return id; } + bool + default_builtin_vector_alignment_reachable (tree type, bool is_packed) + { + if (is_packed) + return false; + + /* Assuming that types whose size is > pointer-size are not guaranteed to be + naturally aligned. */ + if (tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE)) > 0) + return false; + + /* Assuming that types whose size is <= pointer-size + are naturally aligned. */ + return true; + } + #include "gt-targhooks.h" Index: gcc/targhooks.h =================================================================== *** gcc/targhooks.h (revision 126162) --- gcc/targhooks.h (working copy) *************** extern tree default_builtin_vectorized_c *** 62,67 **** --- 62,69 ---- extern tree default_builtin_reciprocal (enum built_in_function, bool, bool); + extern bool default_builtin_vector_alignment_reachable (tree, bool); + /* These are here, and not in hooks.[ch], because not all users of hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS. */ Index: gcc/tree.h =================================================================== *** gcc/tree.h (revision 126162) --- gcc/tree.h (working copy) *************** extern tree get_inner_reference (tree, H *** 4327,4332 **** --- 4327,4338 ---- tree *, enum machine_mode *, int *, int *, bool); + /* Given an expression EXP that may be a COMPONENT_REF or an ARRAY_REF, + look for whether EXP or any nested component-refs within EXP is marked + as PACKED. */ + + extern bool contains_packed_reference (tree exp); + /* Return 1 if T is an expression that get_inner_reference handles. */ extern int handled_component_p (tree); Index: gcc/target.h =================================================================== *** gcc/target.h (revision 126162) --- gcc/target.h (working copy) *************** struct gcc_target *** 413,418 **** --- 413,422 ---- element-by-element products for the odd elements. */ tree (* builtin_mul_widen_even) (tree); tree (* builtin_mul_widen_odd) (tree); + + /* Return true if vector alignment is reachable (by peeling N + interations) for the given type. */ + bool (* vector_alignment_reachable) (tree, bool); } vectorize; /* The initial value of target_flags. */ Index: gcc/testsuite/gcc.dg/vect/vect-align-1.c =================================================================== *** gcc/testsuite/gcc.dg/vect/vect-align-1.c (revision 0) --- gcc/testsuite/gcc.dg/vect/vect-align-1.c (revision 0) *************** *** 0 **** --- 1,50 ---- + /* { dg-require-effective-target vect_int } */ + + #include <stdlib.h> + #include <stdarg.h> + #include "tree-vect.h" + + /* Compile time known misalignment. Cannot use loop peeling to align + the store. */ + + #define N 16 + + struct foo { + char x; + int y[N]; + } __attribute__((packed)); + + int + main1 (struct foo * __restrict__ p) + { + int i; + int x[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + + for (i = 0; i < N; i++) + { + p->y[i] = x[i]; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (p->y[i] != x[i]) + abort (); + } + return 0; + } + + + int main (void) + { + int i; + struct foo *p = malloc (2*sizeof (struct foo)); + check_vect (); + + main1 (p); + return 0; + } + + /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */ + /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr25413a.c =================================================================== *** gcc/testsuite/gcc.dg/vect/pr25413a.c (revision 0) --- gcc/testsuite/gcc.dg/vect/pr25413a.c (revision 0) *************** *** 0 **** --- 1,129 ---- + /* { dg-require-effective-target vect_double } */ + + #include <stdarg.h> + #include "tree-vect.h" + + #define N 8 + + typedef unsigned int size_t; + + extern void *malloc (size_t __size) __attribute__ ((__nothrow__)) __attribute__ ((__malloc__)); + + typedef double num_t; + static const num_t num__infty = ((num_t)1.0)/((num_t)0.0); + + struct oct_tt; + typedef struct oct_tt oct_t; + + typedef unsigned int var_t; + typedef enum { + OCT_EMPTY = 0, + OCT_NORMAL = 1, + OCT_CLOSED = 2 + } oct_state; + + struct oct_tt { + var_t n; + + int ref; + + oct_state state; + struct oct_tt* closed; + + num_t* c; + }; + + void* octfapg_mm_malloc (size_t t); + oct_t* octfapg_alloc (var_t n); + oct_t* octfapg_full_copy (oct_t* m); + + struct mmalloc_tt; + typedef struct mmalloc_tt mmalloc_t; + + struct mmalloc_tt + { + int id; + + int nb_alloc; + int nb_realloc; + int nb_free; + + size_t rem; + size_t max; + size_t tot; + + }; + + typedef struct + { + size_t size; + + mmalloc_t* mm; + int id; + + double dummy; + + } mmheader_t; + + void* + octfapg_mm_malloc (size_t t) + { + char* m = (char*)malloc(t+sizeof(mmheader_t)); + return m+sizeof(mmheader_t); + } + + oct_t* octfapg_empty (var_t n); + + oct_t* + octfapg_empty (const var_t n) + { + oct_t* m; + /*octfapg_timing_enter("oct_empty",3);*/ + m = ((oct_t*) octfapg_mm_malloc (sizeof(oct_t))); + m->n = n; + m->ref = 1; + m->state = OCT_EMPTY; + m->closed = (oct_t*)((void *)0); + m->c = (num_t*)((void *)0); + /*octfapg_timing_exit("oct_empty",3);*/ + return m; + } + + oct_t* + octfapg_alloc (const var_t n) + { + size_t nn = (2*(size_t)(n)*((size_t)(n)+1)); + oct_t* m; + m = octfapg_empty(n); + m->c = ((num_t*) octfapg_mm_malloc (sizeof(num_t)*(nn))); + ; + m->state = OCT_NORMAL; + m->closed = (oct_t*)((void *)0); + return m; + } + + oct_t* + octfapg_universe (const var_t n) + { + oct_t* m; + size_t i, nn = (2*(size_t)(n)*((size_t)(n)+1)); + m = octfapg_alloc(n); + for (i=0;i<nn;i++) *(m->c+i) = num__infty; + for (i=0;i<2*n;i++) *(m->c+((size_t)(i)+(((size_t)(i)+1)*((size_t)(i)+1))/2)) = (num_t)(0); + m->state = OCT_CLOSED; + return m; + } + + int main (void) + { + int i; + check_vect (); + + oct_t *p = octfapg_universe(10); + return 0; + } + + /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */ + /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */ + /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-align-2.c =================================================================== *** gcc/testsuite/gcc.dg/vect/vect-align-2.c (revision 0) --- gcc/testsuite/gcc.dg/vect/vect-align-2.c (revision 0) *************** *** 0 **** --- 1,46 ---- + /* { dg-require-effective-target vect_int } */ + /* { dg-do run } */ + + #include <stdlib.h> + #include <stdarg.h> + #include "tree-vect.h" + + /* Compile time unknown misalignment. Cannot use loop peeling to align + the store. */ + + #define N 17 + + struct foo { + char x0; + int y[N][N]; + } __attribute__ ((packed)); + + struct foo f2; + int z[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + + void fbar(struct foo *fp) + { + int i,j; + for (i=0; i<N; i++) + for (j=0; j<N; j++) + f2.y[i][j] = z[i]; + + for (i=0; i<N; i++) + for (j=0; j<N; j++) + if (f2.y[i][j] != z[i]) + abort (); + } + + int main (void) + { + struct foo *fp = (struct foo *) malloc (2*sizeof (struct foo)); + + fbar(fp); + return 0; + } + + + /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" } } */ + /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */ + /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr31699.c =================================================================== *** gcc/testsuite/gcc.dg/vect/pr31699.c (revision 126162) --- gcc/testsuite/gcc.dg/vect/pr31699.c (working copy) *************** int main() *** 31,35 **** } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_intfloat_cvt } } } */ ! /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_no_align } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ --- 31,36 ---- } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_intfloat_cvt } } } */ ! /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */ ! /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr25413.c =================================================================== *** gcc/testsuite/gcc.dg/vect/pr25413.c (revision 0) --- gcc/testsuite/gcc.dg/vect/pr25413.c (revision 0) *************** *** 0 **** --- 1,37 ---- + /* { dg-require-effective-target vect_double } */ + + #include <stdarg.h> + #include "tree-vect.h" + + #define N 8 + + struct + { + char c; + double d[N]; + } a; + + int main1() + { + int i; + for ( i=0; i<N; ++i ) + a.d[i]=1; + return 0; + } + + int main (void) + { + int i; + check_vect (); + + main1 (); + for (i=0; i<N; i++) + if (a.d[i] != 1) + abort (); + return 0; + } + + /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */ + /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" } } */ + /* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 1 "vect" } } */ + /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/expr.c =================================================================== *** gcc/expr.c (revision 126162) --- gcc/expr.c (working copy) *************** get_inner_reference (tree exp, HOST_WIDE *** 5924,5929 **** --- 5924,5966 ---- return exp; } + bool + contains_packed_reference (tree exp) + { + bool packed_p = false; + + while (1) + { + switch (TREE_CODE (exp)) + { + case COMPONENT_REF: + { + tree field = TREE_OPERAND (exp, 1); + packed_p = DECL_PACKED (field) + || TYPE_PACKED (TREE_TYPE (field)) /* CHECKME */ + || TYPE_PACKED (TREE_TYPE (exp)); + if (packed_p) + goto done; + } + break; + + case BIT_FIELD_REF: + case ARRAY_REF: + case ARRAY_RANGE_REF: + case REALPART_EXPR: + case IMAGPART_EXPR: + case VIEW_CONVERT_EXPR: + break; + + default: + goto done; + } + exp = TREE_OPERAND (exp, 0); + } + done: + return packed_p; + } + /* Return a tree of sizetype representing the size, in bytes, of the element of EXP, an ARRAY_REF. */ Index: gcc/tree-vect-analyze.c =================================================================== *** gcc/tree-vect-analyze.c (revision 126162) --- gcc/tree-vect-analyze.c (working copy) *************** Software Foundation, 51 Franklin Street, *** 25,30 **** --- 25,31 ---- #include "tm.h" #include "ggc.h" #include "tree.h" + #include "target.h" #include "basic-block.h" #include "diagnostic.h" #include "tree-flow.h" *************** vect_verify_datarefs_alignment (loop_vec *** 1379,1384 **** --- 1380,1449 ---- } + static bool + vector_alignment_reachable_p (struct data_reference *dr) + { + tree stmt = DR_STMT (dr); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + + if (DR_GROUP_FIRST_DR (stmt_info)) + { + /* For interleaved access we peel only if number of iterations in + the prolog loop ({VF - misalignment}), is a multiple of the + number of the interleaved accesses. */ + int elem_size, mis_in_elements; + int nelements = TYPE_VECTOR_SUBPARTS (vectype); + + /* FORNOW: handle only known alignment. */ + if (!known_alignment_for_access_p (dr)) + return false; + + elem_size = UNITS_PER_SIMD_WORD / nelements; + mis_in_elements = DR_MISALIGNMENT (dr) / elem_size; + + if ((nelements - mis_in_elements) % DR_GROUP_SIZE (stmt_info)) + return false; + } + + /* If misalignment is known at the compiler time then apply peeling + only if natural alignment is reachable through peeling. */ + if (known_alignment_for_access_p (dr) && !aligned_access_p (dr)) + { + HOST_WIDE_INT elmsize = + int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); + if (DR_MISALIGNMENT (dr) % elmsize) + { + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "data size =" HOST_WIDE_INT_PRINT_DEC, elmsize); + fprintf (vect_dump, ". misalignment = %d. ", DR_MISALIGNMENT (dr)); + fprintf (vect_dump, "data size does not divide the misalignment.\n"); + } + return false; + } + } + + if (!known_alignment_for_access_p (dr)) + { + tree type = (TREE_TYPE (DR_REF (dr))); + tree ba = DR_BASE_OBJECT (dr); + bool is_packed = false; + + if (ba) + is_packed = contains_packed_reference (ba); + + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); + if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) + return true; + else + return false; + } + + return true; + } + /* Function vect_enhance_data_refs_alignment This pass will use loop versioning and loop peeling in order to enhance *************** vect_enhance_data_refs_alignment (loop_v *** 1540,1572 **** if (!DR_IS_READ (dr) && !aligned_access_p (dr)) { ! if (DR_GROUP_FIRST_DR (stmt_info)) ! { ! /* For interleaved access we peel only if number of iterations in ! the prolog loop ({VF - misalignment}), is a multiple of the ! number of the interleaved accesses. */ ! int elem_size, mis_in_elements; ! tree vectype = STMT_VINFO_VECTYPE (stmt_info); ! int nelements = TYPE_VECTOR_SUBPARTS (vectype); ! ! /* FORNOW: handle only known alignment. */ ! if (!known_alignment_for_access_p (dr)) ! { ! do_peeling = false; ! break; ! } ! ! elem_size = UNITS_PER_SIMD_WORD / nelements; ! mis_in_elements = DR_MISALIGNMENT (dr) / elem_size; ! ! if ((nelements - mis_in_elements) % DR_GROUP_SIZE (stmt_info)) ! { ! do_peeling = false; ! break; ! } ! } ! dr0 = dr; ! do_peeling = true; break; } } --- 1605,1615 ---- if (!DR_IS_READ (dr) && !aligned_access_p (dr)) { ! do_peeling = vector_alignment_reachable_p (dr); ! if (do_peeling) ! dr0 = dr; ! if (!do_peeling && vect_print_dump_info (REPORT_DETAILS)) ! fprintf (vect_dump, "vector alignment may not be reachable"); break; } } Index: gcc/target-def.h =================================================================== *** gcc/target-def.h (revision 126162) --- gcc/target-def.h (working copy) *************** Foundation, 51 Franklin Street, Fifth Fl *** 356,361 **** --- 356,364 ---- default_builtin_vectorized_conversion #define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN 0 #define TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD 0 + #define TARGET_VECTOR_ALIGNMENT_REACHABLE \ + default_builtin_vector_alignment_reachable + #define TARGET_VECTORIZE \ { \ *************** Foundation, 51 Franklin Street, Fifth Fl *** 363,369 **** TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION, \ TARGET_VECTORIZE_BUILTIN_CONVERSION, \ TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN, \ ! TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD \ } #define TARGET_DEFAULT_TARGET_FLAGS 0 --- 366,373 ---- TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION, \ TARGET_VECTORIZE_BUILTIN_CONVERSION, \ TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN, \ ! TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD, \ ! TARGET_VECTOR_ALIGNMENT_REACHABLE \ } #define TARGET_DEFAULT_TARGET_FLAGS 0 Index: gcc/config/rs6000/rs6000.c =================================================================== *** gcc/config/rs6000/rs6000.c (revision 126162) --- gcc/config/rs6000/rs6000.c (working copy) *************** static tree rs6000_builtin_mul_widen_odd *** 717,722 **** --- 717,723 ---- static tree rs6000_builtin_conversion (enum tree_code, tree); static void def_builtin (int, const char *, tree, int); + static bool rs6000_vector_alignment_reachable (tree, bool); static void rs6000_init_builtins (void); static rtx rs6000_expand_unop_builtin (enum insn_code, tree, rtx); static rtx rs6000_expand_binop_builtin (enum insn_code, tree, rtx); *************** static const char alt_reg_names[][8] = *** 984,989 **** --- 985,993 ---- #undef TARGET_VECTORIZE_BUILTIN_CONVERSION #define TARGET_VECTORIZE_BUILTIN_CONVERSION rs6000_builtin_conversion + #undef TARGET_VECTOR_ALIGNMENT_REACHABLE + #define TARGET_VECTOR_ALIGNMENT_REACHABLE rs6000_vector_alignment_reachable + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins *************** rs6000_builtin_mul_widen_odd (tree type) *** 1806,1811 **** --- 1810,1844 ---- } } + + /* Return true iff, data reference of TYPE can reach vector alignment (16) + after applying N number of iterations. This routine does not determine + how may iterations are required to reach desired alignment. */ + + static bool + rs6000_vector_alignment_reachable (tree type, bool is_packed) + { + if (is_packed) + return false; + + if (TARGET_MACHO) + { + if (TARGET_32BIT) + { + if (rs6000_alignment_flags == MASK_ALIGN_NATURAL) + return true; + + if (rs6000_alignment_flags == MASK_ALIGN_POWER) + return true; + } + return false; + } + + /* Assuming that all other types are naturally aligned. CHECKME! */ + return true; + } + + /* Handle generic options of the form -mfoo=yes/no. NAME is the option name. VALUE is the option value.
Subject: Bug number PR25413 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00082.html
Subject: Bug 25413 Author: dorit Date: Thu Jul 12 14:42:08 2007 New Revision: 126591 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126591 Log: 2007-07-12 Dorit Nuzman <dorit@il.ibm.com> Devang Patel <dpatel@apple.com> PR tree-optimization/25413 * targhooks.c (default_builtin_vector_alignment_reachable): New. * targhooks.h (default_builtin_vector_alignment_reachable): New. * tree.h (contains_packed_reference): New. * expr.c (contains_packed_reference): New. * tree-vect-analyze.c (vector_alignment_reachable_p): New. (vect_enhance_data_refs_alignment): Call vector_alignment_reachable_p. * target.h (vector_alignment_reachable): New builtin. * target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New. * config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New. (TARGET_VECTOR_ALIGNMENT_REACHABLE): Define. Added: trunk/gcc/testsuite/gcc.dg/vect/pr25413.c trunk/gcc/testsuite/gcc.dg/vect/pr25413a.c trunk/gcc/testsuite/gcc.dg/vect/vect-align-1.c trunk/gcc/testsuite/gcc.dg/vect/vect-align-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.c trunk/gcc/expr.c trunk/gcc/target-def.h trunk/gcc/target.h trunk/gcc/targhooks.c trunk/gcc/targhooks.h trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/pr31699.c trunk/gcc/tree-vect-analyze.c trunk/gcc/tree.h
any chance of a 4.2 backport?
(In reply to comment #10) > any chance of a 4.2 backport? sure (as soon as 4.2 gets out of freeze. unfortunately we missed the 4.2.1 release).
*** Bug 19716 has been marked as a duplicate of this bug. ***
David, can you confirm that this PR can now be closed?
Created attachment 13965 [details] zlib testcase > A fix for PR25413 was committed to mainline. > Ryan, could you please check if it solves the zlib miscompilation? > Andrew, would you plase check if it solves the libgcc miscompilation that you > are seeing? Unfortunately it doesn't. I'm still getting unaligned movdqa instructions with both mainline and the 4.2 patch. Both testcases on this bug now work however, so maybe the problem lies elsewhere. I'm attaching a (badly) reduced testcase from inftrees.c in zlib which i believe shows the bug. Compile with -O -msse2 -ftree-vectorize. On i686 i'm getting this: inftrees.lo: file format elf32-i386 Disassembly of section .text: 00000000 <inflate_table>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 53 push %ebx 4: 83 ec 24 sub $0x24,%esp 7: 8b 5d 0c mov 0xc(%ebp),%ebx a: 8b 4d 10 mov 0x10(%ebp),%ecx d: 8d 45 d8 lea -0x28(%ebp),%eax 10: 66 0f ef c0 pxor %xmm0,%xmm0 14: 66 0f 7f 00 movdqa %xmm0,(%eax) 18: 66 0f 7f 40 10 movdqa %xmm0,0x10(%eax) 1d: 85 c9 test %ecx,%ecx 1f: 74 16 je 37 <inflate_table+0x37> 21: ba 00 00 00 00 mov $0x0,%edx 26: 0f b7 04 53 movzwl (%ebx,%edx,2),%eax 2a: 66 83 44 45 d8 01 addw $0x1,-0x28(%ebp,%eax,2) 30: 83 c2 01 add $0x1,%edx 33: 39 ca cmp %ecx,%edx 35: 75 ef jne 26 <inflate_table+0x26> 37: b8 0f 00 00 00 mov $0xf,%eax 3c: 8d 55 d8 lea -0x28(%ebp),%edx 3f: 66 83 3c 42 00 cmpw $0x0,(%edx,%eax,2) 44: 75 05 jne 4b <inflate_table+0x4b> 46: 83 e8 01 sub $0x1,%eax 49: 75 f4 jne 3f <inflate_table+0x3f> 4b: 83 c4 24 add $0x24,%esp 4e: 5b pop %ebx 4f: 5d pop %ebp 50: c3 ret And without the vectorizer: inftrees.lo: file format elf32-i386 Disassembly of section .text: 00000000 <inflate_table>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 53 push %ebx 4: 83 ec 20 sub $0x20,%esp 7: 8b 5d 0c mov 0xc(%ebp),%ebx a: 8b 4d 10 mov 0x10(%ebp),%ecx d: b8 00 00 00 00 mov $0x0,%eax 12: 8d 55 dc lea -0x24(%ebp),%edx 15: 66 c7 04 42 00 00 movw $0x0,(%edx,%eax,2) 1b: 83 c0 01 add $0x1,%eax 1e: 83 f8 10 cmp $0x10,%eax 21: 75 f2 jne 15 <inflate_table+0x15> 23: 85 c9 test %ecx,%ecx 25: 74 16 je 3d <inflate_table+0x3d> 27: ba 00 00 00 00 mov $0x0,%edx 2c: 0f b7 04 53 movzwl (%ebx,%edx,2),%eax 30: 66 83 44 45 dc 01 addw $0x1,-0x24(%ebp,%eax,2) 36: 83 c2 01 add $0x1,%edx 39: 39 ca cmp %ecx,%edx 3b: 75 ef jne 2c <inflate_table+0x2c> 3d: b8 0f 00 00 00 mov $0xf,%eax 42: 8d 55 dc lea -0x24(%ebp),%edx 45: 66 83 3c 42 00 cmpw $0x0,(%edx,%eax,2) 4a: 75 05 jne 51 <inflate_table+0x51> 4c: 83 e8 01 sub $0x1,%eax 4f: 75 f4 jne 45 <inflate_table+0x45> 51: 83 c4 20 add $0x20,%esp 54: 5b pop %ebx 55: 5d pop %ebp 56: c3 ret
Created attachment 13966 [details] gcc-PR25413-gdb.log
(In reply to comment #13) > David, can you confirm that this PR can now be closed? I'm no seeing the bug any longer when compiling/testing the octagon library. This does not imply, though, that it no longer occurs on other examples.
This looks like an unrelated problem - the vectorizer does not perform loop peeling here so it's not an issue of natural alignment. Lets open a separate PR for this one, unless there's already one open. In the meantime, would you please try this patch?: Index: tree-vectorizer.c =================================================================== *** tree-vectorizer.c (revision 126902) --- tree-vectorizer.c (working copy) *************** vect_can_force_dr_alignment_p (tree decl *** 1527,1533 **** PREFERRED_STACK_BOUNDARY is honored by all translation units. However, until someone implements forced stack alignment, SSE isn't really usable without this. */ ! return (alignment <= PREFERRED_STACK_BOUNDARY); } --- 1527,1533 ---- PREFERRED_STACK_BOUNDARY is honored by all translation units. However, until someone implements forced stack alignment, SSE isn't really usable without this. */ ! return (alignment <= STACK_BOUNDARY); }
Subject: Bug 25413 Author: dorit Date: Wed Jul 25 08:51:12 2007 New Revision: 126904 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126904 Log: 2007-07-25 Dorit Nuzman <dorit@il.ibm.com> Devang Patel <dpatel@apple.com> PR tree-optimization/25413 * targhooks.c (default_builtin_vector_alignment_reachable): New. * targhooks.h (default_builtin_vector_alignment_reachable): New. * tree.h (contains_packed_reference): New. * expr.c (contains_packed_reference): New. * tree-vect-analyze.c (vector_alignment_reachable_p): New. (vect_enhance_data_refs_alignment): Call vector_alignment_reachable_p. * target.h (vector_alignment_reachable): New builtin. * target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New. * config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New. (TARGET_VECTOR_ALIGNMENT_REACHABLE): Define. 2007-07-25 Dorit Nuzman <dorit@il.ibm.com> Devang Patel <dpatel@apple.com> Uros Bizjak <ubizjak@gmail.com> PR tree-optimization/25413 * lib/target-supports.exp (check_effective_target_vect_aligned_arrays): New procedure to check if arrays are naturally aligned to the vector alignment boundary. * gcc.dg/vect/vect-align-1.c: New. * gcc.dg/vect/vect-align-2.c: New. * gcc.dg/vect/pr25413.c: New. * gcc.dg/vect/pr25413a.c: New. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413a.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-1.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-2.c Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/config/rs6000/rs6000.c branches/gcc-4_2-branch/gcc/expr.c branches/gcc-4_2-branch/gcc/target-def.h branches/gcc-4_2-branch/gcc/target.h branches/gcc-4_2-branch/gcc/targhooks.c branches/gcc-4_2-branch/gcc/targhooks.h branches/gcc-4_2-branch/gcc/testsuite/ChangeLog branches/gcc-4_2-branch/gcc/testsuite/lib/target-supports.exp branches/gcc-4_2-branch/gcc/tree-vect-analyze.c branches/gcc-4_2-branch/gcc/tree.h
problem fixed.
Subject: Re: wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE On 25 Jul 2007 08:40:09 -0000, dorit at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote > In the meantime, would you please try this patch?: Of course after my patch for PR 16660, the patch here should be changed to just return true always. Thanks, Andrew Pinski
> Of course after my patch for PR 16660, the patch here should be > changed to just return true always. In this case, Ryan, could you please also try to see if Andrew's patch (http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00177.html) fixes the problem?
(In reply to comment #17) > Lets open a separate PR for this one This is now PR 32893.
*** Bug 33958 has been marked as a duplicate of this bug. ***