The following short testcase is vectorized by 4.1.1 but not by 4.2.0 (revision 114610):

============================================================
template <class T>
class vec
{
public:
  vec(unsigned int n) : size_(n) { data_ = new T[n]; }

  vec& multiply(const vec& other)
  {
    const T* op=other.data_;
    for (unsigned int i=0; i<size_; ++i)
    {
      data_[i] *= op[i];
    }
    return *this;
  }

private:
  unsigned int size_;
  T* data_;
};

template class vec<float>;
============================================================

/usr/local/4.2/bin/g++4.2.0 -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp

vectorizer.cpp:16: note: ===== analyze_loop_nest =====
vectorizer.cpp:16: note: === vect_analyze_loop_form ===
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: === get_loop_niters ===
vectorizer.cpp:16: note: ==> get_loop_niters:D.2376_16
vectorizer.cpp:16: note: Symbolic number of iterations is D.2376_16
vectorizer.cpp:16: note: === vect_analyze_data_refs ===
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: get vectype with 4 units of type const float
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: === vect_analyze_scalar_cycles ===
vectorizer.cpp:16: note: Analyze phi: SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
vectorizer.cpp:16: note: virtual phi. skip.
vectorizer.cpp:16: note: Analyze phi: i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: Access function of PHI: {0, +, 1}_1
vectorizer.cpp:16: note: step: 1, init: 0
vectorizer.cpp:16: note: Detected induction.
vectorizer.cpp:16: note: === vect_pattern_recog ===
vectorizer.cpp:16: note: === vect_mark_stmts_to_be_vectorized ===
vectorizer.cpp:16: note: init: phi relevant? SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
vectorizer.cpp:16: note: init: phi relevant? i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: init: stmt relevant? <L0>:
vectorizer.cpp:16: note: init: stmt relevant? D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: init: stmt relevant? D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: init: stmt relevant? D.2380_11 = (float *) D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2381_12 = pretmp.24_1 + D.2380_11
vectorizer.cpp:16: note: init: stmt relevant? D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: init: stmt relevant? D.2383_19 = (const float *) D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: init: stmt relevant? D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: init: stmt relevant? D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: init: stmt relevant? *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vec_stmt_relevant_p: stmt has vdefs.
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: init: stmt relevant? i_23 = i_4 + 1
vectorizer.cpp:16: note: init: stmt relevant? if (D.2376_16 > i_23) goto <L9>; else goto <L12>;
vectorizer.cpp:16: note: init: stmt relevant? <L9>:
vectorizer.cpp:16: note: worklist: examine stmt: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2386_22
vectorizer.cpp:16: note: def_stmt: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2386_22
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2382_17
vectorizer.cpp:16: note: def_stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2382_17
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2385_21
vectorizer.cpp:16: note: def_stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2385_21
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: worklist: examine stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: === vect_analyze_data_refs_alignment ===
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *op_3
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note: === vect_determine_vectorization_factor ===
vectorizer.cpp:16: note: ==> examining statement: <L0>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2380_11 = (float *) D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2381_12 = pretmp.24_1 + D.2380_11
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: D.2383_19 = (const float *) D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: get vectype for scalar type: float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: i_23 = i_4 + 1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: if (D.2376_16 > i_23) goto <L9>; else goto <L12>;
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: <L9>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: === vect_analyze_dependences ===
vectorizer.cpp:16: note: dependence distance = 0.
vectorizer.cpp:16: note: accesses have the same alignment.
vectorizer.cpp:16: note: dependence distance modulo vf == 0 between *D.2381_12 and *D.2381_12
vectorizer.cpp:16: note: not vectorized: can't determine dependence between *D.2384_20 and *D.2381_12
vectorizer.cpp:16: note: bad data dependence.
vectorizer.cpp:16: note: vectorized 0 loops in function.

BTW, the workaround with "op" is not needed with the current autovect-branch.
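The dump ends with the vectorizer giving up because it cannot prove that the load *D.2384_20 (op[i]) and the access *D.2381_12 (data_[i]) are independent. One general remedy is to version the loop on a runtime overlap check. The following is a minimal hand-written C sketch of that idea, assuming 4-wide vectors; it is not what 4.2.0 emits (the dump shows it simply declines), the names ranges_overlap and multiply_versioned are hypothetical, and the inner 4-element chunks only stand in for real vector instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if [a, a+n) and [b, b+n) share any element.
   Addresses are compared as uintptr_t because relational comparison of
   pointers into different objects is undefined in ISO C. */
static int ranges_overlap(const float *a, const float *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return pa < pb + bytes && pb < pa + bytes;
}

/* Hypothetical: data_[i] *= op[i], versioned on a runtime alias check. */
void multiply_versioned(float *data_, const float *op, unsigned int size_)
{
    unsigned int i = 0;

    if (!ranges_overlap(data_, op, size_)) {
        /* No overlap: handle 4 elements at a time, loading every operand of
           a chunk before storing any result, as 4-wide vector code would. */
        for (; size_ - i >= 4; i += 4) {
            float d[4], o[4];
            for (int k = 0; k < 4; ++k) {
                d[k] = data_[i + k];
                o[k] = op[i + k];
            }
            for (int k = 0; k < 4; ++k)
                data_[i + k] = d[k] * o[k];
        }
    }

    /* Overlapping inputs, or the leftover elements: the original scalar loop. */
    for (; i < size_; ++i)
        data_[i] *= op[i];
}
```

The check is conservative: it also sends data_ == op down the scalar path, even though a dependence distance of 0, like the one reported for *D.2381_12 above, would be safe to vectorize.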
Actually, I think 4.1.x generates wrong code here.
The code is basically the same as:

void multiply(float *data_, const float *op, unsigned int size_)
{
  for (unsigned int i=0; i<size_; ++i)
    data_[i] *= op[i];
}

If op is data_ + 3 and size_ is 6, we get the wrong answer, because the vectorized loop does not feed values stored by earlier iterations back into later loads. Anyway, this is a 4.1 bug that is already fixed in 4.2.0.
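To see why the missing dependence check matters, here is a small C sketch that compares the scalar loop with a hand-written emulation of 4-wide vector code, in which each chunk of four elements is loaded completely before any of it is stored. The names multiply_scalar, multiply_vf4 and compare_overlap are hypothetical, and the emulation is an assumed model, not the code 4.1 actually emits. The sketch uses data_ == op + 3, so iteration i reads the element that iteration i - 3 stored, a dependence distance shorter than the vector length. (With the opposite overlap, op == data_ + 3, every element is read before it is overwritten in both versions, so the two loops agree.)

```c
/* Reference semantics: one element per iteration, in order. */
void multiply_scalar(float *data_, const float *op, unsigned int size_)
{
    for (unsigned int i = 0; i < size_; ++i)
        data_[i] *= op[i];
}

/* Hypothetical emulation of a vectorization factor of 4 with no alias
   check: all loads of a chunk happen before its stores, and the remaining
   size_ % 4 elements run in a scalar epilogue. */
void multiply_vf4(float *data_, const float *op, unsigned int size_)
{
    unsigned int i = 0;
    for (; size_ - i >= 4; i += 4) {
        float d[4], o[4];
        for (int k = 0; k < 4; ++k) {
            d[k] = data_[i + k];
            o[k] = op[i + k];
        }
        for (int k = 0; k < 4; ++k)
            data_[i + k] = d[k] * o[k];
    }
    for (; i < size_; ++i)
        data_[i] *= op[i];
}

/* Runs both versions on a 9-element buffer of 2.0f with data_ == op + 3 and
   size_ == 6, leaving the final buffers in out_scalar and out_vf4. */
void compare_overlap(float out_scalar[9], float out_vf4[9])
{
    for (int k = 0; k < 9; ++k)
        out_scalar[k] = out_vf4[k] = 2.0f;
    multiply_scalar(out_scalar + 3, out_scalar, 6);
    multiply_vf4(out_vf4 + 3, out_vf4, 6);
}
```

After compare_overlap, out_scalar[6] is 8.0f, because the scalar loop multiplies by element 3 after iteration 0 has already updated it, while out_vf4[6] is 4.0f, because the chunked version loaded element 3 before storing to it.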
Confirmed. In 4.1, the data-refs have the wrong memtag associated with them:

Created dr for *D.2061_7
  base_address: data__6
  offset from base address: 0
  constant offset from base address: 0
  base_object:
  step: 4B
  misalignment from base: 0B
  aligned to: 4
  memtag: TMT.5
Created dr for *D.2064_15
  base_address: op_14
  offset from base address: 0
  constant offset from base address: 0
  base_object:
  step: 4B
  misalignment from base: 0B
  aligned to: 4
  memtag: TMT.6

after ifcvt:

  # TMT.6_22 = PHI <TMT.6_21(3), TMT.6_20(1)>;
  # i_2 = PHI <i_18(3), 0(1)>;
<L0>:;
  D.2059_4 = i_2 * 4;
  D.2060_5 = (float *) D.2059_4;
  D.2061_7 = D.2060_5 + data__6;
  # VUSE <TMT.6_22>;
  D.2062_11 = *D.2061_7;
  D.2063_13 = (const float *) D.2059_4;
  D.2064_15 = D.2063_13 + op_14;
  # VUSE <TMT.6_22>;
  D.2065_16 = *D.2064_15;
  D.2066_17 = D.2062_11 * D.2065_16;
  # TMT.6_21 = V_MAY_DEF <TMT.6_22>;
  *D.2061_7 = D.2066_17;
  i_18 = i_2 + 1;
  if (size__3 > i_18) goto <L8>; else goto <L2>;

<L8>:;
  goto <bb 2> (<L0>);

No idea where that TMT.5 comes from (it's from the const qualifier, but the vectorizer makes this up itself).
This looks like a data-ref bug or an aliasing issue. In tree-data-ref.c:object_analysis, for the statement

(gdb) call debug_generic_expr (stmt)
# VUSE <TMT.8D.2162_28>;
D.2129_17 = *D.2128_12

it calls get_var_ann on D.2128_12 and gets

(gdb) print *$23
$24 = {common = {type = VAR_ANN, aux = 0x0, value_handle = 0x0}, out_of_ssa_tag = 0, root_var_processed = 0, mem_tag_kind = NOT_A_TAG, is_alias_tag = 0, used = 0, need_phi_state = NEED_PHI_STATE_MAYBE, in_vuse_list = 0, in_v_may_def_list = 0, type_mem_tag = 0xb7d8c5d8, may_aliases = 0x0, partition = 0, root_index = 0, default_def = 0x0, current_def = 0x0, reference_vars_info = 0x0, subvars = 0x0}
(gdb) call debug_generic_expr ($23->type_mem_tag)
TMT.7D.2161

So there is a discrepancy between the VUSE, which is correct (TMT.8), and the type_mem_tag on the variable (TMT.7) (const vs. non-const type).
On the mainline we produce

Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4

while the 4.1 branch produces

Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6

Does this ring a bell for anyone?
Backporting

Author: dberlin
Date: Wed Feb 15 22:09:45 2006
New Revision: 111120

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111120
Log:
2006-02-15  Daniel Berlin  <dberlin@dberlin.org>

        ...
        * tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
        specially here.
        ...

causes this problem to go away. For reference:

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c        (revision 115613)
+++ gcc/tree-ssa-alias.c        (working copy)
@@ -1818,8 +1818,7 @@ get_tmt_for (tree ptr, struct alias_info
     {
       struct alias_map_d *curr = ai->pointers[i];
       tree curr_tag = var_ann (curr->var)->type_mem_tag;
-      if (tag_set == curr->set
-          && TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (curr_tag)))
+      if (tag_set == curr->set)
         {
           tag = curr_tag;
           break;
@@ -1856,10 +1855,6 @@ get_tmt_for (tree ptr, struct alias_info
      pointed-to type.  */
   gcc_assert (tag_set == get_alias_set (tag));
 
-  /* If PTR's pointed-to type is read-only, then TAG's type must also
-     be read-only.  */
-  gcc_assert (TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (tag)));
-
   return tag;
 }

I'm going to bootstrap and test that backport.
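The effect of the removed TYPE_READONLY test can be illustrated with a toy tag table. This is a hypothetical C sketch, not GCC code: struct toy_ptr, get_tag, assign_tags and may_alias_by_tag are invented names, alias sets are plain ints, and the model assumes, as the memtag output earlier in this report suggests, that two references whose tags differ are treated as independent. float * and const float * point to types with the same alias set (which is what the gcc_assert on tag_set checks), so with the read-only comparison they receive different tags and look independent; keying on the alias set alone gives them the same tag.

```c
#include <stdbool.h>

/* Hypothetical model of a pointer: the alias set of its pointed-to type,
   whether that type is read-only (const), and its assigned memory tag. */
struct toy_ptr {
    int alias_set;
    bool readonly;
    int tag;            /* -1 until assigned */
};

/* Hypothetical analogue of get_tmt_for: reuse the tag of an earlier pointer
   with a matching key, otherwise allocate a new tag.  When compare_readonly
   is true, constness is part of the key (the behavior the backport removes). */
static int get_tag(const struct toy_ptr *ptrs, int idx, bool compare_readonly,
                   int *next_tag)
{
    for (int i = 0; i < idx; ++i) {
        if (ptrs[i].alias_set == ptrs[idx].alias_set
            && (!compare_readonly || ptrs[i].readonly == ptrs[idx].readonly))
            return ptrs[i].tag;
    }
    return (*next_tag)++;
}

/* Assigns tags to all pointers in order, starting from tag 0. */
static void assign_tags(struct toy_ptr *ptrs, int n, bool compare_readonly)
{
    int next_tag = 0;
    for (int i = 0; i < n; ++i)
        ptrs[i].tag = get_tag(ptrs, i, compare_readonly, &next_tag);
}

/* In this model, references through pointers with different tags are
   assumed not to alias, so a dependence test would report no dependence. */
static bool may_alias_by_tag(const struct toy_ptr *a, const struct toy_ptr *b)
{
    return a->tag == b->tag;
}
```

For a float * followed by a const float * with the same alias set, assign_tags with compare_readonly set yields tags 0 and 1, so may_alias_by_tag returns false; with it cleared, both get tag 0 and the pair may alias.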
Subject: Bug number PR28029

A patch for this bug has been added to the patch tracker.
The mailing list URL for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg00895.html
Subject: Re: [4.1 Regression] wrong optimization with -ftree-vectorize

rguenth at gcc dot gnu dot org wrote:
> ------- Comment #5 from rguenth at gcc dot gnu dot org 2006-07-21 10:41 -------
> On the mainline we produce
>
> Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
> Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4
>
> while 4.1 branch does
>
> Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
> Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6
>
> anyones bell ringing?

Well, the symbol difference was likely caused by the TYPE_READONLY comparison that ensured that readonly types got their own SMT.
However, doing *that* was simply wrong, even though it worked the majority of the time :).

Oh, I see you figured that out :)
Fixed.
Subject: Bug 28029

Author: rguenth
Date: Mon Jul 24 08:18:51 2006
New Revision: 115708

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115708
Log:
2006-07-24  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/28029
        Backport
        2006-02-15  Daniel Berlin  <dberlin@dberlin.org>

        * tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
        specially here.

        * gcc.dg/vect/pr28029.c: New testcase.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/vect/pr28029.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-ssa-alias.c
I ran into this even without the vectorizer :).