Bug 28029 - [4.1 Regression] wrong optimization with -ftree-vectorize
Summary: [4.1 Regression] wrong optimization with -ftree-vectorize
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.1.0
: P3 normal
Target Milestone: 4.1.2
Assignee: Richard Biener
URL:
Keywords: alias, wrong-code
Depends on:
Blocks:
 
Reported: 2006-06-14 13:48 UTC by Peter Doerfler
Modified: 2008-04-03 01:08 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.2.0
Known to fail: 4.1.1
Last reconfirmed: 2006-07-21 11:00:32


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Doerfler 2006-06-14 13:48:03 UTC
The following short testcase gets vectorized with 4.1.1 and doesn't with 4.2.0 revision 114610

============================================================
template <class T>
class vec 
{
  
public:
  vec(unsigned int n) : size_(n) 
    {
      data_ = new T[n];
    }

  vec& multiply(const vec& other)
    {
      
      const T* op=other.data_;
      
      for (unsigned int i=0; i<size_; ++i) {
        data_[i] *= op[i];
      }
      return *this;
    }

private:
  unsigned int size_;
  T* data_;
};

template class vec<float>;
============================================================

/usr/local/4.2/bin/g++4.2.0 -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp

vectorizer.cpp:16: note: ===== analyze_loop_nest =====
vectorizer.cpp:16: note: === vect_analyze_loop_form ===
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: === get_loop_niters ===
vectorizer.cpp:16: note: ==> get_loop_niters:D.2376_16
vectorizer.cpp:16: note: Symbolic number of iterations is D.2376_16
vectorizer.cpp:16: note: === vect_analyze_data_refs ===
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: get vectype with 4 units of type const float
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: === vect_analyze_scalar_cycles ===
vectorizer.cpp:16: note: Analyze phi: SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
vectorizer.cpp:16: note: virtual phi. skip.
vectorizer.cpp:16: note: Analyze phi: i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: Access function of PHI: {0, +, 1}_1
vectorizer.cpp:16: note: step: 1,  init: 0
vectorizer.cpp:16: note: Detected induction.
vectorizer.cpp:16: note: === vect_pattern_recog ===
vectorizer.cpp:16: note: === vect_mark_stmts_to_be_vectorized ===
vectorizer.cpp:16: note: init: phi relevant? SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
vectorizer.cpp:16: note: init: phi relevant? i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: init: stmt relevant? <L0>:
vectorizer.cpp:16: note: init: stmt relevant? D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: init: stmt relevant? D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: init: stmt relevant? D.2380_11 = (float *) D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2381_12 = pretmp.24_1 + D.2380_11
vectorizer.cpp:16: note: init: stmt relevant? D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: init: stmt relevant? D.2383_19 = (const float *) D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: init: stmt relevant? D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: init: stmt relevant? D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: init: stmt relevant? *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vec_stmt_relevant_p: stmt has vdefs.
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: init: stmt relevant? i_23 = i_4 + 1
vectorizer.cpp:16: note: init: stmt relevant? if (D.2376_16 > i_23) goto <L9>; else goto <L12>;
vectorizer.cpp:16: note: init: stmt relevant? <L9>:
vectorizer.cpp:16: note: worklist: examine stmt: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2386_22
vectorizer.cpp:16: note: def_stmt: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2386_22
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2382_17
vectorizer.cpp:16: note: def_stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2382_17
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2385_21
vectorizer.cpp:16: note: def_stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2385_21
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: worklist: examine stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: === vect_analyze_data_refs_alignment ===
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *op_3
vectorizer.cpp:16: note: vect_compute_data_ref_alignment:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note: === vect_determine_vectorization_factor ===
vectorizer.cpp:16: note: ==> examining statement: <L0>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2380_11 = (float *) D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2381_12 = pretmp.24_1 + D.2380_11
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: D.2383_19 = (const float *) D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: get vectype for scalar type:  float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: ==> examining statement: i_23 = i_4 + 1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: if (D.2376_16 > i_23) goto <L9>; else goto <L12>;
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: ==> examining statement: <L9>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: === vect_analyze_dependences ===
vectorizer.cpp:16: note: dependence distance  = 0.
vectorizer.cpp:16: note: accesses have the same alignment.
vectorizer.cpp:16: note: dependence distance modulo vf == 0 between *D.2381_12 and *D.2381_12
vectorizer.cpp:16: note: not vectorized: can't determine dependence between *D.2384_20 and *D.2381_12
vectorizer.cpp:16: note: bad data dependence.
vectorizer.cpp:16: note: vectorized 0 loops in function.

The workaround with "op" is not needed with the current autovect-branch BTW.
Comment 1 Andrew Pinski 2006-06-14 13:55:06 UTC
Actually I think this is wrong code with 4.1.x.
Comment 2 Andrew Pinski 2006-06-14 14:05:02 UTC
The code is basicially the same as:
  void multiply(float *data_, const float *op, unsigned int size_)
    {
      for (unsigned int i=0; i<size_; ++i)
        data_[i] *= op[i];
    }

And what happens is op is data_ + 3 and size_ is 6, we will get the wrong answer as there will be no feedback in the loop.

Anyways this is a 4.1 bug fixed already in 4.2.0
Comment 3 Richard Biener 2006-07-10 13:24:57 UTC
Confirmed.  In 4.1, the data-refs have the wrong memtag associated:

Created dr for *D.2061_7
        base_address: data__6
        offset from base address: 0
        constant offset from base address: 0
        base_object:
        step: 4B
        misalignment from base: 0B
        aligned to: 4
        memtag: TMT.5

Created dr for *D.2064_15
        base_address: op_14
        offset from base address: 0
        constant offset from base address: 0
        base_object:
        step: 4B
        misalignment from base: 0B
        aligned to: 4
        memtag: TMT.6

after ifcvt:

  # TMT.6_22 = PHI <TMT.6_21(3), TMT.6_20(1)>;
  # i_2 = PHI <i_18(3), 0(1)>;
<L0>:;
  D.2059_4 = i_2 * 4;
  D.2060_5 = (float *) D.2059_4;
  D.2061_7 = D.2060_5 + data__6;
  #   VUSE <TMT.6_22>;
  D.2062_11 = *D.2061_7;
  D.2063_13 = (const float *) D.2059_4;
  D.2064_15 = D.2063_13 + op_14;
  #   VUSE <TMT.6_22>;
  D.2065_16 = *D.2064_15;
  D.2066_17 = D.2062_11 * D.2065_16;
  #   TMT.6_21 = V_MAY_DEF <TMT.6_22>;
  *D.2061_7 = D.2066_17;
  i_18 = i_2 + 1;
  if (size__3 > i_18) goto <L8>; else goto <L2>;

<L8>:;
  goto <bb 2> (<L0>);


no idea where that TMT.5 comes from (it's from the const qualifier, but
the vectorizer makes this up itself).
Comment 4 Richard Biener 2006-07-21 10:35:18 UTC
This looks like a data-ref bug or an aliasing issue.  tree-data-ref.c:object_analysis get's for the statement

(gdb) call debug_generic_expr (stmt)
#   VUSE <TMT.8D.2162_28>;
D.2129_17 = *D.2128_12

as it calls get_var_ann on D.2128_12:

(gdb) print *$23
$24 = {common = {type = VAR_ANN, aux = 0x0, value_handle = 0x0}, 
  out_of_ssa_tag = 0, root_var_processed = 0, mem_tag_kind = NOT_A_TAG, 
  is_alias_tag = 0, used = 0, need_phi_state = NEED_PHI_STATE_MAYBE, 
  in_vuse_list = 0, in_v_may_def_list = 0, type_mem_tag = 0xb7d8c5d8, 
  may_aliases = 0x0, partition = 0, root_index = 0, default_def = 0x0, 
  current_def = 0x0, reference_vars_info = 0x0, subvars = 0x0}
(gdb) call debug_generic_expr ($23->type_mem_tag)
TMT.7D.2161

so there is a discrepancy between the VUSE (which is correct) and the type_mem_tag on the variable.  (const vs. non-const type)
Comment 5 Richard Biener 2006-07-21 10:41:54 UTC
On the mainline we produce

Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4

while 4.1 branch does

Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6

anyones bell ringing?
Comment 6 Richard Biener 2006-07-21 11:00:32 UTC
Backporting

Author: dberlin
Date: Wed Feb 15 22:09:45 2006
New Revision: 111120

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111120
Log:
2006-02-15 Daniel Berlin  <dberlin@dberlin.org>

...
	* tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
	specially here.
...

causes this problem to go away.  For reference:

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c        (revision 115613)
+++ gcc/tree-ssa-alias.c        (working copy)
@@ -1818,8 +1818,7 @@ get_tmt_for (tree ptr, struct alias_info
     {
       struct alias_map_d *curr = ai->pointers[i];
       tree curr_tag = var_ann (curr->var)->type_mem_tag;
-      if (tag_set == curr->set
-         && TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (curr_tag)))
+      if (tag_set == curr->set)
        {
          tag = curr_tag;
          break;
@@ -1856,10 +1855,6 @@ get_tmt_for (tree ptr, struct alias_info
      pointed-to type.  */
   gcc_assert (tag_set == get_alias_set (tag));
 
-  /* If PTR's pointed-to type is read-only, then TAG's type must also
-     be read-only.  */
-  gcc_assert (TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (tag)));
-
   return tag;
 }


I'm going to bootstrap and test that backport.
Comment 7 patchapp@dberlin.org 2006-07-21 12:25:36 UTC
Subject: Bug number PR28029

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg00895.html
Comment 8 Daniel Berlin 2006-07-22 13:30:54 UTC
Subject: Re:  [4.1 Regression] wrong optimization
 with -ftree-vectorize

rguenth at gcc dot gnu dot org wrote:
> ------- Comment #5 from rguenth at gcc dot gnu dot org  2006-07-21 10:41 -------
> On the mainline we produce
> 
> Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
> Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4
> 
> while 4.1 branch does
> 
> Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
> Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6
> 
> anyones bell ringing?

Well, the symbol difference was likely caused by the TYPE_READONLY
comparison that ensured that readonly types got their own SMT.
However, doing *that* was simply wrong, even though it worked the
majority of the time :).

Oh, i see you figured that out :)


> 
> 

Comment 9 Richard Biener 2006-07-24 08:19:14 UTC
Fixed.
Comment 10 Richard Biener 2006-07-24 08:19:14 UTC
Subject: Bug 28029

Author: rguenth
Date: Mon Jul 24 08:18:51 2006
New Revision: 115708

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115708
Log:
2006-07-24  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/28029
	Backport
	2006-02-15 Daniel Berlin  <dberlin@dberlin.org>

	* tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
	specially here.

	* gcc.dg/vect/pr28029.c: New testcase.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/vect/pr28029.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-ssa-alias.c

Comment 11 Andrew Pinski 2008-04-03 01:08:42 UTC
I ran into this even without the vectorizer :).