This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Reducing number of alias checks in vectorization.


On Thu, Oct 31, 2013 at 6:27 PM, Cong Hou <congh@google.com> wrote:
> Ping?

Please look in bugzilla in PR53947 dependencies for a bug to reference,
I remember there is at least one covering this.

+/* { dg-options "-O2 -ftree-vectorize
+     --param=vect-max-version-for-alias-checks=2
+     -fdump-tree-vect-details" } */

put options on a single line

Ok with that change.

Thanks for your patience,
Richard.


>
>
> thanks,
> Cong
>
>
> On Wed, Oct 23, 2013 at 1:04 PM, Cong Hou <congh@google.com> wrote:
>> On Wed, Oct 23, 2013 at 5:53 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 22, 2013 at 11:12 PM, Cong Hou <congh@google.com> wrote:
>>>> On Fri, Oct 18, 2013 at 6:09 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou <congh@google.com> wrote:
>>>>>> Sorry for forgetting using plain-text mode. Resend it.
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Cong Hou <congh@google.com>
>>>>>> Date: Mon, Oct 14, 2013 at 3:29 PM
>>>>>> Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
>>>>>> To: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
>>>>>> Cc: Jakub Jelinek <jakub@redhat.com>
>>>>>>
>>>>>>
>>>>>> I have made a new patch for this issue according to your comments.
>>>>>>
>>>>>> There are several modifications to my previous patch:
>>>>>>
>>>>>>
>>>>>> 1. Remove the use of STL features such as vector and sort. Use GCC's
>>>>>> vec and qsort instead.
>>>>>
>>>>> I think that using <utility> and thus std::pair and std::swap is ok.  Including
>>>>> of <utility> should be done via system.h though.
>>>>>
>>>>
>>>>
>>>> Unfortunately, std::swap is defined in <algorithm>. Can I still use it?
>>>
>>> Hmm, let's defer this issue for now, adding swap and pair templates
>>> locally in GCC is better for now.  That is, I don't want a discussion of
>>> what parts of the standard library to use inside this thread.  You may
>>> want to rise the this issue in a new thread on gcc@gcc.gnu.org.
>>>
>>>>>> 2. Comparisons between tree nodes are not based on their addresses any
>>>>>> more. Use compare_tree() function instead.
>>>>>
>>>>> Ok.
>>>>>
>>>>>> 3. The function vect_create_cond_for_alias_checks() now returns the
>>>>>> number of alias checks. If its second parameter cond_expr is NULL,
>>>>>> then this function only calculate the number of alias checks after the
>>>>>> merging and won't generate comparison expressions.
>>>>>
>>>>> We now perform the merging twice - that's easily avoided by recording
>>>>> the merge result in loop_vinfo alongside may_alias_ddrs (which you
>>>>> should be able to drop as they are already contained in the data
>>>>> structures you build).
>>>>>
>>>>> Which means you can split the function and move the merging
>>>>> part to vect_prune_runtime_alias_test_list.
>>>>
>>>>
>>>> I have added one more field to loop_vec_info to store the dr pairs
>>>> from which alias checks are built. And I also split the functionality
>>>> of vect_create_cond_for_alias_checks() into two functions, so that we
>>>> now perform the merging once instead of twice.
>>>>
>>>>
>>>>>
>>>>>> 4. The function vect_prune_runtime_alias_test_list() now uses
>>>>>> vect_create_cond_for_alias_checks() to get the number of alias checks.
>>>>>>
>>>>>>
>>>>>> The patch is attached as a text file.
>>>>>>
>>>>>> Please give me your comment on this patch. Thank you!
>>>>>
>>>>> +  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
>>>>> +      || TREE_CODE (dr_a1->offset) != INTEGER_CST
>>>>> +      || TREE_CODE (dr_a2->offset) != INTEGER_CST)
>>>>> +    continue;
>>>>> +
>>>>> +  HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
>>>>> + widest_int_cst_value (dr_a1->offset);
>>>>>
>>>>> use !host_integerp (dr_a1->offset) as check and then int_cst_value.
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>>>
>>>>> +  if (diff <= vect_factor
>>>>> +      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>>> +  && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor)
>>>>> +      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>>> +  && TREE_CODE (dr_b1->seg_len) == INTEGER_CST
>>>>> +  && diff - widest_int_cst_value (dr_a1->seg_len) <
>>>>> +     widest_int_cst_value (dr_b1->seg_len)))
>>>>>
>>>>> can you add a comment above this why it is safe to combine under
>>>>> this condition?  You seem to combine offsets but not data-ref steps
>>>>> for example.  The segment length is computed quite optimistically
>>>>> and you don't seem to adjust that.  That is, generally the alias
>>>>> check, if implemented as simple overlap, would need to intersect
>>>>> [init, init + niter * step]  intervals, losing some non-aliasing cases.
>>>>> The current segment length is optimistically computed and thus
>>>>> can recover some more cases, but you have to be careful with
>>>>> merging to not break its implicit assumptions.
>>>>
>>>>
>>>> I have changed the code and made it more clear as shown below. I still
>>>> use the condition
>>>>
>>>>
>>>> DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>>>
>>>>
>>>> But in case that SEGMENT_LENGTH_A or SEGMENT_LENGTH_B is not constant,
>>>> we can still check the following two conditions:
>>>>
>>>>
>>>> DR_A2->OFFSET - DR_A1->OFFSET  < SEGMENT_LENGTH_B
>>>> (if SEGMENT_LENGTH_A is not a constant).
>>>>
>>>>
>>>> DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < VECT_FACTOR
>>>> (if SEGMENT_LENGTH_B is not a constant, then its minimum value should
>>>> be VECT_FACTOR).
>>>>
>>>>
>>>>
>>>> The corresponding code is:
>>>>
>>>>
>>>>  /* Now we check if the following condition is satisfied:
>>>>
>>>>     DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>>>
>>>>     where DIFF = DR_A2->OFFSET - DR_A1->OFFSET.  However,
>>>>     SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
>>>>     have to make a best estimation.  We can get the minimum value
>>>>     of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
>>>>     then either of the following two conditions can guarantee the
>>>>     one above:
>>>>
>>>>     1: DIFF <= MIN_SEG_LEN_B
>>>>     2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
>>>>
>>>>     */
>>>>
>>>>  HOST_WIDEST_INT
>>>>  min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
>>>>      widest_int_cst_value (dr_b1->seg_len) :
>>>>      vect_factor;
>>>>
>>>>  if (diff <= min_seg_len_b
>>>>      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>>  && diff - widest_int_cst_value (dr_a1->seg_len) <
>>>>     min_seg_len_b))
>>>>    { ... }
>>>>
>>>>
>>>>
>>>> The new patch is pasted here again. Bootstrapping and regression tests
>>>> are passed.
>>>>
>>>>
>>>> thanks,
>>>> Cong
>>>>
>>>>
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 8a38316..7333f6a 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,12 @@
>>>> +2013-10-14  Cong Hou  <congh@google.com>
>>>> +
>>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
>>>> + Combine alias checks if it is possible to amortize the runtime
>>>> + overhead.  Return the number of alias checks after merging.
>>>> + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
>>>> + Use the function vect_create_cond_for_alias_checks () to check
>>>> + the number of alias checks.
>>>> +
>>>
>>> Your MUA eats tabs ... ;)
>>
>>
>> That is why I attached a text file ;)
>>
>>
>>>
>>>>  2013-10-14  David Malcolm  <dmalcolm@redhat.com>
>>>>
>>>>   * dumpfile.h (gcc::dump_manager): New class, to hold state
>>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>>> index 075d071..ea2782f 100644
>>>> --- a/gcc/testsuite/ChangeLog
>>>> +++ b/gcc/testsuite/ChangeLog
>>>> @@ -1,3 +1,7 @@
>>>> +2013-10-14  Cong Hou  <congh@google.com>
>>>> +
>>>> + * gcc.dg/vect/vect-alias-check.c: New.
>>>> +
>>>>  2013-10-14  Tobias Burnus  <burnus@net-b.de>
>>>>
>>>>   PR fortran/58658
>>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>>> b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>>> new file mode 100644
>>>> index 0000000..bf9eb6b
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>>> @@ -0,0 +1,19 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -ftree-vectorize
>>>> +     --param=vect-max-version-for-alias-checks=2
>>>> +     -fdump-tree-vect-details" } */
>>>> +
>>>> +/* A test case showing three potential alias checks between
>>>> +   a[i] and b[i], b[i+7], b[i+14]. With alias checks merging
>>>> +   enabled, those tree checks can be merged into one, and the
>>>> +   loop will be vectorized with vect-max-version-for-alias-checks=2.  */
>>>> +
>>>> +void foo (int *a, int *b)
>>>> +{
>>>> +  int i;
>>>> +  for (i = 0; i < 1000; ++i)
>>>> +    a[i] = b[i] + b[i+7] + b[i+14];
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>>>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>>>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>>>> index e7b2f33..5f70d58 100644
>>>> --- a/gcc/tree-vect-data-refs.c
>>>> +++ b/gcc/tree-vect-data-refs.c
>>>> @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt,
>>>> HOST_WIDE_INT *lhs_size_unit,
>>>>  }
>>>>
>>>>
>>>> -/* Check if data references pointed by DR_I and DR_J are same or
>>>> -   belong to same interleaving group.  Return FALSE if drs are
>>>> -   different, otherwise return TRUE.  */
>>>> -
>>>> -static bool
>>>> -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j)
>>>> -{
>>>> -  gimple stmt_i = DR_STMT (dr_i);
>>>> -  gimple stmt_j = DR_STMT (dr_j);
>>>> -
>>>> -  if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0)
>>>> -      || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>>> -    && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))
>>>> -    && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>>> - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)))))
>>>> -    return true;
>>>> -  else
>>>> -    return false;
>>>> -}
>>>> -
>>>> -/* If address ranges represented by DDR_I and DDR_J are equal,
>>>> -   return TRUE, otherwise return FALSE.  */
>>>> -
>>>> -static bool
>>>> -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j)
>>>> -{
>>>> -  if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j))
>>>> -       && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j)))
>>>> -      || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j))
>>>> -  && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j))))
>>>> -    return true;
>>>> -  else
>>>> -    return false;
>>>> -}
>>>> -
>>>>  /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
>>>>     tested at run-time.  Return TRUE if DDR was successfully inserted.
>>>>     Return false if versioning is not supported.  */
>>>> @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info
>>>> loop_vinfo, bb_vec_info bb_vinfo)
>>>>    return true;
>>>>  }
>>>>
>>>> +
>>>> +/* Operator == between two dr_addr_with_seg_len objects.
>>>> +
>>>> +   This equality operator is used to make sure two data refs
>>>> +   are the same one so that we will consider to combine the
>>>> +   aliasing checks of those two pairs of data dependent data
>>>> +   refs.  */
>>>> +
>>>> +static bool
>>>> +operator == (const dr_addr_with_seg_len& d1,
>>>> +     const dr_addr_with_seg_len& d2)
>>>> +{
>>>> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>>>> + && compare_tree (d1.offset, d2.offset) == 0
>>>> + && compare_tree (d1.seg_len, d2.seg_len) == 0;
>>>> +}
>>>> +
>>>> +/* Function comp_dr_addr_with_seg_len_pair.
>>>> +
>>>> +   Comparison function for sorting objects of dr_addr_with_seg_len_pair_t
>>>> +   so that we can combine aliasing checks in one scan.  */
>>>> +
>>>> +static int
>>>> +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_)
>>>> +{
>>>> +  const dr_addr_with_seg_len_pair_t* p1 =
>>>> +    (const dr_addr_with_seg_len_pair_t *) p1_;
>>>> +  const dr_addr_with_seg_len_pair_t* p2 =
>>>> +    (const dr_addr_with_seg_len_pair_t *) p2_;
>>>> +
>>>> +  const dr_addr_with_seg_len &p11 = p1->first,
>>>> +     &p12 = p1->second,
>>>> +     &p21 = p2->first,
>>>> +     &p22 = p2->second;
>>>> +
>>>> +  int comp_res = compare_tree (p11.basic_addr, p21.basic_addr);
>>>> +  if (comp_res != 0)
>>>> +    return comp_res;
>>>> +
>>>> +  comp_res = compare_tree (p12.basic_addr, p22.basic_addr);
>>>> +  if (comp_res != 0)
>>>> +    return comp_res;
>>>> +
>>>> +  if (TREE_CODE (p11.offset) != INTEGER_CST
>>>> +      || TREE_CODE (p21.offset) != INTEGER_CST)
>>>> +    {
>>>> +      comp_res = compare_tree (p11.offset, p21.offset);
>>>> +      if (comp_res != 0)
>>>> + return comp_res;
>>>> +    }
>>>> +  if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
>>>> +    return -1;
>>>> +  if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
>>>> +    return 1;
>>>> +  if (TREE_CODE (p12.offset) != INTEGER_CST
>>>> +      || TREE_CODE (p22.offset) != INTEGER_CST)
>>>> +    {
>>>> +      comp_res = compare_tree (p12.offset, p22.offset);
>>>> +      if (comp_res != 0)
>>>> + return comp_res;
>>>> +    }
>>>> +  if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
>>>> +    return -1;
>>>> +  if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
>>>> +    return 1;
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +template <class T> static void
>>>> +swap (T& a, T& b)
>>>> +{
>>>> +  T c (a);
>>>> +  a = b;
>>>> +  b = c;
>>>> +}
>>>> +
>>>> +/* Function vect_vfa_segment_size.
>>>> +
>>>> +   Create an expression that computes the size of segment
>>>> +   that will be accessed for a data reference.  The functions takes into
>>>> +   account that realignment loads may access one more vector.
>>>> +
>>>> +   Input:
>>>> +     DR: The data reference.
>>>> +     LENGTH_FACTOR: segment length to consider.
>>>> +
>>>> +   Return an expression whose value is the size of segment which will be
>>>> +   accessed by DR.  */
>>>> +
>>>> +static tree
>>>> +vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>>>> +{
>>>> +  tree segment_length;
>>>> +
>>>> +  if (integer_zerop (DR_STEP (dr)))
>>>> +    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>>>> +  else
>>>> +    segment_length = size_binop (MULT_EXPR,
>>>> +                                 fold_convert (sizetype, DR_STEP (dr)),
>>>> +                                 fold_convert (sizetype, length_factor));
>>>> +
>>>> +  if (vect_supportable_dr_alignment (dr, false)
>>>> +        == dr_explicit_realign_optimized)
>>>> +    {
>>>> +      tree vector_size = TYPE_SIZE_UNIT
>>>> +  (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>>>> +
>>>> +      segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>>>> +    }
>>>> +  return segment_length;
>>>> +}
>>>> +
>>>>  /* Function vect_prune_runtime_alias_test_list.
>>>>
>>>>     Prune a list of ddrs to be tested at run-time by versioning for alias.
>>>> +   Merge several alias checks into one if possible.
>>>>     Return FALSE if resulting list of ddrs is longer then allowed by
>>>>     PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE.  */
>>>>
>>>>  bool
>>>>  vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
>>>>  {
>>>> -  vec<ddr_p>  ddrs =
>>>> +  vec<ddr_p> may_alias_ddrs =
>>>>      LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>>>> -  unsigned i, j;
>>>> +  vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs =
>>>> +    LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>>>> +  int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>> +  tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>>>> +
>>>> +  ddr_p ddr;
>>>> +  unsigned int i;
>>>> +  tree length_factor;
>>>>
>>>>    if (dump_enabled_p ())
>>>>      dump_printf_loc (MSG_NOTE, vect_location,
>>>>                       "=== vect_prune_runtime_alias_test_list ===\n");
>>>>
>>>> -  for (i = 0; i < ddrs.length (); )
>>>> +  if (may_alias_ddrs.is_empty ())
>>>> +    return true;
>>>> +
>>>> +  /* Basically, for each pair of dependent data refs store_ptr_0
>>>> +     and load_ptr_0, we create an expression:
>>>> +
>>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>> +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>>>> +
>>>> +     for aliasing checks.  However, in some cases we can decrease
>>>> +     the number of checks by combining two checks into one.  For
>>>> +     example, suppose we have another pair of data refs store_ptr_0
>>>> +     and load_ptr_1, and if the following condition is satisfied:
>>>> +
>>>> +     load_ptr_0 < load_ptr_1  &&
>>>> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>>>> +
>>>> +     (this condition means, in each iteration of vectorized loop,
>>>> +     the accessed memory of store_ptr_0 cannot be between the memory
>>>> +     of load_ptr_0 and load_ptr_1.)
>>>> +
>>>> +     we then can use only the following expression to finish the
>>>> +     alising checks between store_ptr_0 & load_ptr_0 and
>>>> +     store_ptr_0 & load_ptr_1:
>>>> +
>>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>> +     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>>>> +
>>>> +     Note that we only consider that load_ptr_0 and load_ptr_1 have the
>>>> +     same basic address.  */
>>>> +
>>>> +  comp_alias_ddrs.create (may_alias_ddrs.length ());
>>>> +
>>>> +  /* First, we collect all data ref pairs for aliasing checks.  */
>>>> +  FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>>>      {
>>>> -      bool found;
>>>> -      ddr_p ddr_i;
>>>> +      struct data_reference *dr_a, *dr_b;
>>>> +      gimple dr_group_first_a, dr_group_first_b;
>>>> +      tree segment_length_a, segment_length_b;
>>>> +      gimple stmt_a, stmt_b;
>>>> +
>>>> +      dr_a = DDR_A (ddr);
>>>> +      stmt_a = DR_STMT (DDR_A (ddr));
>>>> +      dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>>> +      if (dr_group_first_a)
>>>> + {
>>>> +  stmt_a = dr_group_first_a;
>>>> +  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>>> + }
>>>>
>>>> -      ddr_i = ddrs[i];
>>>> -      found = false;
>>>> +      dr_b = DDR_B (ddr);
>>>> +      stmt_b = DR_STMT (DDR_B (ddr));
>>>> +      dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>>> +      if (dr_group_first_b)
>>>> + {
>>>> +  stmt_b = dr_group_first_b;
>>>> +  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>>> + }
>>>>
>>>> -      for (j = 0; j < i; j++)
>>>> -        {
>>>> -  ddr_p ddr_j = ddrs[j];
>>>> +      if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>>> + length_factor = scalar_loop_iters;
>>>> +      else
>>>> + length_factor = size_int (vect_factor);
>>>
>>> ^^^^
>>>
>>> this is the thing I am worried about.  When the steps are equal we
>>> build reduced size segments to handle overlaps with a dependence
>>> distance bigger than vect_factor.
>>
>>
>> I don't quite understand here. Could you give me an example?
>>
>>
>>>
>>>> +      segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>>> +      segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>>> +
>>>> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>>>> +  (dr_addr_with_seg_len
>>>> +       (dr_a, DR_BASE_ADDRESS (dr_a),
>>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>>>> + segment_length_a),
>>>> +   dr_addr_with_seg_len
>>>> +       (dr_b, DR_BASE_ADDRESS (dr_b),
>>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>>>> + segment_length_b));
>>>> +
>>>> +      if (compare_tree (dr_with_seg_len_pair.first.basic_addr,
>>>> + dr_with_seg_len_pair.second.basic_addr) > 0)
>>>> + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>>>> +
>>>> +      comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
>>>> +    }
>>>>
>>>> -  if (vect_vfa_range_equal (ddr_i, ddr_j))
>>>> +  /* Second, we sort the collected data ref pairs so that we can scan
>>>> +     them once to combine all possible aliasing checks.  */
>>>> +  comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair);
>>>> +
>>>> +  /* Third, we scan the sorted dr pairs and check if we can combine
>>>> +     alias checks of two neighbouring dr pairs.  */
>>>> +  for (size_t i = 1; i < comp_alias_ddrs.length (); ++i)
>>>> +    {
>>>> +      /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
>>>> +      dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first,
>>>> +   *dr_b1 = &comp_alias_ddrs[i-1].second,
>>>> +   *dr_a2 = &comp_alias_ddrs[i].first,
>>>> +   *dr_b2 = &comp_alias_ddrs[i].second;
>>>> +
>>>> +      /* Remove duplicate data ref pairs.  */
>>>> +      if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
>>>> + {
>>>> +  if (dump_enabled_p ())
>>>> +    {
>>>> +      dump_printf_loc (MSG_NOTE, vect_location,
>>>> +       "found equal ranges ");
>>>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> + DR_REF (dr_a1->dr));
>>>> +      dump_printf (MSG_NOTE,  ", ");
>>>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> + DR_REF (dr_b1->dr));
>>>> +      dump_printf (MSG_NOTE,  " and ");
>>>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> + DR_REF (dr_a2->dr));
>>>> +      dump_printf (MSG_NOTE,  ", ");
>>>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> + DR_REF (dr_b2->dr));
>>>> +      dump_printf (MSG_NOTE, "\n");
>>>> +    }
>>>> +
>>>> +  comp_alias_ddrs.ordered_remove (i--);
>>>> +  continue;
>>>> + }
>>>> +
>>>> +      if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2)
>>>> + {
>>>> +  /* We consider the case that DR_B1 and DR_B2 are same memrefs,
>>>> +     and DR_A1 and DR_A2 are two consecutive memrefs.  */
>>>> +  if (*dr_a1 == *dr_a2)
>>>> +    {
>>>> +      swap (dr_a1, dr_b1);
>>>> +      swap (dr_a2, dr_b2);
>>>> +    }
>>>> +
>>>> +  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
>>>> +      || !host_integerp (dr_a1->offset, 0)
>>>> +      || !host_integerp (dr_a2->offset, 0))
>>>> +    continue;
>>>> +
>>>> +  HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
>>>> + widest_int_cst_value (dr_a1->offset);
>>>> +
>>>> +
>>>> +  /* Now we check if the following condition is satisfied:
>>>> +
>>>> +     DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>>> +
>>>> +     where DIFF = DR_A2->OFFSET - DR_A1->OFFSET.  However,
>>>> +     SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
>>>> +     have to make a best estimation.  We can get the minimum value
>>>> +     of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
>>>> +     then either of the following two conditions can guarantee the
>>>> +     one above:
>>>> +
>>>> +     1: DIFF <= MIN_SEG_LEN_B
>>>> +     2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
>>>> +
>>>> +     */
>>>> +
>>>> +  HOST_WIDEST_INT
>>>> +  min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
>>>> +      widest_int_cst_value (dr_b1->seg_len) :
>>>
>>> don't use widest_int_cst_value but tree_int_cst_low () if you previously
>>> checked host_integerp.
>>
>>
>> OK. Done.
>>
>>
>>>
>>>> +      vect_factor;
>>>> +
>>>> +  if (diff <= min_seg_len_b
>>>> +      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>> +  && diff - widest_int_cst_value (dr_a1->seg_len) <
>>>> +     min_seg_len_b))
>>>>      {
>>>>        if (dump_enabled_p ())
>>>>   {
>>>> -  dump_printf_loc (MSG_NOTE, vect_location,
>>>> -                   "found equal ranges ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> -                     DR_REF (DDR_A (ddr_i)));
>>>> -  dump_printf (MSG_NOTE,  ", ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> -                     DR_REF (DDR_B (ddr_i)));
>>>> -  dump_printf (MSG_NOTE,  " and ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> -                     DR_REF (DDR_A (ddr_j)));
>>>> -  dump_printf (MSG_NOTE,  ", ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>>> -                     DR_REF (DDR_B (ddr_j)));
>>>> +  dump_printf_loc
>>>> +      (MSG_NOTE, vect_location,
>>>> +       "merging two runtime checks for data references ");
>>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
>>>> +  dump_printf (MSG_NOTE, " and ");
>>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
>>>>    dump_printf (MSG_NOTE, "\n");
>>>>   }
>>>> -      found = true;
>>>> -      break;
>>>> +
>>>> +      dr_a1->seg_len = size_binop (PLUS_EXPR,
>>>> +   dr_a2->seg_len, size_int (diff));
>>>> +      comp_alias_ddrs.ordered_remove (i--);
>>>>      }
>>>>   }
>>>> -
>>>> -      if (found)
>>>> -      {
>>>> - ddrs.ordered_remove (i);
>>>> - continue;
>>>> -      }
>>>> -      i++;
>>>>      }
>>>>
>>>> -  if (ddrs.length () >
>>>> -       (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>>>> +  if ((int) comp_alias_ddrs.length () >
>>>> +      PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>>>>      {
>>>>        if (dump_enabled_p ())
>>>>   {
>>>> @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list
>>>> (loop_vec_info loop_vinfo)
>>>>                     "generated checks exceeded.\n");
>>>>   }
>>>>
>>>> -      LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
>>>> -
>>>>        return false;
>>>>      }
>>>>
>>>> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>>>> index 574446a..1e03853 100644
>>>> --- a/gcc/tree-vect-loop-manip.c
>>>> +++ b/gcc/tree-vect-loop-manip.c
>>>> @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks
>>>> (loop_vec_info loop_vinfo,
>>>>      *cond_expr = part_cond_expr;
>>>>  }
>>>>
>>>> -
>>>> -/* Function vect_vfa_segment_size.
>>>> -
>>>> -   Create an expression that computes the size of segment
>>>> -   that will be accessed for a data reference.  The functions takes into
>>>> -   account that realignment loads may access one more vector.
>>>> -
>>>> -   Input:
>>>> -     DR: The data reference.
>>>> -     LENGTH_FACTOR: segment length to consider.
>>>> -
>>>> -   Return an expression whose value is the size of segment which will be
>>>> -   accessed by DR.  */
>>>> -
>>>> -static tree
>>>> -vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>>>> -{
>>>> -  tree segment_length;
>>>> -
>>>> -  if (integer_zerop (DR_STEP (dr)))
>>>> -    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>>>> -  else
>>>> -    segment_length = size_binop (MULT_EXPR,
>>>> -                                 fold_convert (sizetype, DR_STEP (dr)),
>>>> -                                 fold_convert (sizetype, length_factor));
>>>> -
>>>> -  if (vect_supportable_dr_alignment (dr, false)
>>>> -        == dr_explicit_realign_optimized)
>>>> -    {
>>>> -      tree vector_size = TYPE_SIZE_UNIT
>>>> -  (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>>>> -
>>>> -      segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>>>> -    }
>>>> -  return segment_length;
>>>> -}
>>>> -
>>>> -
>>>>  /* Function vect_create_cond_for_alias_checks.
>>>>
>>>>     Create a conditional expression that represents the run-time checks for
>>>> @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference
>>>> *dr, tree length_factor)
>>>>
>>>>     Input:
>>>>     COND_EXPR  - input conditional expression.  New conditions will be chained
>>>> -                with logical AND operation.
>>>> +                with logical AND operation.  If it is NULL, then the function
>>>> +                is used to return the number of alias checks.
>>>>     LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs
>>>>          to be checked.
>>>>
>>>>     Output:
>>>>     COND_EXPR - conditional expression.
>>>>
>>>> -   The returned value is the conditional expression to be used in the if
>>>> +   The returned COND_EXPR is the conditional expression to be used in the if
>>>>     statement that controls which version of the loop gets executed at runtime.
>>>>  */
>>>>
>>>> -static void
>>>> +void
>>>>  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
>>>>  {
>>>> -  vec<ddr_p>  may_alias_ddrs =
>>>> -    LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>>>> -  int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>> -  tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>>>> -
>>>> -  ddr_p ddr;
>>>> -  unsigned int i;
>>>> -  tree part_cond_expr, length_factor;
>>>> +  vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs =
>>>> +    LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>>>> +  tree part_cond_expr;
>>>>
>>>>    /* Create expression
>>>>       ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>> @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks
>>>> (loop_vec_info loop_vinfo, tree * cond_expr)
>>>>       ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
>>>>       || (load_ptr_n + load_segment_length_n) <= store_ptr_n))  */
>>>>
>>>> -  if (may_alias_ddrs.is_empty ())
>>>> +  if (comp_alias_ddrs.is_empty ())
>>>>      return;
>>>>
>>>> -  FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>>> +  for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
>>>>      {
>>>> -      struct data_reference *dr_a, *dr_b;
>>>> -      gimple dr_group_first_a, dr_group_first_b;
>>>> -      tree addr_base_a, addr_base_b;
>>>> -      tree segment_length_a, segment_length_b;
>>>> -      gimple stmt_a, stmt_b;
>>>> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>>>> -
>>>> -      dr_a = DDR_A (ddr);
>>>> -      stmt_a = DR_STMT (DDR_A (ddr));
>>>> -      dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>>> -      if (dr_group_first_a)
>>>> -        {
>>>> -  stmt_a = dr_group_first_a;
>>>> -  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>>> - }
>>>> -
>>>> -      dr_b = DDR_B (ddr);
>>>> -      stmt_b = DR_STMT (DDR_B (ddr));
>>>> -      dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>>> -      if (dr_group_first_b)
>>>> -        {
>>>> -  stmt_b = dr_group_first_b;
>>>> -  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>>> - }
>>>> +      const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
>>>> +      const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second;
>>>> +      tree segment_length_a = dr_a.seg_len;
>>>> +      tree segment_length_b = dr_b.seg_len;
>>>>
>>>> -      addr_base_a
>>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>>>> -       DR_INIT (dr_a)));
>>>> -      addr_base_b
>>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>>>> -       DR_INIT (dr_b)));
>>>> -
>>>> -      if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>>> - length_factor = scalar_loop_iters;
>>>> -      else
>>>> - length_factor = size_int (vect_factor);
>>>> -      segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>>> -      segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>>> +      tree addr_base_a
>>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>>>> +      tree addr_base_b
>>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>>>>
>>>>        if (dump_enabled_p ())
>>>>   {
>>>>    dump_printf_loc (MSG_NOTE, vect_location,
>>>> -                           "create runtime check for data references ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>>>> +   "create runtime check for data references ");
>>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>>>>    dump_printf (MSG_NOTE, " and ");
>>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>>>> -          dump_printf (MSG_NOTE, "\n");
>>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>>>> +  dump_printf (MSG_NOTE, "\n");
>>>>   }
>>>>
>>>> -      seg_a_min = addr_base_a;
>>>> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>>>> +      tree seg_a_min = addr_base_a;
>>>> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>>>>   seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>>>>
>>>> -      seg_b_min = addr_base_b;
>>>> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>>>> +      tree seg_b_min = addr_base_b;
>>>> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>>>>   seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>>>>
>>>>        part_cond_expr =
>>>> @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info
>>>> loop_vinfo, tree * cond_expr)
>>>>    if (dump_enabled_p ())
>>>>      dump_printf_loc (MSG_NOTE, vect_location,
>>>>       "created %u versioning for alias checks.\n",
>>>> -     may_alias_ddrs.length ());
>>>> +     comp_alias_ddrs.length ());
>>>> +
>>>> +  comp_alias_ddrs.release ();
>>>>  }
>>>>
>>>>
>>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>>> index 8b7b345..f231b62 100644
>>>> --- a/gcc/tree-vectorizer.h
>>>> +++ b/gcc/tree-vectorizer.h
>>>> @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info
>>>>
>>>>
>>>>
>>>> +/* This struct is used to store the information of a data reference,
>>>> +   including the data ref itself, its basic address, the access offset
>>>> +   and the segment length for aliasing checks.  This is used to generate
>>>> +   alias checks.  */
>>>> +
>>>> +struct dr_addr_with_seg_len
>>>> +{
>>>> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>>>> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>>>> +
>>>> +  data_reference *dr;
>>>> +  tree basic_addr;
>>>> +  tree offset;
>>>> +  tree seg_len;
>>>> +};
>>>> +
>>>> +/* This struct contains two dr_addr_with_seg_len objects with aliasing data
>>>> +   refs.  Two comparisons are generated from them.  */
>>>> +
>>>> +struct dr_addr_with_seg_len_pair_t
>>>> +{
>>>> +  dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1,
>>>> +       const dr_addr_with_seg_len& d2)
>>>> +    : first (d1), second (d2) {}
>>>> +
>>>> +  dr_addr_with_seg_len first;
>>>> +  dr_addr_with_seg_len second;
>>>> +};
>>>> +
>>>> +
>>>>  typedef struct _vect_peel_info
>>>>  {
>>>>    int npeel;
>>>> @@ -274,6 +304,10 @@ typedef struct _loop_vec_info {
>>>>       for a run-time aliasing check.  */
>>>>    vec<ddr_p> may_alias_ddrs;
>>>>
>>>> +  /* Data Dependence Relations defining address ranges together with segment
>>>> +     lengths from which the run-time aliasing check is built.  */
>>>> +  vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs;
>>>> +
>>>
>>> Are you sure you release those on all paths that exit the vectorizer?
>>
>>
>> You are right. This may not a good idea. I have removed this dump info.
>>
>>
>> Thank you for you patient comment again! The updated patch is pasted below.
>>
>>
>> thanks,
>> Cong
>>
>>
>>
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 8a38316..7333f6a 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2013-10-14  Cong Hou  <congh@google.com>
>> +
>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
>> + Combine alias checks if it is possible to amortize the runtime
>> + overhead.  Return the number of alias checks after merging.
>> + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
>> + Use the function vect_create_cond_for_alias_checks () to check
>> + the number of alias checks.
>> +
>>  2013-10-14  David Malcolm  <dmalcolm@redhat.com>
>>
>>   * dumpfile.h (gcc::dump_manager): New class, to hold state
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 075d071..ea2782f 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,7 @@
>> +2013-10-14  Cong Hou  <congh@google.com>
>> +
>> + * gcc.dg/vect/vect-alias-check.c: New.
>> +
>>  2013-10-14  Tobias Burnus  <burnus@net-b.de>
>>
>>   PR fortran/58658
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>> b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>> new file mode 100644
>> index 0000000..bf9eb6b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-vectorize
>> +     --param=vect-max-version-for-alias-checks=2
>> +     -fdump-tree-vect-details" } */
>> +
>> +/* A test case showing three potential alias checks between
>> +   a[i] and b[i], b[i+7], b[i+14]. With alias checks merging
>> +   enabled, those tree checks can be merged into one, and the
>> +   loop will be vectorized with vect-max-version-for-alias-checks=2.  */
>> +
>> +void foo (int *a, int *b)
>> +{
>> +  int i;
>> +  for (i = 0; i < 1000; ++i)
>> +    a[i] = b[i] + b[i+7] + b[i+14];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index e7b2f33..ee6e6d0 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt,
>> HOST_WIDE_INT *lhs_size_unit,
>>  }
>>
>>
>> -/* Check if data references pointed by DR_I and DR_J are same or
>> -   belong to same interleaving group.  Return FALSE if drs are
>> -   different, otherwise return TRUE.  */
>> -
>> -static bool
>> -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j)
>> -{
>> -  gimple stmt_i = DR_STMT (dr_i);
>> -  gimple stmt_j = DR_STMT (dr_j);
>> -
>> -  if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0)
>> -      || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>> -    && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))
>> -    && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>> - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)))))
>> -    return true;
>> -  else
>> -    return false;
>> -}
>> -
>> -/* If address ranges represented by DDR_I and DDR_J are equal,
>> -   return TRUE, otherwise return FALSE.  */
>> -
>> -static bool
>> -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j)
>> -{
>> -  if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j))
>> -       && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j)))
>> -      || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j))
>> -  && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j))))
>> -    return true;
>> -  else
>> -    return false;
>> -}
>> -
>>  /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
>>     tested at run-time.  Return TRUE if DDR was successfully inserted.
>>     Return false if versioning is not supported.  */
>> @@ -2647,69 +2612,320 @@ vect_analyze_data_ref_accesses (loop_vec_info
>> loop_vinfo, bb_vec_info bb_vinfo)
>>    return true;
>>  }
>>
>> +
>> +/* Operator == between two dr_addr_with_seg_len objects.
>> +
>> +   This equality operator is used to make sure two data refs
>> +   are the same one so that we will consider to combine the
>> +   aliasing checks of those two pairs of data dependent data
>> +   refs.  */
>> +
>> +static bool
>> +operator == (const dr_addr_with_seg_len& d1,
>> +     const dr_addr_with_seg_len& d2)
>> +{
>> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>> + && compare_tree (d1.offset, d2.offset) == 0
>> + && compare_tree (d1.seg_len, d2.seg_len) == 0;
>> +}
>> +
>> +/* Function comp_dr_addr_with_seg_len_pair.
>> +
>> +   Comparison function for sorting objects of dr_addr_with_seg_len_pair_t
>> +   so that we can combine aliasing checks in one scan.  */
>> +
>> +static int
>> +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_)
>> +{
>> +  const dr_addr_with_seg_len_pair_t* p1 =
>> +    (const dr_addr_with_seg_len_pair_t *) p1_;
>> +  const dr_addr_with_seg_len_pair_t* p2 =
>> +    (const dr_addr_with_seg_len_pair_t *) p2_;
>> +
>> +  const dr_addr_with_seg_len &p11 = p1->first,
>> +     &p12 = p1->second,
>> +     &p21 = p2->first,
>> +     &p22 = p2->second;
>> +
>> +  int comp_res = compare_tree (p11.basic_addr, p21.basic_addr);
>> +  if (comp_res != 0)
>> +    return comp_res;
>> +
>> +  comp_res = compare_tree (p12.basic_addr, p22.basic_addr);
>> +  if (comp_res != 0)
>> +    return comp_res;
>> +
>> +  if (TREE_CODE (p11.offset) != INTEGER_CST
>> +      || TREE_CODE (p21.offset) != INTEGER_CST)
>> +    {
>> +      comp_res = compare_tree (p11.offset, p21.offset);
>> +      if (comp_res != 0)
>> + return comp_res;
>> +    }
>> +  if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
>> +    return -1;
>> +  if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
>> +    return 1;
>> +  if (TREE_CODE (p12.offset) != INTEGER_CST
>> +      || TREE_CODE (p22.offset) != INTEGER_CST)
>> +    {
>> +      comp_res = compare_tree (p12.offset, p22.offset);
>> +      if (comp_res != 0)
>> + return comp_res;
>> +    }
>> +  if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
>> +    return -1;
>> +  if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
>> +    return 1;
>> +
>> +  return 0;
>> +}
>> +
>> +template <class T> static void
>> +swap (T& a, T& b)
>> +{
>> +  T c (a);
>> +  a = b;
>> +  b = c;
>> +}
>> +
>> +/* Function vect_vfa_segment_size.
>> +
>> +   Create an expression that computes the size of segment
>> +   that will be accessed for a data reference.  The functions takes into
>> +   account that realignment loads may access one more vector.
>> +
>> +   Input:
>> +     DR: The data reference.
>> +     LENGTH_FACTOR: segment length to consider.
>> +
>> +   Return an expression whose value is the size of segment which will be
>> +   accessed by DR.  */
>> +
>> +static tree
>> +vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>> +{
>> +  tree segment_length;
>> +
>> +  if (integer_zerop (DR_STEP (dr)))
>> +    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>> +  else
>> +    segment_length = size_binop (MULT_EXPR,
>> +                                 fold_convert (sizetype, DR_STEP (dr)),
>> +                                 fold_convert (sizetype, length_factor));
>> +
>> +  if (vect_supportable_dr_alignment (dr, false)
>> +        == dr_explicit_realign_optimized)
>> +    {
>> +      tree vector_size = TYPE_SIZE_UNIT
>> +  (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>> +
>> +      segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>> +    }
>> +  return segment_length;
>> +}
>> +
>>  /* Function vect_prune_runtime_alias_test_list.
>>
>>     Prune a list of ddrs to be tested at run-time by versioning for alias.
>> +   Merge several alias checks into one if possible.
>>     Return FALSE if resulting list of ddrs is longer then allowed by
>>     PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE.  */
>>
>>  bool
>>  vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
>>  {
>> -  vec<ddr_p>  ddrs =
>> +  vec<ddr_p> may_alias_ddrs =
>>      LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>> -  unsigned i, j;
>> +  vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs =
>> +    LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>> +  int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +  tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>> +
>> +  ddr_p ddr;
>> +  unsigned int i;
>> +  tree length_factor;
>>
>>    if (dump_enabled_p ())
>>      dump_printf_loc (MSG_NOTE, vect_location,
>>                       "=== vect_prune_runtime_alias_test_list ===\n");
>>
>> -  for (i = 0; i < ddrs.length (); )
>> +  if (may_alias_ddrs.is_empty ())
>> +    return true;
>> +
>> +  /* Basically, for each pair of dependent data refs store_ptr_0
>> +     and load_ptr_0, we create an expression:
>> +
>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>> +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>> +
>> +     for aliasing checks.  However, in some cases we can decrease
>> +     the number of checks by combining two checks into one.  For
>> +     example, suppose we have another pair of data refs store_ptr_0
>> +     and load_ptr_1, and if the following condition is satisfied:
>> +
>> +     load_ptr_0 < load_ptr_1  &&
>> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>> +
>> +     (this condition means, in each iteration of vectorized loop,
>> +     the accessed memory of store_ptr_0 cannot be between the memory
>> +     of load_ptr_0 and load_ptr_1.)
>> +
>> +     we then can use only the following expression to finish the
>> +     alising checks between store_ptr_0 & load_ptr_0 and
>> +     store_ptr_0 & load_ptr_1:
>> +
>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>> +     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>> +
>> +     Note that we only consider that load_ptr_0 and load_ptr_1 have the
>> +     same basic address.  */
>> +
>> +  comp_alias_ddrs.create (may_alias_ddrs.length ());
>> +
>> +  /* First, we collect all data ref pairs for aliasing checks.  */
>> +  FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>      {
>> -      bool found;
>> -      ddr_p ddr_i;
>> +      struct data_reference *dr_a, *dr_b;
>> +      gimple dr_group_first_a, dr_group_first_b;
>> +      tree segment_length_a, segment_length_b;
>> +      gimple stmt_a, stmt_b;
>> +
>> +      dr_a = DDR_A (ddr);
>> +      stmt_a = DR_STMT (DDR_A (ddr));
>> +      dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>> +      if (dr_group_first_a)
>> + {
>> +  stmt_a = dr_group_first_a;
>> +  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>> + }
>>
>> -      ddr_i = ddrs[i];
>> -      found = false;
>> +      dr_b = DDR_B (ddr);
>> +      stmt_b = DR_STMT (DDR_B (ddr));
>> +      dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>> +      if (dr_group_first_b)
>> + {
>> +  stmt_b = dr_group_first_b;
>> +  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>> + }
>>
>> -      for (j = 0; j < i; j++)
>> -        {
>> -  ddr_p ddr_j = ddrs[j];
>> +      if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>> + length_factor = scalar_loop_iters;
>> +      else
>> + length_factor = size_int (vect_factor);
>> +      segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>> +      segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>> +
>> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>> +  (dr_addr_with_seg_len
>> +       (dr_a, DR_BASE_ADDRESS (dr_a),
>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>> + segment_length_a),
>> +   dr_addr_with_seg_len
>> +       (dr_b, DR_BASE_ADDRESS (dr_b),
>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>> + segment_length_b));
>> +
>> +      if (compare_tree (dr_with_seg_len_pair.first.basic_addr,
>> + dr_with_seg_len_pair.second.basic_addr) > 0)
>> + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>> +
>> +      comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
>> +    }
>>
>> -  if (vect_vfa_range_equal (ddr_i, ddr_j))
>> +  /* Second, we sort the collected data ref pairs so that we can scan
>> +     them once to combine all possible aliasing checks.  */
>> +  comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair);
>> +
>> +  /* Third, we scan the sorted dr pairs and check if we can combine
>> +     alias checks of two neighbouring dr pairs.  */
>> +  for (size_t i = 1; i < comp_alias_ddrs.length (); ++i)
>> +    {
>> +      /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
>> +      dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first,
>> +   *dr_b1 = &comp_alias_ddrs[i-1].second,
>> +   *dr_a2 = &comp_alias_ddrs[i].first,
>> +   *dr_b2 = &comp_alias_ddrs[i].second;
>> +
>> +      /* Remove duplicate data ref pairs.  */
>> +      if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
>> + {
>> +  if (dump_enabled_p ())
>>      {
>> -      if (dump_enabled_p ())
>> - {
>> -  dump_printf_loc (MSG_NOTE, vect_location,
>> -                   "found equal ranges ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> -                     DR_REF (DDR_A (ddr_i)));
>> -  dump_printf (MSG_NOTE,  ", ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> -                     DR_REF (DDR_B (ddr_i)));
>> -  dump_printf (MSG_NOTE,  " and ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> -                     DR_REF (DDR_A (ddr_j)));
>> -  dump_printf (MSG_NOTE,  ", ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> -                     DR_REF (DDR_B (ddr_j)));
>> -  dump_printf (MSG_NOTE, "\n");
>> - }
>> -      found = true;
>> -      break;
>> +      dump_printf_loc (MSG_NOTE, vect_location,
>> +       "found equal ranges ");
>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> + DR_REF (dr_a1->dr));
>> +      dump_printf (MSG_NOTE,  ", ");
>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> + DR_REF (dr_b1->dr));
>> +      dump_printf (MSG_NOTE,  " and ");
>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> + DR_REF (dr_a2->dr));
>> +      dump_printf (MSG_NOTE,  ", ");
>> +      dump_generic_expr (MSG_NOTE, TDF_SLIM,
>> + DR_REF (dr_b2->dr));
>> +      dump_printf (MSG_NOTE, "\n");
>>      }
>> +
>> +  comp_alias_ddrs.ordered_remove (i--);
>> +  continue;
>>   }
>>
>> -      if (found)
>> -      {
>> - ddrs.ordered_remove (i);
>> - continue;
>> -      }
>> -      i++;
>> +      if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2)
>> + {
>> +  /* We consider the case that DR_B1 and DR_B2 are same memrefs,
>> +     and DR_A1 and DR_A2 are two consecutive memrefs.  */
>> +  if (*dr_a1 == *dr_a2)
>> +    {
>> +      swap (dr_a1, dr_b1);
>> +      swap (dr_a2, dr_b2);
>> +    }
>> +
>> +  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
>> +      || !host_integerp (dr_a1->offset, 0)
>> +      || !host_integerp (dr_a2->offset, 0))
>> +    continue;
>> +
>> +  HOST_WIDE_INT diff = TREE_INT_CST_LOW (dr_a2->offset) -
>> +       TREE_INT_CST_LOW (dr_a1->offset);
>> +
>> +
>> +  /* Now we check if the following condition is satisfied:
>> +
>> +     DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>> +
>> +     where DIFF = DR_A2->OFFSET - DR_A1->OFFSET.  However,
>> +     SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
>> +     have to make a best estimation.  We can get the minimum value
>> +     of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
>> +     then either of the following two conditions can guarantee the
>> +     one above:
>> +
>> +     1: DIFF <= MIN_SEG_LEN_B
>> +     2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
>> +
>> +     */
>> +
>> +  HOST_WIDE_INT
>> +  min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
>> +      TREE_INT_CST_LOW (dr_b1->seg_len) :
>> +      vect_factor;
>> +
>> +  if (diff <= min_seg_len_b
>> +      || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>> +  && diff - TREE_INT_CST_LOW (dr_a1->seg_len) <
>> +     min_seg_len_b))
>> +    {
>> +      dr_a1->seg_len = size_binop (PLUS_EXPR,
>> +   dr_a2->seg_len, size_int (diff));
>> +      comp_alias_ddrs.ordered_remove (i--);
>> +    }
>> + }
>>      }
>>
>> -  if (ddrs.length () >
>> -       (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>> +  if ((int) comp_alias_ddrs.length () >
>> +      PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>>      {
>>        if (dump_enabled_p ())
>>   {
>> @@ -2718,8 +2934,6 @@ vect_prune_runtime_alias_test_list
>> (loop_vec_info loop_vinfo)
>>                     "generated checks exceeded.\n");
>>   }
>>
>> -      LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
>> -
>>        return false;
>>      }
>>
>> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>> index 574446a..1e03853 100644
>> --- a/gcc/tree-vect-loop-manip.c
>> +++ b/gcc/tree-vect-loop-manip.c
>> @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks
>> (loop_vec_info loop_vinfo,
>>      *cond_expr = part_cond_expr;
>>  }
>>
>> -
>> -/* Function vect_vfa_segment_size.
>> -
>> -   Create an expression that computes the size of segment
>> -   that will be accessed for a data reference.  The functions takes into
>> -   account that realignment loads may access one more vector.
>> -
>> -   Input:
>> -     DR: The data reference.
>> -     LENGTH_FACTOR: segment length to consider.
>> -
>> -   Return an expression whose value is the size of segment which will be
>> -   accessed by DR.  */
>> -
>> -static tree
>> -vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>> -{
>> -  tree segment_length;
>> -
>> -  if (integer_zerop (DR_STEP (dr)))
>> -    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>> -  else
>> -    segment_length = size_binop (MULT_EXPR,
>> -                                 fold_convert (sizetype, DR_STEP (dr)),
>> -                                 fold_convert (sizetype, length_factor));
>> -
>> -  if (vect_supportable_dr_alignment (dr, false)
>> -        == dr_explicit_realign_optimized)
>> -    {
>> -      tree vector_size = TYPE_SIZE_UNIT
>> -  (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>> -
>> -      segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>> -    }
>> -  return segment_length;
>> -}
>> -
>> -
>>  /* Function vect_create_cond_for_alias_checks.
>>
>>     Create a conditional expression that represents the run-time checks for
>> @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference
>> *dr, tree length_factor)
>>
>>     Input:
>>     COND_EXPR  - input conditional expression.  New conditions will be chained
>> -                with logical AND operation.
>> +                with logical AND operation.  If it is NULL, then the function
>> +                is used to return the number of alias checks.
>>     LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs
>>          to be checked.
>>
>>     Output:
>>     COND_EXPR - conditional expression.
>>
>> -   The returned value is the conditional expression to be used in the if
>> +   The returned COND_EXPR is the conditional expression to be used in the if
>>     statement that controls which version of the loop gets executed at runtime.
>>  */
>>
>> -static void
>> +void
>>  vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
>>  {
>> -  vec<ddr_p>  may_alias_ddrs =
>> -    LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>> -  int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> -  tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>> -
>> -  ddr_p ddr;
>> -  unsigned int i;
>> -  tree part_cond_expr, length_factor;
>> +  vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs =
>> +    LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>> +  tree part_cond_expr;
>>
>>    /* Create expression
>>       ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>> @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks
>> (loop_vec_info loop_vinfo, tree * cond_expr)
>>       ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
>>       || (load_ptr_n + load_segment_length_n) <= store_ptr_n))  */
>>
>> -  if (may_alias_ddrs.is_empty ())
>> +  if (comp_alias_ddrs.is_empty ())
>>      return;
>>
>> -  FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>> +  for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
>>      {
>> -      struct data_reference *dr_a, *dr_b;
>> -      gimple dr_group_first_a, dr_group_first_b;
>> -      tree addr_base_a, addr_base_b;
>> -      tree segment_length_a, segment_length_b;
>> -      gimple stmt_a, stmt_b;
>> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>> -
>> -      dr_a = DDR_A (ddr);
>> -      stmt_a = DR_STMT (DDR_A (ddr));
>> -      dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>> -      if (dr_group_first_a)
>> -        {
>> -  stmt_a = dr_group_first_a;
>> -  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>> - }
>> -
>> -      dr_b = DDR_B (ddr);
>> -      stmt_b = DR_STMT (DDR_B (ddr));
>> -      dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>> -      if (dr_group_first_b)
>> -        {
>> -  stmt_b = dr_group_first_b;
>> -  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>> - }
>> +      const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
>> +      const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second;
>> +      tree segment_length_a = dr_a.seg_len;
>> +      tree segment_length_b = dr_b.seg_len;
>>
>> -      addr_base_a
>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>> -       DR_INIT (dr_a)));
>> -      addr_base_b
>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>> -       DR_INIT (dr_b)));
>> -
>> -      if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>> - length_factor = scalar_loop_iters;
>> -      else
>> - length_factor = size_int (vect_factor);
>> -      segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>> -      segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>> +      tree addr_base_a
>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>> +      tree addr_base_b
>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>>
>>        if (dump_enabled_p ())
>>   {
>>    dump_printf_loc (MSG_NOTE, vect_location,
>> -                           "create runtime check for data references ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>> +   "create runtime check for data references ");
>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>>    dump_printf (MSG_NOTE, " and ");
>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>> -          dump_printf (MSG_NOTE, "\n");
>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>> +  dump_printf (MSG_NOTE, "\n");
>>   }
>>
>> -      seg_a_min = addr_base_a;
>> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>> +      tree seg_a_min = addr_base_a;
>> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>>   seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>>
>> -      seg_b_min = addr_base_b;
>> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>> +      tree seg_b_min = addr_base_b;
>> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>>   seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>>
>>        part_cond_expr =
>> @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info
>> loop_vinfo, tree * cond_expr)
>>    if (dump_enabled_p ())
>>      dump_printf_loc (MSG_NOTE, vect_location,
>>       "created %u versioning for alias checks.\n",
>> -     may_alias_ddrs.length ());
>> +     comp_alias_ddrs.length ());
>> +
>> +  comp_alias_ddrs.release ();
>>  }
>>
>>
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index 8b7b345..f231b62 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info
>>
>>
>>
>> +/* This struct is used to store the information of a data reference,
>> +   including the data ref itself, its basic address, the access offset
>> +   and the segment length for aliasing checks.  This is used to generate
>> +   alias checks.  */
>> +
>> +struct dr_addr_with_seg_len
>> +{
>> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>> +
>> +  data_reference *dr;
>> +  tree basic_addr;
>> +  tree offset;
>> +  tree seg_len;
>> +};
>> +
>> +/* This struct contains two dr_addr_with_seg_len objects with aliasing data
>> +   refs.  Two comparisons are generated from them.  */
>> +
>> +struct dr_addr_with_seg_len_pair_t
>> +{
>> +  dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1,
>> +       const dr_addr_with_seg_len& d2)
>> +    : first (d1), second (d2) {}
>> +
>> +  dr_addr_with_seg_len first;
>> +  dr_addr_with_seg_len second;
>> +};
>> +
>> +
>>  typedef struct _vect_peel_info
>>  {
>>    int npeel;
>> @@ -274,6 +304,10 @@ typedef struct _loop_vec_info {
>>       for a run-time aliasing check.  */
>>    vec<ddr_p> may_alias_ddrs;
>>
>> +  /* Data Dependence Relations defining address ranges together with segment
>> +     lengths from which the run-time aliasing check is built.  */
>> +  vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs;
>> +
>>    /* Statements in the loop that have data references that are candidates for a
>>       runtime (loop versioning) misalignment check.  */
>>    vec<gimple> may_misalign_stmts;
>> @@ -336,6 +370,7 @@ typedef struct _loop_vec_info {
>>  #define LOOP_VINFO_MAY_MISALIGN_STMTS(L)   (L)->may_misalign_stmts
>>  #define LOOP_VINFO_LOC(L)                  (L)->loop_line_number
>>  #define LOOP_VINFO_MAY_ALIAS_DDRS(L)       (L)->may_alias_ddrs
>> +#define LOOP_VINFO_COMP_ALIAS_DDRS(L)      (L)->comp_alias_ddrs
>>  #define LOOP_VINFO_GROUPED_STORES(L)       (L)->grouped_stores
>>  #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
>>  #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>>
>>
>>
>>
>>
>>
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>    /* Statements in the loop that have data references that are candidates for a
>>>>       runtime (loop versioning) misalignment check.  */
>>>>    vec<gimple> may_misalign_stmts;
>>>> @@ -336,6 +370,7 @@ typedef struct _loop_vec_info {
>>>>  #define LOOP_VINFO_MAY_MISALIGN_STMTS(L)   (L)->may_misalign_stmts
>>>>  #define LOOP_VINFO_LOC(L)                  (L)->loop_line_number
>>>>  #define LOOP_VINFO_MAY_ALIAS_DDRS(L)       (L)->may_alias_ddrs
>>>> +#define LOOP_VINFO_COMP_ALIAS_DDRS(L)      (L)->comp_alias_ddrs
>>>>  #define LOOP_VINFO_GROUPED_STORES(L)       (L)->grouped_stores
>>>>  #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
>>>>  #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Cong
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou <congh@google.com> wrote:
>>>>>>>
>>>>>>> Forget about this "aux" idea as the segment length for one data ref
>>>>>>> can be different in different dr pairs.
>>>>>>>
>>>>>>> In my patch I created a struct as shown below:
>>>>>>>
>>>>>>> struct dr_addr_with_seg_len
>>>>>>> {
>>>>>>>   data_reference *dr;
>>>>>>>   tree basic_addr;
>>>>>>>   tree offset;
>>>>>>>   tree seg_len;
>>>>>>> };
>>>>>>>
>>>>>>>
>>>>>>> Note that basic_addr and offset can always obtained from dr, but we
>>>>>>> need to store two segment lengths for each dr pair. It is improper to
>>>>>>> add a field to data_dependence_relation as it is defined outside of
>>>>>>> vectorizer. We can change the type (a new one combining
>>>>>>> data_dependence_relation and segment length) of may_alias_ddrs in
>>>>>>> loop_vec_info to include such information, but we have to add a new
>>>>>>> type to tree-vectorizer.h which is only used in two places - still too
>>>>>>> much.
>>>>>>>
>>>>>>> One possible solution is that we create a local struct as shown above
>>>>>>> and a new function which returns the merged alias check information.
>>>>>>> This function will be called twice: once during analysis phase and
>>>>>>> once in transformation phase. Then we don't have to store the merged
>>>>>>> alias check information during those two phases. The additional time
>>>>>>> cost is minimal as there will not be too many data dependent dr pairs
>>>>>>> in a loop.
>>>>>>>
>>>>>>> Any comment?
>>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>> Cong
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou <congh@google.com> wrote:
>>>>>>> > I noticed that there is a "struct dataref_aux" defined in
>>>>>>> > tree-vectorizer.h which is specific to the vectorizer pass and is
>>>>>>> > stored in (void*)aux in "struct data_reference". Can we add one more
>>>>>>> > field "segment_length" to dataref_aux so that we can pass this
>>>>>>> > information for merging alias checks? Then we can avoid to modify or
>>>>>>> > create other structures.
>>>>>>> >
>>>>>>> >
>>>>>>> > thanks,
>>>>>>> > Cong
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou <congh@google.com> wrote:
>>>>>>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>> >>> On Tue, 1 Oct 2013, Cong Hou wrote:
>>>>>>> >>>
>>>>>>> >>>> When alias exists between data refs in a loop, to vectorize it GCC
>>>>>>> >>>> does loop versioning and adds runtime alias checks. Basically for each
>>>>>>> >>>> pair of data refs with possible data dependence, there will be two
>>>>>>> >>>> comparisons generated to make sure there is no aliasing between them
>>>>>>> >>>> in each iteration of the vectorized loop. If there are many such data
>>>>>>> >>>> refs pairs, the number of comparisons can be very large, which is a
>>>>>>> >>>> big overhead.
>>>>>>> >>>>
>>>>>>> >>>> However, in some cases it is possible to reduce the number of those
>>>>>>> >>>> comparisons. For example, for the following loop, we can detect that
>>>>>>> >>>> b[0] and b[1] are two consecutive member accesses so that we can
>>>>>>> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into
>>>>>>> >>>> checking a[0:100]&b[0:2]:
>>>>>>> >>>>
>>>>>>> >>>> void foo(int*a, int* b)
>>>>>>> >>>> {
>>>>>>> >>>>    for (int i = 0; i < 100; ++i)
>>>>>>> >>>>     a[i] = b[0] + b[1];
>>>>>>> >>>> }
>>>>>>> >>>>
>>>>>>> >>>> Actually, the requirement of consecutive memory accesses is too
>>>>>>> >>>> strict. For the following loop, we can still combine the alias checks
>>>>>>> >>>> between a[0:100]&b[0] and a[0:100]&b[100]:
>>>>>>> >>>>
>>>>>>> >>>> void foo(int*a, int* b)
>>>>>>> >>>> {
>>>>>>> >>>>    for (int i = 0; i < 100; ++i)
>>>>>>> >>>>     a[i] = b[0] + b[100];
>>>>>>> >>>> }
>>>>>>> >>>>
>>>>>>> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in
>>>>>>> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need
>>>>>>> >>>> to check a[0:100] and b[0:101] don't overlap.
>>>>>>> >>>>
>>>>>>> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2).
>>>>>>> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2;
>>>>>>> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and
>>>>>>> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are
>>>>>>> >>>> segment length of a, b1, and b2. Then we can combine the two
>>>>>>> >>>> comparisons into one if the following condition is satisfied:
>>>>>>> >>>>
>>>>>>> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> This patch detects those combination opportunities to reduce the
>>>>>>> >>>> number of alias checks. It is tested on an x86-64 machine.
>>>>>>> >>>
>>>>>>> >>> Apart from the other comments you got (to which I agree) the patch
>>>>>>> >>> seems to do two things, namely also:
>>>>>>> >>>
>>>>>>> >>> +  /* Extract load and store statements on pointers with zero-stride
>>>>>>> >>> +     accesses.  */
>>>>>>> >>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>>>>>>> >>> +    {
>>>>>>> >>>
>>>>>>> >>> which I'd rather see in a separate patch (and done also when
>>>>>>> >>> the loop doesn't require versioning for alias).
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> My mistake.. I am working on those two patches at the same time and
>>>>>>> >> pasted that one also here by mistake. I will send another patch about
>>>>>>> >> the "hoist" topic.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>> Also combining the alias checks in vect_create_cond_for_alias_checks
>>>>>>> >>> is nice but doesn't properly fix the use of the
>>>>>>> >>> vect-max-version-for-alias-checks param which currently inhibits
>>>>>>> >>> vectorization of the HIMENO benchmark by default (and make us look bad
>>>>>>> >>> compared to LLVM).
>>>>>>> >>>
>>>>>>> >>> So I believe this merging should be done incrementally when
>>>>>>> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test.
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I agree that vect-max-version-for-alias-checks param should count the
>>>>>>> >> number of checks after the merge. However, the struct
>>>>>>> >> data_dependence_relation could not record the new information produced
>>>>>>> >> by the merge. The new information I mentioned contains the new segment
>>>>>>> >> length for comparisons. This length is calculated right in
>>>>>>> >> vect_create_cond_for_alias_checks() function. Since
>>>>>>> >> vect-max-version-for-alias-checks is used during analysis phase, shall
>>>>>>> >> we move all those (get segment length for each data ref and merge
>>>>>>> >> alias checks) from transformation to analysis phase? If we cannot
>>>>>>> >> store the result properly (data_dependence_relation is not enough),
>>>>>>> >> shall we do it twice in both phases?
>>>>>>> >>
>>>>>>> >> I also noticed a possible bug in the function vect_same_range_drs()
>>>>>>> >> called by vect_prune_runtime_alias_test_list(). For the following code
>>>>>>> >> I get two pairs of data refs after
>>>>>>> >> vect_prune_runtime_alias_test_list(), but in
>>>>>>> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I
>>>>>>> >> got two identical pairs of data refs. The consequence is two identical
>>>>>>> >> alias checks are produced.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> void yuv2yuyv_ref (int *d, int *src, int n)
>>>>>>> >> {
>>>>>>> >>   char *dest = (char *)d;
>>>>>>> >>   int i;
>>>>>>> >>
>>>>>>> >>   for(i=0;i<n/2;i++){
>>>>>>> >>     dest[i*4 + 0] = (src[i*2 + 0])>>16;
>>>>>>> >>     dest[i*4 + 1] = (src[i*2 + 1])>>8;
>>>>>>> >>     dest[i*4 + 2] = (src[i*2 + 0])>>16;
>>>>>>> >>     dest[i*4 + 3] = (src[i*2 + 0])>>0;
>>>>>>> >>   }
>>>>>>> >> }
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I think the solution to this problem is changing
>>>>>>> >>
>>>>>>> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>>>>>> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)
>>>>>>> >>
>>>>>>> >> into
>>>>>>> >>
>>>>>>> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>>>>>>> >> (vinfo_for_stmt (stmt_i))))
>>>>>>> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>>>>>>> >> (vinfo_for_stmt (stmt_j)))
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> in function vect_same_range_drs(). What do you think about it?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> thanks,
>>>>>>> >> Cong
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>> Thanks for working on this,
>>>>>>> >>> Richard.
>>>>>>> >>>
>>>>>>> >>>>
>>>>>>> >>>> thanks,
>>>>>>> >>>> Cong
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> Index: gcc/tree-vect-loop-manip.c
>>>>>>> >>>> ===================================================================
>>>>>>> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662)
>>>>>>> >>>> +++ gcc/tree-vect-loop-manip.c (working copy)
>>>>>>> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G
>>>>>>> >>>>  along with GCC; see the file COPYING3.  If not see
>>>>>>> >>>>  <http://www.gnu.org/licenses/>.  */
>>>>>>> >>>>
>>>>>>> >>>> +#include <vector>
>>>>>>> >>>> +#include <utility>
>>>>>>> >>>> +#include <algorithm>
>>>>>>> >>>> +
>>>>>>> >>>>  #include "config.h"
>>>>>>> >>>>  #include "system.h"
>>>>>>> >>>>  #include "coretypes.h"
>>>>>>> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer
>>>>>>> >>>>    return segment_length;
>>>>>>> >>>>  }
>>>>>>> >>>>
>>>>>>> >>>> +namespace
>>>>>>> >>>> +{
>>>>>>> >>>> +
>>>>>>> >>>> +/* struct dr_addr_with_seg_len
>>>>>>> >>>> +
>>>>>>> >>>> +   A struct storing information of a data reference, including the data
>>>>>>> >>>> +   ref itself, its basic address, the access offset and the segment length
>>>>>>> >>>> +   for aliasing checks.  */
>>>>>>> >>>> +
>>>>>>> >>>> +struct dr_addr_with_seg_len
>>>>>>> >>>> +{
>>>>>>> >>>> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>>>>>>> >>>> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>>>>>>> >>>> +
>>>>>>> >>>> +  data_reference* dr;
>>>>>>> >>>> +  tree basic_addr;
>>>>>>> >>>> +  tree offset;
>>>>>>> >>>> +  tree seg_len;
>>>>>>> >>>> +};
>>>>>>> >>>> +
>>>>>>> >>>> +/* Operator == between two dr_addr_with_seg_len objects.
>>>>>>> >>>> +
>>>>>>> >>>> +   This equality operator is used to make sure two data refs
>>>>>>> >>>> +   are the same one so that we will consider to combine the
>>>>>>> >>>> +   aliasing checks of those two pairs of data dependent data
>>>>>>> >>>> +   refs.  */
>>>>>>> >>>> +
>>>>>>> >>>> +bool operator == (const dr_addr_with_seg_len& d1,
>>>>>>> >>>> +  const dr_addr_with_seg_len& d2)
>>>>>>> >>>> +{
>>>>>>> >>>> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>>>>>>> >>>> + && operand_equal_p (d1.offset, d2.offset, 0)
>>>>>>> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0);
>>>>>>> >>>> +}
>>>>>>> >>>> +
>>>>>>> >>>> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len>
>>>>>>> >>>> + dr_addr_with_seg_len_pair_t;
>>>>>>> >>>> +
>>>>>>> >>>> +
>>>>>>> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects.
>>>>>>> >>>> +
>>>>>>> >>>> +   This operator is used to sort objects of dr_addr_with_seg_len_pair_t
>>>>>>> >>>> +   so that we can combine aliasing checks during one scan.  */
>>>>>>> >>>> +
>>>>>>> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1,
>>>>>>> >>>> + const dr_addr_with_seg_len_pair_t& p2)
>>>>>>> >>>> +{
>>>>>>> >>>> +  const dr_addr_with_seg_len& p11 = p1.first;
>>>>>>> >>>> +  const dr_addr_with_seg_len& p12 = p1.second;
>>>>>>> >>>> +  const dr_addr_with_seg_len& p21 = p2.first;
>>>>>>> >>>> +  const dr_addr_with_seg_len& p22 = p2.second;
>>>>>>> >>>> +
>>>>>>> >>>> +  if (p11.basic_addr != p21.basic_addr)
>>>>>>> >>>> +    return p11.basic_addr < p21.basic_addr;
>>>>>>> >>>> +  if (p12.basic_addr != p22.basic_addr)
>>>>>>> >>>> +    return p12.basic_addr < p22.basic_addr;
>>>>>>> >>>> +  if (TREE_CODE (p11.offset) != INTEGER_CST
>>>>>>> >>>> +      || TREE_CODE (p21.offset) != INTEGER_CST)
>>>>>>> >>>> +    return p11.offset < p21.offset;
>>>>>>> >>>> +  if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
>>>>>>> >>>> +    return int_cst_value (p11.offset) < int_cst_value (p21.offset);
>>>>>>> >>>> +  if (TREE_CODE (p12.offset) != INTEGER_CST
>>>>>>> >>>> +      || TREE_CODE (p22.offset) != INTEGER_CST)
>>>>>>> >>>> +    return p12.offset < p22.offset;
>>>>>>> >>>> +  return int_cst_value (p12.offset) < int_cst_value (p22.offset);
>>>>>>> >>>> +}
>>>>>>> >>>> +
>>>>>>> >>>> +}
>>>>>>> >>>>
>>>>>>> >>>>  /* Function vect_create_cond_for_alias_checks.
>>>>>>> >>>>
>>>>>>> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_
>>>>>>> >>>>    if (may_alias_ddrs.is_empty ())
>>>>>>> >>>>      return;
>>>>>>> >>>>
>>>>>>> >>>> +
>>>>>>> >>>> +  /* Basically, for each pair of dependent data refs store_ptr_0
>>>>>>> >>>> +     and load_ptr_0, we create an expression:
>>>>>>> >>>> +
>>>>>>> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>>>>> >>>> +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>>>>>>> >>>> +
>>>>>>> >>>> +     for aliasing checks. However, in some cases we can decrease
>>>>>>> >>>> +     the number of checks by combining two checks into one. For
>>>>>>> >>>> +     example, suppose we have another pair of data refs store_ptr_0
>>>>>>> >>>> +     and load_ptr_1, and if the following condition is satisfied:
>>>>>>> >>>> +
>>>>>>> >>>> +     load_ptr_0 < load_ptr_1  &&
>>>>>>> >>>> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>>>>>>> >>>> +
>>>>>>> >>>> +     (this condition means, in each iteration of vectorized loop,
>>>>>>> >>>> +     the accessed memory of store_ptr_0 cannot be between the memory
>>>>>>> >>>> +     of load_ptr_0 and load_ptr_1.)
>>>>>>> >>>> +
>>>>>>> >>>> +     we then can use only the following expression to finish the
>>>>>>> >>>> +     alising checks between store_ptr_0 & load_ptr_0 and
>>>>>>> >>>> +     store_ptr_0 & load_ptr_1:
>>>>>>> >>>> +
>>>>>>> >>>> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>>>>> >>>> +     || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>>>>>>> >>>> +
>>>>>>> >>>> +     Note that we only consider that load_ptr_0 and load_ptr_1 have the
>>>>>>> >>>> +     same basic address.  */
>>>>>>> >>>> +
>>>>>>> >>>> +  std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
>>>>>>> >>>> +
>>>>>>> >>>> +  /* First, we collect all data ref pairs for aliasing checks.  */
>>>>>>> >>>> +
>>>>>>> >>>>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>>>>>> >>>>      {
>>>>>>> >>>>        struct data_reference *dr_a, *dr_b;
>>>>>>> >>>>        gimple dr_group_first_a, dr_group_first_b;
>>>>>>> >>>> -      tree addr_base_a, addr_base_b;
>>>>>>> >>>>        tree segment_length_a, segment_length_b;
>>>>>>> >>>>        gimple stmt_a, stmt_b;
>>>>>>> >>>> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>>>>>>> >>>>
>>>>>>> >>>>        dr_a = DDR_A (ddr);
>>>>>>> >>>>        stmt_a = DR_STMT (DDR_A (ddr));
>>>>>>> >>>>        dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>>>>>> >>>>        if (dr_group_first_a)
>>>>>>> >>>> -        {
>>>>>>> >>>> + {
>>>>>>> >>>>    stmt_a = dr_group_first_a;
>>>>>>> >>>>    dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>>>>>> >>>>   }
>>>>>>> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_
>>>>>>> >>>>        stmt_b = DR_STMT (DDR_B (ddr));
>>>>>>> >>>>        dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>>>>>> >>>>        if (dr_group_first_b)
>>>>>>> >>>> -        {
>>>>>>> >>>> + {
>>>>>>> >>>>    stmt_b = dr_group_first_b;
>>>>>>> >>>>    dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>>>>>> >>>>   }
>>>>>>> >>>>
>>>>>>> >>>> -      addr_base_a
>>>>>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>>>>>>> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>>>>>>> >>>> -       DR_INIT (dr_a)));
>>>>>>> >>>> -      addr_base_b
>>>>>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>>>>>>> >>>> -   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>>>>>>> >>>> -       DR_INIT (dr_b)));
>>>>>>> >>>> -
>>>>>>> >>>>        if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>>>>>> >>>>   length_factor = scalar_loop_iters;
>>>>>>> >>>>        else
>>>>>>> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_
>>>>>>> >>>>        segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>>>>>> >>>>        segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>>>>>> >>>>
>>>>>>> >>>> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>>>>>>> >>>> +  (dr_addr_with_seg_len
>>>>>>> >>>> +       (dr_a, DR_BASE_ADDRESS (dr_a),
>>>>>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>>>>>>> >>>> + segment_length_a),
>>>>>>> >>>> +   dr_addr_with_seg_len
>>>>>>> >>>> +       (dr_b, DR_BASE_ADDRESS (dr_b),
>>>>>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>>>>>>> >>>> + segment_length_b));
>>>>>>> >>>> +
>>>>>>> >>>> +      if (dr_with_seg_len_pair.first.basic_addr >
>>>>>>> >>>> +  dr_with_seg_len_pair.second.basic_addr)
>>>>>>> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>>>>>>> >>>> +
>>>>>>> >>>> +      ddrs_with_seg_len.push_back (dr_with_seg_len_pair);
>>>>>>> >>>> +    }
>>>>>>> >>>> +
>>>>>>> >>>> +  /* Second, we sort the collected data ref pairs so that we can scan
>>>>>>> >>>> +     them once to combine all possible aliasing checks.  */
>>>>>>> >>>> +
>>>>>>> >>>> +  std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end());
>>>>>>> >>>> +
>>>>>>> >>>> +  /* Remove duplicate data ref pairs.  */
>>>>>>> >>>> +  ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(),
>>>>>>> >>>> + ddrs_with_seg_len.end()),
>>>>>>> >>>> +   ddrs_with_seg_len.end());
>>>>>>> >>>> +
>>>>>>> >>>> +  /* We then scan the sorted dr pairs and check if we can combine
>>>>>>> >>>> +     alias checks of two neighbouring dr pairs.  */
>>>>>>> >>>> +
>>>>>>> >>>> +  for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i)
>>>>>>> >>>> +    {
>>>>>>> >>>> +      dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first;
>>>>>>> >>>> +      dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second;
>>>>>>> >>>> +      dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first;
>>>>>>> >>>> +      dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second;
>>>>>>> >>>> +
>>>>>>> >>>> +      if (dr_a1 == dr_a2)
>>>>>>> >>>> + {
>>>>>>> >>>> +  if (dr_b1.basic_addr != dr_b2.basic_addr
>>>>>>> >>>> +      || TREE_CODE (dr_b1.offset) != INTEGER_CST
>>>>>>> >>>> +      || TREE_CODE (dr_b2.offset) != INTEGER_CST)
>>>>>>> >>>> +    continue;
>>>>>>> >>>> +
>>>>>>> >>>> +  int diff = int_cst_value (dr_b2.offset) -
>>>>>>> >>>> +     int_cst_value (dr_b1.offset);
>>>>>>> >>>> +
>>>>>>> >>>> +  gcc_assert (diff > 0);
>>>>>>> >>>> +
>>>>>>> >>>> +  if (diff <= vect_factor
>>>>>>> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && diff - int_cst_value (dr_b1.seg_len) < vect_factor)
>>>>>>> >>>> +      || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && diff - int_cst_value (dr_b1.seg_len) <
>>>>>>> >>>> +     int_cst_value (dr_a1.seg_len)))
>>>>>>> >>>> +    {
>>>>>>> >>>> +      if (dump_enabled_p ())
>>>>>>> >>>> + {
>>>>>>> >>>> +  dump_printf_loc
>>>>>>> >>>> +      (MSG_NOTE, vect_location,
>>>>>>> >>>> +       "combining two runtime checks for data references ");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr));
>>>>>>> >>>> +  dump_printf (MSG_NOTE, " and ");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr));
>>>>>>> >>>> +  dump_printf (MSG_NOTE, "\n");
>>>>>>> >>>> + }
>>>>>>> >>>> +
>>>>>>> >>>> +      dr_b1.seg_len = size_binop (PLUS_EXPR,
>>>>>>> >>>> +  dr_b2.seg_len, size_int (diff));
>>>>>>> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>>>>>>> >>>> +      --i;
>>>>>>> >>>> +    }
>>>>>>> >>>> + }
>>>>>>> >>>> +      else if (dr_b1 == dr_b2)
>>>>>>> >>>> + {
>>>>>>> >>>> +  if (dr_a1.basic_addr != dr_a2.basic_addr
>>>>>>> >>>> +      || TREE_CODE (dr_a1.offset) != INTEGER_CST
>>>>>>> >>>> +      || TREE_CODE (dr_a2.offset) != INTEGER_CST)
>>>>>>> >>>> +    continue;
>>>>>>> >>>> +
>>>>>>> >>>> +  int diff = int_cst_value (dr_a2.offset) -
>>>>>>> >>>> +     int_cst_value (dr_a1.offset);
>>>>>>> >>>> +
>>>>>>> >>>> +  gcc_assert (diff > 0);
>>>>>>> >>>> +
>>>>>>> >>>> +  if (diff <= vect_factor
>>>>>>> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && diff - int_cst_value (dr_a1.seg_len) < vect_factor)
>>>>>>> >>>> +      || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>>> >>>> +  && diff - int_cst_value (dr_a1.seg_len) <
>>>>>>> >>>> +     int_cst_value (dr_b1.seg_len)))
>>>>>>> >>>> +    {
>>>>>>> >>>> +      if (dump_enabled_p ())
>>>>>>> >>>> + {
>>>>>>> >>>> +  dump_printf_loc
>>>>>>> >>>> +      (MSG_NOTE, vect_location,
>>>>>>> >>>> +       "combining two runtime checks for data references ");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr));
>>>>>>> >>>> +  dump_printf (MSG_NOTE, " and ");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr));
>>>>>>> >>>> +  dump_printf (MSG_NOTE, "\n");
>>>>>>> >>>> + }
>>>>>>> >>>> +
>>>>>>> >>>> +      dr_a1.seg_len = size_binop (PLUS_EXPR,
>>>>>>> >>>> +  dr_a2.seg_len, size_int (diff));
>>>>>>> >>>> +      ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>>>>>>> >>>> +      --i;
>>>>>>> >>>> +    }
>>>>>>> >>>> + }
>>>>>>> >>>> +    }
>>>>>>> >>>> +
>>>>>>> >>>> +  for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i)
>>>>>>> >>>> +    {
>>>>>>> >>>> +      const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
>>>>>>> >>>> +      const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
>>>>>>> >>>> +      tree segment_length_a = dr_a.seg_len;
>>>>>>> >>>> +      tree segment_length_b = dr_b.seg_len;
>>>>>>> >>>> +
>>>>>>> >>>> +      tree addr_base_a
>>>>>>> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>>>>>>> >>>> +      tree addr_base_b
>>>>>>> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>>>>>>> >>>> +
>>>>>>> >>>>        if (dump_enabled_p ())
>>>>>>> >>>>   {
>>>>>>> >>>>    dump_printf_loc (MSG_NOTE, vect_location,
>>>>>>> >>>> -                           "create runtime check for data references ");
>>>>>>> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>>>>>>> >>>> +   "create runtime check for data references ");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>>>>>>> >>>>    dump_printf (MSG_NOTE, " and ");
>>>>>>> >>>> -  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>>>>>>> >>>> -          dump_printf (MSG_NOTE, "\n");
>>>>>>> >>>> +  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>>>>>>> >>>> +  dump_printf (MSG_NOTE, "\n");
>>>>>>> >>>>   }
>>>>>>> >>>>
>>>>>>> >>>> -      seg_a_min = addr_base_a;
>>>>>>> >>>> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>>>>> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>>>>>>> >>>> +      tree seg_a_min = addr_base_a;
>>>>>>> >>>> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>>>>> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>>>>>>> >>>>   seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>>>>>>> >>>>
>>>>>>> >>>> -      seg_b_min = addr_base_b;
>>>>>>> >>>> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>>>>> >>>> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>>>>>>> >>>> +      tree seg_b_min = addr_base_b;
>>>>>>> >>>> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>>>>> >>>> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>>>>>>> >>>>   seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>>>>>>> >>>>
>>>>>>> >>>>        part_cond_expr =
>>>>>>> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop
>>>>>>> >>>>        adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
>>>>>>> >>>>      }
>>>>>>> >>>>
>>>>>>> >>>> +  /* Extract load and store statements on pointers with zero-stride
>>>>>>> >>>> +     accesses.  */
>>>>>>> >>>> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>>>>>>> >>>> +    {
>>>>>>> >>>> +
>>>>>>> >>>> +      /* In the loop body, we iterate each statement to check if it is a load
>>>>>>> >>>> + or store. Then we check the DR_STEP of the data reference.  If
>>>>>>> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop
>>>>>>> >>>> + preheader, and move the store statement to the loop exit.  */
>>>>>>> >>>> +
>>>>>>> >>>> +      for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
>>>>>>> >>>> +   !gsi_end_p (si); )
>>>>>>> >>>> + {
>>>>>>> >>>> +  gimple stmt = gsi_stmt (si);
>>>>>>> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>>>>>> >>>> +  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
>>>>>>> >>>> +
>>>>>>> >>>> +
>>>>>>> >>>> +  if (dr && integer_zerop (DR_STEP (dr)))
>>>>>>> >>>> +    {
>>>>>>> >>>> +      if (DR_IS_READ (dr))
>>>>>>> >>>> + {
>>>>>>> >>>> +  if (dump_file)
>>>>>>> >>>> +    {
>>>>>>> >>>> +      fprintf (dump_file,
>>>>>>> >>>> +       "Hoist the load to outside of the loop:\n");
>>>>>>> >>>> +      print_gimple_stmt (dump_file, stmt, 0,
>>>>>>> >>>> + TDF_VOPS|TDF_MEMSYMS);
>>>>>>> >>>> +    }
>>>>>>> >>>> +
>>>>>>> >>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>>>>>>> >>>> +  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>>>>>>> >>>> +  gsi_move_after (&si, &si_dst);
>>>>>>> >>>> + }
>>>>>>> >>>> +      else
>>>>>>> >>>> + {
>>>>>>> >>>> +  gimple_stmt_iterator si_dst =
>>>>>>> >>>> +      gsi_last_bb (single_exit (loop)->dest);
>>>>>>> >>>> +  gsi_move_after (&si, &si_dst);
>>>>>>> >>>> + }
>>>>>>> >>>> +              continue;
>>>>>>> >>>> +    }
>>>>>>> >>>> +  else if (!dr)
>>>>>>> >>>> +          {
>>>>>>> >>>> +            bool hoist = true;
>>>>>>> >>>> +            for (size_t i = 0; i < gimple_num_ops (stmt); i++)
>>>>>>> >>>> +            {
>>>>>>> >>>> +              tree op = gimple_op (stmt, i);
>>>>>>> >>>> +              if (TREE_CODE (op) == INTEGER_CST
>>>>>>> >>>> +                  || TREE_CODE (op) == REAL_CST)
>>>>>>> >>>> +                continue;
>>>>>>> >>>> +              if (TREE_CODE (op) == SSA_NAME)
>>>>>>> >>>> +              {
>>>>>>> >>>> +                gimple def = SSA_NAME_DEF_STMT (op);
>>>>>>> >>>> +                if (def == stmt
>>>>>>> >>>> +                    || gimple_nop_p (def)
>>>>>>> >>>> +                    || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
>>>>>>> >>>> +                  continue;
>>>>>>> >>>> +              }
>>>>>>> >>>> +              hoist = false;
>>>>>>> >>>> +              break;
>>>>>>> >>>> +            }
>>>>>>> >>>> +
>>>>>>> >>>> +            if (hoist)
>>>>>>> >>>> +            {
>>>>>>> >>>> +              basic_block preheader = loop_preheader_edge (loop)->src;
>>>>>>> >>>> +              gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>>>>>>> >>>> +              gsi_move_after (&si, &si_dst);
>>>>>>> >>>> +              continue;
>>>>>>> >>>> +            }
>>>>>>> >>>> +          }
>>>>>>> >>>> +          gsi_next (&si);
>>>>>>> >>>> + }
>>>>>>> >>>> +    }
>>>>>>> >>>> +
>>>>>>> >>>>    /* End loop-exit-fixes after versioning.  */
>>>>>>> >>>>
>>>>>>> >>>>    if (cond_expr_stmt_list)
>>>>>>> >>>> Index: gcc/ChangeLog
>>>>>>> >>>> ===================================================================
>>>>>>> >>>> --- gcc/ChangeLog (revision 202663)
>>>>>>> >>>> +++ gcc/ChangeLog (working copy)
>>>>>>> >>>> @@ -1,3 +1,8 @@
>>>>>>> >>>> +2013-10-01  Cong Hou  <congh@google.com>
>>>>>>> >>>> +
>>>>>>> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine
>>>>>>> >>>> + alias checks if it is possible to amortize the runtime overhead.
>>>>>>> >>>> +
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Richard Biener <rguenther@suse.de>
>>>>>>> >>> SUSE / SUSE Labs
>>>>>>> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>>>> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]