This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Reducing number of alias checks in vectorization.
- From: Cong Hou <congh at google dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 31 Oct 2013 10:27:24 -0700
- Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
- Authentication-results: sourceware.org; auth=none
- References: <CAK=A3=3sjM_MCqDoXwBXPsDiBDRGPuGh3oBkBOt_3685=dUXPw at mail dot gmail dot com> <alpine dot LNX dot 2 dot 00 dot 1310021318380 dot 5759 at zhemvz dot fhfr dot qr> <CAK=A3=1oUejUbP8d-a1vg_bC4bRQYvsOt+G43qP2cUC41wKwcQ at mail dot gmail dot com> <CAK=A3=32zEByr2znHEfc9xqgWHdNm1OFjfv8HsYLBZjYF6Va6Q at mail dot gmail dot com> <CAK=A3=2Z=5ASPHdTw6zAvUJv0emDfA+pRY02MfkSiDu9J-Cv4Q at mail dot gmail dot com> <CAK=A3=1uZWiVh7vCXkzuhzTJh5FUOOb2Uy6WHEnREgVj8Ba+fw at mail dot gmail dot com> <CAK=A3=1Tui9OfnsZNBOr7qa8SB_1rn0nHDXdg1K84r4Dk5ZCiw at mail dot gmail dot com> <CAFiYyc0DKNhJdfj2THMLVNeP2g-=rd1eTYSKcFSkvxLQ-4StyA at mail dot gmail dot com> <CAK=A3=0dZGhTt_amFBCPZB0_yVCUVnPG0Zh+wf5vVhnsteiW8A at mail dot gmail dot com> <CAFiYyc0GGR=GMCwO8fbSZng33Ewrd7ttVjfF0d6qF7BnUYVPrA at mail dot gmail dot com> <CAK=A3=0kwfZrN26t9VnFHXaqq34adsQxOzW+zf_TRprP5axM3g at mail dot gmail dot com>
Ping?
thanks,
Cong
On Wed, Oct 23, 2013 at 1:04 PM, Cong Hou <congh@google.com> wrote:
> On Wed, Oct 23, 2013 at 5:53 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 22, 2013 at 11:12 PM, Cong Hou <congh@google.com> wrote:
>>> On Fri, Oct 18, 2013 at 6:09 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou <congh@google.com> wrote:
>>>>> Sorry for forgetting using plain-text mode. Resend it.
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Cong Hou <congh@google.com>
>>>>> Date: Mon, Oct 14, 2013 at 3:29 PM
>>>>> Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
>>>>> To: Richard Biener <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
>>>>> Cc: Jakub Jelinek <jakub@redhat.com>
>>>>>
>>>>>
>>>>> I have made a new patch for this issue according to your comments.
>>>>>
>>>>> There are several modifications to my previous patch:
>>>>>
>>>>>
>>>>> 1. Remove the use of STL features such as vector and sort. Use GCC's
>>>>> vec and qsort instead.
>>>>
>>>> I think that using <utility> and thus std::pair and std::swap is ok. Including
>>>> of <utility> should be done via system.h though.
>>>>
>>>
>>>
>>> Unfortunately, std::swap is defined in <algorithm>. Can I still use it?
>>
>> Hmm, let's defer this issue for now, adding swap and pair templates
>> locally in GCC is better for now. That is, I don't want a discussion of
>> what parts of the standard library to use inside this thread. You may
>> want to rise the this issue in a new thread on gcc@gcc.gnu.org.
>>
>>>>> 2. Comparisons between tree nodes are not based on their addresses any
>>>>> more. Use compare_tree() function instead.
>>>>
>>>> Ok.
>>>>
>>>>> 3. The function vect_create_cond_for_alias_checks() now returns the
>>>>> number of alias checks. If its second parameter cond_expr is NULL,
>>>>> then this function only calculate the number of alias checks after the
>>>>> merging and won't generate comparison expressions.
>>>>
>>>> We now perform the merging twice - that's easily avoided by recording
>>>> the merge result in loop_vinfo alongside may_alias_ddrs (which you
>>>> should be able to drop as they are already contained in the data
>>>> structures you build).
>>>>
>>>> Which means you can split the function and move the merging
>>>> part to vect_prune_runtime_alias_test_list.
>>>
>>>
>>> I have added one more field to loop_vec_info to store the dr pairs
>>> from which alias checks are built. And I also split the functionality
>>> of vect_create_cond_for_alias_checks() into two functions, so that we
>>> now perform the merging once instead of twice.
>>>
>>>
>>>>
>>>>> 4. The function vect_prune_runtime_alias_test_list() now uses
>>>>> vect_create_cond_for_alias_checks() to get the number of alias checks.
>>>>>
>>>>>
>>>>> The patch is attached as a text file.
>>>>>
>>>>> Please give me your comment on this patch. Thank you!
>>>>
>>>> + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
>>>> + || TREE_CODE (dr_a1->offset) != INTEGER_CST
>>>> + || TREE_CODE (dr_a2->offset) != INTEGER_CST)
>>>> + continue;
>>>> +
>>>> + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
>>>> + widest_int_cst_value (dr_a1->offset);
>>>>
>>>> use !host_integerp (dr_a1->offset) as check and then int_cst_value.
>>>
>>>
>>> Done.
>>>
>>>
>>>>
>>>> + if (diff <= vect_factor
>>>> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>> + && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor)
>>>> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>>> + && TREE_CODE (dr_b1->seg_len) == INTEGER_CST
>>>> + && diff - widest_int_cst_value (dr_a1->seg_len) <
>>>> + widest_int_cst_value (dr_b1->seg_len)))
>>>>
>>>> can you add a comment above this why it is safe to combine under
>>>> this condition? You seem to combine offsets but not data-ref steps
>>>> for example. The segment length is computed quite optimistically
>>>> and you don't seem to adjust that. That is, generally the alias
>>>> check, if implemented as simple overlap, would need to intersect
>>>> [init, init + niter * step] intervals, losing some non-aliasing cases.
>>>> The current segment length is optimistically computed and thus
>>>> can recover some more cases, but you have to be careful with
>>>> merging to not break its implicit assumptions.
>>>
>>>
>>> I have changed the code and made it more clear as shown below. I still
>>> use the condition
>>>
>>>
>>> DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>>
>>>
>>> But in case that SEGMENT_LENGTH_A or SEGMENT_LENGTH_B is not constant,
>>> we can still check the following two conditions:
>>>
>>>
>>> DR_A2->OFFSET - DR_A1->OFFSET < SEGMENT_LENGTH_B
>>> (if SEGMENT_LENGTH_A is not a constant).
>>>
>>>
>>> DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < VECT_FACTOR
>>> (if SEGMENT_LENGTH_B is not a constant, then its minimum value should
>>> be VECT_FACTOR).
>>>
>>>
>>>
>>> The corresponding code is:
>>>
>>>
>>> /* Now we check if the following condition is satisfied:
>>>
>>> DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>>
>>> where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However,
>>> SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
>>> have to make a best estimation. We can get the minimum value
>>> of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
>>> then either of the following two conditions can guarantee the
>>> one above:
>>>
>>> 1: DIFF <= MIN_SEG_LEN_B
>>> 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
>>>
>>> */
>>>
>>> HOST_WIDEST_INT
>>> min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
>>> widest_int_cst_value (dr_b1->seg_len) :
>>> vect_factor;
>>>
>>> if (diff <= min_seg_len_b
>>> || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>> && diff - widest_int_cst_value (dr_a1->seg_len) <
>>> min_seg_len_b))
>>> { ... }
>>>
>>>
>>>
>>> The new patch is pasted here again. Bootstrapping and regression tests
>>> are passed.
>>>
>>>
>>> thanks,
>>> Cong
>>>
>>>
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 8a38316..7333f6a 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,12 @@
>>> +2013-10-14 Cong Hou <congh@google.com>
>>> +
>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
>>> + Combine alias checks if it is possible to amortize the runtime
>>> + overhead. Return the number of alias checks after merging.
>>> + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
>>> + Use the function vect_create_cond_for_alias_checks () to check
>>> + the number of alias checks.
>>> +
>>
>> Your MUA eats tabs ... ;)
>
>
> That is why I attached a text file ;)
>
>
>>
>>> 2013-10-14 David Malcolm <dmalcolm@redhat.com>
>>>
>>> * dumpfile.h (gcc::dump_manager): New class, to hold state
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index 075d071..ea2782f 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,7 @@
>>> +2013-10-14 Cong Hou <congh@google.com>
>>> +
>>> + * gcc.dg/vect/vect-alias-check.c: New.
>>> +
>>> 2013-10-14 Tobias Burnus <burnus@net-b.de>
>>>
>>> PR fortran/58658
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>> b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>> new file mode 100644
>>> index 0000000..bf9eb6b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
>>> @@ -0,0 +1,19 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -ftree-vectorize
>>> + --param=vect-max-version-for-alias-checks=2
>>> + -fdump-tree-vect-details" } */
>>> +
>>> +/* A test case showing three potential alias checks between
>>> + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging
>>> + enabled, those tree checks can be merged into one, and the
>>> + loop will be vectorized with vect-max-version-for-alias-checks=2. */
>>> +
>>> +void foo (int *a, int *b)
>>> +{
>>> + int i;
>>> + for (i = 0; i < 1000; ++i)
>>> + a[i] = b[i] + b[i+7] + b[i+14];
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>>> index e7b2f33..5f70d58 100644
>>> --- a/gcc/tree-vect-data-refs.c
>>> +++ b/gcc/tree-vect-data-refs.c
>>> @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt,
>>> HOST_WIDE_INT *lhs_size_unit,
>>> }
>>>
>>>
>>> -/* Check if data references pointed by DR_I and DR_J are same or
>>> - belong to same interleaving group. Return FALSE if drs are
>>> - different, otherwise return TRUE. */
>>> -
>>> -static bool
>>> -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j)
>>> -{
>>> - gimple stmt_i = DR_STMT (dr_i);
>>> - gimple stmt_j = DR_STMT (dr_j);
>>> -
>>> - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0)
>>> - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>> - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))
>>> - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>> - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)))))
>>> - return true;
>>> - else
>>> - return false;
>>> -}
>>> -
>>> -/* If address ranges represented by DDR_I and DDR_J are equal,
>>> - return TRUE, otherwise return FALSE. */
>>> -
>>> -static bool
>>> -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j)
>>> -{
>>> - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j))
>>> - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j)))
>>> - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j))
>>> - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j))))
>>> - return true;
>>> - else
>>> - return false;
>>> -}
>>> -
>>> /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
>>> tested at run-time. Return TRUE if DDR was successfully inserted.
>>> Return false if versioning is not supported. */
>>> @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info
>>> loop_vinfo, bb_vec_info bb_vinfo)
>>> return true;
>>> }
>>>
>>> +
>>> +/* Operator == between two dr_addr_with_seg_len objects.
>>> +
>>> + This equality operator is used to make sure two data refs
>>> + are the same one so that we will consider to combine the
>>> + aliasing checks of those two pairs of data dependent data
>>> + refs. */
>>> +
>>> +static bool
>>> +operator == (const dr_addr_with_seg_len& d1,
>>> + const dr_addr_with_seg_len& d2)
>>> +{
>>> + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>>> + && compare_tree (d1.offset, d2.offset) == 0
>>> + && compare_tree (d1.seg_len, d2.seg_len) == 0;
>>> +}
>>> +
>>> +/* Function comp_dr_addr_with_seg_len_pair.
>>> +
>>> + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t
>>> + so that we can combine aliasing checks in one scan. */
>>> +
>>> +static int
>>> +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_)
>>> +{
>>> + const dr_addr_with_seg_len_pair_t* p1 =
>>> + (const dr_addr_with_seg_len_pair_t *) p1_;
>>> + const dr_addr_with_seg_len_pair_t* p2 =
>>> + (const dr_addr_with_seg_len_pair_t *) p2_;
>>> +
>>> + const dr_addr_with_seg_len &p11 = p1->first,
>>> + &p12 = p1->second,
>>> + &p21 = p2->first,
>>> + &p22 = p2->second;
>>> +
>>> + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr);
>>> + if (comp_res != 0)
>>> + return comp_res;
>>> +
>>> + comp_res = compare_tree (p12.basic_addr, p22.basic_addr);
>>> + if (comp_res != 0)
>>> + return comp_res;
>>> +
>>> + if (TREE_CODE (p11.offset) != INTEGER_CST
>>> + || TREE_CODE (p21.offset) != INTEGER_CST)
>>> + {
>>> + comp_res = compare_tree (p11.offset, p21.offset);
>>> + if (comp_res != 0)
>>> + return comp_res;
>>> + }
>>> + if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
>>> + return -1;
>>> + if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
>>> + return 1;
>>> + if (TREE_CODE (p12.offset) != INTEGER_CST
>>> + || TREE_CODE (p22.offset) != INTEGER_CST)
>>> + {
>>> + comp_res = compare_tree (p12.offset, p22.offset);
>>> + if (comp_res != 0)
>>> + return comp_res;
>>> + }
>>> + if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
>>> + return -1;
>>> + if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
>>> + return 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +template <class T> static void
>>> +swap (T& a, T& b)
>>> +{
>>> + T c (a);
>>> + a = b;
>>> + b = c;
>>> +}
>>> +
>>> +/* Function vect_vfa_segment_size.
>>> +
>>> + Create an expression that computes the size of segment
>>> + that will be accessed for a data reference. The functions takes into
>>> + account that realignment loads may access one more vector.
>>> +
>>> + Input:
>>> + DR: The data reference.
>>> + LENGTH_FACTOR: segment length to consider.
>>> +
>>> + Return an expression whose value is the size of segment which will be
>>> + accessed by DR. */
>>> +
>>> +static tree
>>> +vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>>> +{
>>> + tree segment_length;
>>> +
>>> + if (integer_zerop (DR_STEP (dr)))
>>> + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>>> + else
>>> + segment_length = size_binop (MULT_EXPR,
>>> + fold_convert (sizetype, DR_STEP (dr)),
>>> + fold_convert (sizetype, length_factor));
>>> +
>>> + if (vect_supportable_dr_alignment (dr, false)
>>> + == dr_explicit_realign_optimized)
>>> + {
>>> + tree vector_size = TYPE_SIZE_UNIT
>>> + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>>> +
>>> + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>>> + }
>>> + return segment_length;
>>> +}
>>> +
>>> /* Function vect_prune_runtime_alias_test_list.
>>>
>>> Prune a list of ddrs to be tested at run-time by versioning for alias.
>>> + Merge several alias checks into one if possible.
>>> Return FALSE if resulting list of ddrs is longer then allowed by
>>> PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */
>>>
>>> bool
>>> vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
>>> {
>>> - vec<ddr_p> ddrs =
>>> + vec<ddr_p> may_alias_ddrs =
>>> LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>>> - unsigned i, j;
>>> + vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs =
>>> + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>>> + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>> + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>>> +
>>> + ddr_p ddr;
>>> + unsigned int i;
>>> + tree length_factor;
>>>
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> "=== vect_prune_runtime_alias_test_list ===\n");
>>>
>>> - for (i = 0; i < ddrs.length (); )
>>> + if (may_alias_ddrs.is_empty ())
>>> + return true;
>>> +
>>> + /* Basically, for each pair of dependent data refs store_ptr_0
>>> + and load_ptr_0, we create an expression:
>>> +
>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>> + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>>> +
>>> + for aliasing checks. However, in some cases we can decrease
>>> + the number of checks by combining two checks into one. For
>>> + example, suppose we have another pair of data refs store_ptr_0
>>> + and load_ptr_1, and if the following condition is satisfied:
>>> +
>>> + load_ptr_0 < load_ptr_1 &&
>>> + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>>> +
>>> + (this condition means, in each iteration of vectorized loop,
>>> + the accessed memory of store_ptr_0 cannot be between the memory
>>> + of load_ptr_0 and load_ptr_1.)
>>> +
>>> + we then can use only the following expression to finish the
>>> + alising checks between store_ptr_0 & load_ptr_0 and
>>> + store_ptr_0 & load_ptr_1:
>>> +
>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>> + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>>> +
>>> + Note that we only consider that load_ptr_0 and load_ptr_1 have the
>>> + same basic address. */
>>> +
>>> + comp_alias_ddrs.create (may_alias_ddrs.length ());
>>> +
>>> + /* First, we collect all data ref pairs for aliasing checks. */
>>> + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>> {
>>> - bool found;
>>> - ddr_p ddr_i;
>>> + struct data_reference *dr_a, *dr_b;
>>> + gimple dr_group_first_a, dr_group_first_b;
>>> + tree segment_length_a, segment_length_b;
>>> + gimple stmt_a, stmt_b;
>>> +
>>> + dr_a = DDR_A (ddr);
>>> + stmt_a = DR_STMT (DDR_A (ddr));
>>> + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>> + if (dr_group_first_a)
>>> + {
>>> + stmt_a = dr_group_first_a;
>>> + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>> + }
>>>
>>> - ddr_i = ddrs[i];
>>> - found = false;
>>> + dr_b = DDR_B (ddr);
>>> + stmt_b = DR_STMT (DDR_B (ddr));
>>> + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>> + if (dr_group_first_b)
>>> + {
>>> + stmt_b = dr_group_first_b;
>>> + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>> + }
>>>
>>> - for (j = 0; j < i; j++)
>>> - {
>>> - ddr_p ddr_j = ddrs[j];
>>> + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>> + length_factor = scalar_loop_iters;
>>> + else
>>> + length_factor = size_int (vect_factor);
>>
>> ^^^^
>>
>> this is the thing I am worried about. When the steps are equal we
>> build reduced size segments to handle overlaps with a dependence
>> distance bigger than vect_factor.
>
>
> I don't quite understand here. Could you give me an example?
>
>
>>
>>> + segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>> + segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>> +
>>> + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>>> + (dr_addr_with_seg_len
>>> + (dr_a, DR_BASE_ADDRESS (dr_a),
>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>>> + segment_length_a),
>>> + dr_addr_with_seg_len
>>> + (dr_b, DR_BASE_ADDRESS (dr_b),
>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>>> + segment_length_b));
>>> +
>>> + if (compare_tree (dr_with_seg_len_pair.first.basic_addr,
>>> + dr_with_seg_len_pair.second.basic_addr) > 0)
>>> + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>>> +
>>> + comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
>>> + }
>>>
>>> - if (vect_vfa_range_equal (ddr_i, ddr_j))
>>> + /* Second, we sort the collected data ref pairs so that we can scan
>>> + them once to combine all possible aliasing checks. */
>>> + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair);
>>> +
>>> + /* Third, we scan the sorted dr pairs and check if we can combine
>>> + alias checks of two neighbouring dr pairs. */
>>> + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i)
>>> + {
>>> + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */
>>> + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first,
>>> + *dr_b1 = &comp_alias_ddrs[i-1].second,
>>> + *dr_a2 = &comp_alias_ddrs[i].first,
>>> + *dr_b2 = &comp_alias_ddrs[i].second;
>>> +
>>> + /* Remove duplicate data ref pairs. */
>>> + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
>>> + {
>>> + if (dump_enabled_p ())
>>> + {
>>> + dump_printf_loc (MSG_NOTE, vect_location,
>>> + "found equal ranges ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> + DR_REF (dr_a1->dr));
>>> + dump_printf (MSG_NOTE, ", ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> + DR_REF (dr_b1->dr));
>>> + dump_printf (MSG_NOTE, " and ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> + DR_REF (dr_a2->dr));
>>> + dump_printf (MSG_NOTE, ", ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> + DR_REF (dr_b2->dr));
>>> + dump_printf (MSG_NOTE, "\n");
>>> + }
>>> +
>>> + comp_alias_ddrs.ordered_remove (i--);
>>> + continue;
>>> + }
>>> +
>>> + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2)
>>> + {
>>> + /* We consider the case that DR_B1 and DR_B2 are same memrefs,
>>> + and DR_A1 and DR_A2 are two consecutive memrefs. */
>>> + if (*dr_a1 == *dr_a2)
>>> + {
>>> + swap (dr_a1, dr_b1);
>>> + swap (dr_a2, dr_b2);
>>> + }
>>> +
>>> + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
>>> + || !host_integerp (dr_a1->offset, 0)
>>> + || !host_integerp (dr_a2->offset, 0))
>>> + continue;
>>> +
>>> + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
>>> + widest_int_cst_value (dr_a1->offset);
>>> +
>>> +
>>> + /* Now we check if the following condition is satisfied:
>>> +
>>> + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>>> +
>>> + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However,
>>> + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
>>> + have to make a best estimation. We can get the minimum value
>>> + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
>>> + then either of the following two conditions can guarantee the
>>> + one above:
>>> +
>>> + 1: DIFF <= MIN_SEG_LEN_B
>>> + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
>>> +
>>> + */
>>> +
>>> + HOST_WIDEST_INT
>>> + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
>>> + widest_int_cst_value (dr_b1->seg_len) :
>>
>> don't use widest_int_cst_value but tree_int_cst_low () if you previously
>> checked host_integerp.
>
>
> OK. Done.
>
>
>>
>>> + vect_factor;
>>> +
>>> + if (diff <= min_seg_len_b
>>> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
>>> + && diff - widest_int_cst_value (dr_a1->seg_len) <
>>> + min_seg_len_b))
>>> {
>>> if (dump_enabled_p ())
>>> {
>>> - dump_printf_loc (MSG_NOTE, vect_location,
>>> - "found equal ranges ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> - DR_REF (DDR_A (ddr_i)));
>>> - dump_printf (MSG_NOTE, ", ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> - DR_REF (DDR_B (ddr_i)));
>>> - dump_printf (MSG_NOTE, " and ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> - DR_REF (DDR_A (ddr_j)));
>>> - dump_printf (MSG_NOTE, ", ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
>>> - DR_REF (DDR_B (ddr_j)));
>>> + dump_printf_loc
>>> + (MSG_NOTE, vect_location,
>>> + "merging two runtime checks for data references ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
>>> + dump_printf (MSG_NOTE, " and ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
>>> dump_printf (MSG_NOTE, "\n");
>>> }
>>> - found = true;
>>> - break;
>>> +
>>> + dr_a1->seg_len = size_binop (PLUS_EXPR,
>>> + dr_a2->seg_len, size_int (diff));
>>> + comp_alias_ddrs.ordered_remove (i--);
>>> }
>>> }
>>> -
>>> - if (found)
>>> - {
>>> - ddrs.ordered_remove (i);
>>> - continue;
>>> - }
>>> - i++;
>>> }
>>>
>>> - if (ddrs.length () >
>>> - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>>> + if ((int) comp_alias_ddrs.length () >
>>> + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
>>> {
>>> if (dump_enabled_p ())
>>> {
>>> @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list
>>> (loop_vec_info loop_vinfo)
>>> "generated checks exceeded.\n");
>>> }
>>>
>>> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
>>> -
>>> return false;
>>> }
>>>
>>> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>>> index 574446a..1e03853 100644
>>> --- a/gcc/tree-vect-loop-manip.c
>>> +++ b/gcc/tree-vect-loop-manip.c
>>> @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks
>>> (loop_vec_info loop_vinfo,
>>> *cond_expr = part_cond_expr;
>>> }
>>>
>>> -
>>> -/* Function vect_vfa_segment_size.
>>> -
>>> - Create an expression that computes the size of segment
>>> - that will be accessed for a data reference. The functions takes into
>>> - account that realignment loads may access one more vector.
>>> -
>>> - Input:
>>> - DR: The data reference.
>>> - LENGTH_FACTOR: segment length to consider.
>>> -
>>> - Return an expression whose value is the size of segment which will be
>>> - accessed by DR. */
>>> -
>>> -static tree
>>> -vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>>> -{
>>> - tree segment_length;
>>> -
>>> - if (integer_zerop (DR_STEP (dr)))
>>> - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
>>> - else
>>> - segment_length = size_binop (MULT_EXPR,
>>> - fold_convert (sizetype, DR_STEP (dr)),
>>> - fold_convert (sizetype, length_factor));
>>> -
>>> - if (vect_supportable_dr_alignment (dr, false)
>>> - == dr_explicit_realign_optimized)
>>> - {
>>> - tree vector_size = TYPE_SIZE_UNIT
>>> - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
>>> -
>>> - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>>> - }
>>> - return segment_length;
>>> -}
>>> -
>>> -
>>> /* Function vect_create_cond_for_alias_checks.
>>>
>>> Create a conditional expression that represents the run-time checks for
>>> @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference
>>> *dr, tree length_factor)
>>>
>>> Input:
>>> COND_EXPR - input conditional expression. New conditions will be chained
>>> - with logical AND operation.
>>> + with logical AND operation. If it is NULL, then the function
>>> + is used to return the number of alias checks.
>>> LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs
>>> to be checked.
>>>
>>> Output:
>>> COND_EXPR - conditional expression.
>>>
>>> - The returned value is the conditional expression to be used in the if
>>> + The returned COND_EXPR is the conditional expression to be used in the if
>>> statement that controls which version of the loop gets executed at runtime.
>>> */
>>>
>>> -static void
>>> +void
>>> vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
>>> {
>>> - vec<ddr_p> may_alias_ddrs =
>>> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
>>> - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>> - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
>>> -
>>> - ddr_p ddr;
>>> - unsigned int i;
>>> - tree part_cond_expr, length_factor;
>>> + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs =
>>> + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
>>> + tree part_cond_expr;
>>>
>>> /* Create expression
>>> ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>> @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks
>>> (loop_vec_info loop_vinfo, tree * cond_expr)
>>> ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
>>> || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */
>>>
>>> - if (may_alias_ddrs.is_empty ())
>>> + if (comp_alias_ddrs.is_empty ())
>>> return;
>>>
>>> - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>> + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
>>> {
>>> - struct data_reference *dr_a, *dr_b;
>>> - gimple dr_group_first_a, dr_group_first_b;
>>> - tree addr_base_a, addr_base_b;
>>> - tree segment_length_a, segment_length_b;
>>> - gimple stmt_a, stmt_b;
>>> - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>>> -
>>> - dr_a = DDR_A (ddr);
>>> - stmt_a = DR_STMT (DDR_A (ddr));
>>> - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>> - if (dr_group_first_a)
>>> - {
>>> - stmt_a = dr_group_first_a;
>>> - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>> - }
>>> -
>>> - dr_b = DDR_B (ddr);
>>> - stmt_b = DR_STMT (DDR_B (ddr));
>>> - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>> - if (dr_group_first_b)
>>> - {
>>> - stmt_b = dr_group_first_b;
>>> - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>> - }
>>> + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
>>> + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second;
>>> + tree segment_length_a = dr_a.seg_len;
>>> + tree segment_length_b = dr_b.seg_len;
>>>
>>> - addr_base_a
>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>>> - DR_INIT (dr_a)));
>>> - addr_base_b
>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>>> - DR_INIT (dr_b)));
>>> -
>>> - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>> - length_factor = scalar_loop_iters;
>>> - else
>>> - length_factor = size_int (vect_factor);
>>> - segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>> - segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>> + tree addr_base_a
>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>>> + tree addr_base_b
>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>>>
>>> if (dump_enabled_p ())
>>> {
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> - "create runtime check for data references ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>>> + "create runtime check for data references ");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>>> dump_printf (MSG_NOTE, " and ");
>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>>> - dump_printf (MSG_NOTE, "\n");
>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>>> + dump_printf (MSG_NOTE, "\n");
>>> }
>>>
>>> - seg_a_min = addr_base_a;
>>> - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>> - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>>> + tree seg_a_min = addr_base_a;
>>> + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>> + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>>> seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>>>
>>> - seg_b_min = addr_base_b;
>>> - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>> - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>>> + tree seg_b_min = addr_base_b;
>>> + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>> + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>>> seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>>>
>>> part_cond_expr =
>>> @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info
>>> loop_vinfo, tree * cond_expr)
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> "created %u versioning for alias checks.\n",
>>> - may_alias_ddrs.length ());
>>> + comp_alias_ddrs.length ());
>>> +
>>> + comp_alias_ddrs.release ();
>>> }
>>>
>>>
>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>> index 8b7b345..f231b62 100644
>>> --- a/gcc/tree-vectorizer.h
>>> +++ b/gcc/tree-vectorizer.h
>>> @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info
>>>
>>>
>>>
>>> +/* This struct is used to store the information of a data reference,
>>> + including the data ref itself, its basic address, the access offset
>>> + and the segment length for aliasing checks. This is used to generate
>>> + alias checks. */
>>> +
>>> +struct dr_addr_with_seg_len
>>> +{
>>> + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>>> + : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>>> +
>>> + data_reference *dr;
>>> + tree basic_addr;
>>> + tree offset;
>>> + tree seg_len;
>>> +};
>>> +
>>> +/* This struct contains two dr_addr_with_seg_len objects with aliasing data
>>> + refs. Two comparisons are generated from them. */
>>> +
>>> +struct dr_addr_with_seg_len_pair_t
>>> +{
>>> + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1,
>>> + const dr_addr_with_seg_len& d2)
>>> + : first (d1), second (d2) {}
>>> +
>>> + dr_addr_with_seg_len first;
>>> + dr_addr_with_seg_len second;
>>> +};
>>> +
>>> +
>>> typedef struct _vect_peel_info
>>> {
>>> int npeel;
>>> @@ -274,6 +304,10 @@ typedef struct _loop_vec_info {
>>> for a run-time aliasing check. */
>>> vec<ddr_p> may_alias_ddrs;
>>>
>>> + /* Data Dependence Relations defining address ranges together with segment
>>> + lengths from which the run-time aliasing check is built. */
>>> + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs;
>>> +
>>
>> Are you sure you release those on all paths that exit the vectorizer?
>
>
> You are right. This may not a good idea. I have removed this dump info.
>
>
> Thank you for you patient comment again! The updated patch is pasted below.
>
>
> thanks,
> Cong
>
>
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 8a38316..7333f6a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2013-10-14 Cong Hou <congh@google.com>
> +
> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
> + Combine alias checks if it is possible to amortize the runtime
> + overhead. Return the number of alias checks after merging.
> + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
> + Use the function vect_create_cond_for_alias_checks () to check
> + the number of alias checks.
> +
> 2013-10-14 David Malcolm <dmalcolm@redhat.com>
>
> * dumpfile.h (gcc::dump_manager): New class, to hold state
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 075d071..ea2782f 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-10-14 Cong Hou <congh@google.com>
> +
> + * gcc.dg/vect/vect-alias-check.c: New.
> +
> 2013-10-14 Tobias Burnus <burnus@net-b.de>
>
> PR fortran/58658
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
> b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
> new file mode 100644
> index 0000000..bf9eb6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize
> + --param=vect-max-version-for-alias-checks=2
> + -fdump-tree-vect-details" } */
> +
> +/* A test case showing three potential alias checks between
> + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging
> + enabled, those tree checks can be merged into one, and the
> + loop will be vectorized with vect-max-version-for-alias-checks=2. */
> +
> +void foo (int *a, int *b)
> +{
> + int i;
> + for (i = 0; i < 1000; ++i)
> + a[i] = b[i] + b[i+7] + b[i+14];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index e7b2f33..ee6e6d0 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt,
> HOST_WIDE_INT *lhs_size_unit,
> }
>
>
> -/* Check if data references pointed by DR_I and DR_J are same or
> - belong to same interleaving group. Return FALSE if drs are
> - different, otherwise return TRUE. */
> -
> -static bool
> -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j)
> -{
> - gimple stmt_i = DR_STMT (dr_i);
> - gimple stmt_j = DR_STMT (dr_j);
> -
> - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0)
> - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
> - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))
> - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
> - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)))))
> - return true;
> - else
> - return false;
> -}
> -
> -/* If address ranges represented by DDR_I and DDR_J are equal,
> - return TRUE, otherwise return FALSE. */
> -
> -static bool
> -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j)
> -{
> - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j))
> - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j)))
> - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j))
> - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j))))
> - return true;
> - else
> - return false;
> -}
> -
> /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
> tested at run-time. Return TRUE if DDR was successfully inserted.
> Return false if versioning is not supported. */
> @@ -2647,69 +2612,320 @@ vect_analyze_data_ref_accesses (loop_vec_info
> loop_vinfo, bb_vec_info bb_vinfo)
> return true;
> }
>
> +
> +/* Operator == between two dr_addr_with_seg_len objects.
> +
> + This equality operator is used to make sure two data refs
> + are the same one so that we will consider to combine the
> + aliasing checks of those two pairs of data dependent data
> + refs. */
> +
> +static bool
> +operator == (const dr_addr_with_seg_len& d1,
> + const dr_addr_with_seg_len& d2)
> +{
> + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
> + && compare_tree (d1.offset, d2.offset) == 0
> + && compare_tree (d1.seg_len, d2.seg_len) == 0;
> +}
> +
> +/* Function comp_dr_addr_with_seg_len_pair.
> +
> + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t
> + so that we can combine aliasing checks in one scan. */
> +
> +static int
> +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_)
> +{
> + const dr_addr_with_seg_len_pair_t* p1 =
> + (const dr_addr_with_seg_len_pair_t *) p1_;
> + const dr_addr_with_seg_len_pair_t* p2 =
> + (const dr_addr_with_seg_len_pair_t *) p2_;
> +
> + const dr_addr_with_seg_len &p11 = p1->first,
> + &p12 = p1->second,
> + &p21 = p2->first,
> + &p22 = p2->second;
> +
> + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr);
> + if (comp_res != 0)
> + return comp_res;
> +
> + comp_res = compare_tree (p12.basic_addr, p22.basic_addr);
> + if (comp_res != 0)
> + return comp_res;
> +
> + if (TREE_CODE (p11.offset) != INTEGER_CST
> + || TREE_CODE (p21.offset) != INTEGER_CST)
> + {
> + comp_res = compare_tree (p11.offset, p21.offset);
> + if (comp_res != 0)
> + return comp_res;
> + }
> + if (tree_int_cst_compare (p11.offset, p21.offset) < 0)
> + return -1;
> + if (tree_int_cst_compare (p11.offset, p21.offset) > 0)
> + return 1;
> + if (TREE_CODE (p12.offset) != INTEGER_CST
> + || TREE_CODE (p22.offset) != INTEGER_CST)
> + {
> + comp_res = compare_tree (p12.offset, p22.offset);
> + if (comp_res != 0)
> + return comp_res;
> + }
> + if (tree_int_cst_compare (p12.offset, p22.offset) < 0)
> + return -1;
> + if (tree_int_cst_compare (p12.offset, p22.offset) > 0)
> + return 1;
> +
> + return 0;
> +}
> +
> +template <class T> static void
> +swap (T& a, T& b)
> +{
> + T c (a);
> + a = b;
> + b = c;
> +}
> +
> +/* Function vect_vfa_segment_size.
> +
> + Create an expression that computes the size of segment
> + that will be accessed for a data reference. The functions takes into
> + account that realignment loads may access one more vector.
> +
> + Input:
> + DR: The data reference.
> + LENGTH_FACTOR: segment length to consider.
> +
> + Return an expression whose value is the size of segment which will be
> + accessed by DR. */
> +
> +static tree
> +vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
> +{
> + tree segment_length;
> +
> + if (integer_zerop (DR_STEP (dr)))
> + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> + else
> + segment_length = size_binop (MULT_EXPR,
> + fold_convert (sizetype, DR_STEP (dr)),
> + fold_convert (sizetype, length_factor));
> +
> + if (vect_supportable_dr_alignment (dr, false)
> + == dr_explicit_realign_optimized)
> + {
> + tree vector_size = TYPE_SIZE_UNIT
> + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
> +
> + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
> + }
> + return segment_length;
> +}
> +
> /* Function vect_prune_runtime_alias_test_list.
>
> Prune a list of ddrs to be tested at run-time by versioning for alias.
> + Merge several alias checks into one if possible.
> Return FALSE if resulting list of ddrs is longer then allowed by
> PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */
>
> bool
> vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
> {
> - vec<ddr_p> ddrs =
> + vec<ddr_p> may_alias_ddrs =
> LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
> - unsigned i, j;
> + vec<dr_addr_with_seg_len_pair_t>& comp_alias_ddrs =
> + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
> + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
> +
> + ddr_p ddr;
> + unsigned int i;
> + tree length_factor;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "=== vect_prune_runtime_alias_test_list ===\n");
>
> - for (i = 0; i < ddrs.length (); )
> + if (may_alias_ddrs.is_empty ())
> + return true;
> +
> + /* Basically, for each pair of dependent data refs store_ptr_0
> + and load_ptr_0, we create an expression:
> +
> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
> +
> + for aliasing checks. However, in some cases we can decrease
> + the number of checks by combining two checks into one. For
> + example, suppose we have another pair of data refs store_ptr_0
> + and load_ptr_1, and if the following condition is satisfied:
> +
> + load_ptr_0 < load_ptr_1 &&
> + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
> +
> + (this condition means, in each iteration of vectorized loop,
> + the accessed memory of store_ptr_0 cannot be between the memory
> + of load_ptr_0 and load_ptr_1.)
> +
> + we then can use only the following expression to finish the
> + alising checks between store_ptr_0 & load_ptr_0 and
> + store_ptr_0 & load_ptr_1:
> +
> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
> +
> + Note that we only consider that load_ptr_0 and load_ptr_1 have the
> + same basic address. */
> +
> + comp_alias_ddrs.create (may_alias_ddrs.length ());
> +
> + /* First, we collect all data ref pairs for aliasing checks. */
> + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
> {
> - bool found;
> - ddr_p ddr_i;
> + struct data_reference *dr_a, *dr_b;
> + gimple dr_group_first_a, dr_group_first_b;
> + tree segment_length_a, segment_length_b;
> + gimple stmt_a, stmt_b;
> +
> + dr_a = DDR_A (ddr);
> + stmt_a = DR_STMT (DDR_A (ddr));
> + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
> + if (dr_group_first_a)
> + {
> + stmt_a = dr_group_first_a;
> + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
> + }
>
> - ddr_i = ddrs[i];
> - found = false;
> + dr_b = DDR_B (ddr);
> + stmt_b = DR_STMT (DDR_B (ddr));
> + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
> + if (dr_group_first_b)
> + {
> + stmt_b = dr_group_first_b;
> + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
> + }
>
> - for (j = 0; j < i; j++)
> - {
> - ddr_p ddr_j = ddrs[j];
> + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
> + length_factor = scalar_loop_iters;
> + else
> + length_factor = size_int (vect_factor);
> + segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
> + segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
> +
> + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
> + (dr_addr_with_seg_len
> + (dr_a, DR_BASE_ADDRESS (dr_a),
> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
> + segment_length_a),
> + dr_addr_with_seg_len
> + (dr_b, DR_BASE_ADDRESS (dr_b),
> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
> + segment_length_b));
> +
> + if (compare_tree (dr_with_seg_len_pair.first.basic_addr,
> + dr_with_seg_len_pair.second.basic_addr) > 0)
> + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
> +
> + comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
> + }
>
> - if (vect_vfa_range_equal (ddr_i, ddr_j))
> + /* Second, we sort the collected data ref pairs so that we can scan
> + them once to combine all possible aliasing checks. */
> + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair);
> +
> + /* Third, we scan the sorted dr pairs and check if we can combine
> + alias checks of two neighbouring dr pairs. */
> + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i)
> + {
> + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */
> + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first,
> + *dr_b1 = &comp_alias_ddrs[i-1].second,
> + *dr_a2 = &comp_alias_ddrs[i].first,
> + *dr_b2 = &comp_alias_ddrs[i].second;
> +
> + /* Remove duplicate data ref pairs. */
> + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
> + {
> + if (dump_enabled_p ())
> {
> - if (dump_enabled_p ())
> - {
> - dump_printf_loc (MSG_NOTE, vect_location,
> - "found equal ranges ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - DR_REF (DDR_A (ddr_i)));
> - dump_printf (MSG_NOTE, ", ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - DR_REF (DDR_B (ddr_i)));
> - dump_printf (MSG_NOTE, " and ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - DR_REF (DDR_A (ddr_j)));
> - dump_printf (MSG_NOTE, ", ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - DR_REF (DDR_B (ddr_j)));
> - dump_printf (MSG_NOTE, "\n");
> - }
> - found = true;
> - break;
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "found equal ranges ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + DR_REF (dr_a1->dr));
> + dump_printf (MSG_NOTE, ", ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + DR_REF (dr_b1->dr));
> + dump_printf (MSG_NOTE, " and ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + DR_REF (dr_a2->dr));
> + dump_printf (MSG_NOTE, ", ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + DR_REF (dr_b2->dr));
> + dump_printf (MSG_NOTE, "\n");
> }
> +
> + comp_alias_ddrs.ordered_remove (i--);
> + continue;
> }
>
> - if (found)
> - {
> - ddrs.ordered_remove (i);
> - continue;
> - }
> - i++;
> + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2)
> + {
> + /* We consider the case that DR_B1 and DR_B2 are same memrefs,
> + and DR_A1 and DR_A2 are two consecutive memrefs. */
> + if (*dr_a1 == *dr_a2)
> + {
> + swap (dr_a1, dr_b1);
> + swap (dr_a2, dr_b2);
> + }
> +
> + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
> + || !host_integerp (dr_a1->offset, 0)
> + || !host_integerp (dr_a2->offset, 0))
> + continue;
> +
> + HOST_WIDE_INT diff = TREE_INT_CST_LOW (dr_a2->offset) -
> + TREE_INT_CST_LOW (dr_a1->offset);
> +
> +
> + /* Now we check if the following condition is satisfied:
> +
> + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
> +
> + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However,
> + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
> + have to make a best estimation. We can get the minimum value
> + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
> + then either of the following two conditions can guarantee the
> + one above:
> +
> + 1: DIFF <= MIN_SEG_LEN_B
> + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
> +
> + */
> +
> + HOST_WIDE_INT
> + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ?
> + TREE_INT_CST_LOW (dr_b1->seg_len) :
> + vect_factor;
> +
> + if (diff <= min_seg_len_b
> + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
> + && diff - TREE_INT_CST_LOW (dr_a1->seg_len) <
> + min_seg_len_b))
> + {
> + dr_a1->seg_len = size_binop (PLUS_EXPR,
> + dr_a2->seg_len, size_int (diff));
> + comp_alias_ddrs.ordered_remove (i--);
> + }
> + }
> }
>
> - if (ddrs.length () >
> - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
> + if ((int) comp_alias_ddrs.length () >
> + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
> {
> if (dump_enabled_p ())
> {
> @@ -2718,8 +2934,6 @@ vect_prune_runtime_alias_test_list
> (loop_vec_info loop_vinfo)
> "generated checks exceeded.\n");
> }
>
> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
> -
> return false;
> }
>
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 574446a..1e03853 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks
> (loop_vec_info loop_vinfo,
> *cond_expr = part_cond_expr;
> }
>
> -
> -/* Function vect_vfa_segment_size.
> -
> - Create an expression that computes the size of segment
> - that will be accessed for a data reference. The functions takes into
> - account that realignment loads may access one more vector.
> -
> - Input:
> - DR: The data reference.
> - LENGTH_FACTOR: segment length to consider.
> -
> - Return an expression whose value is the size of segment which will be
> - accessed by DR. */
> -
> -static tree
> -vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
> -{
> - tree segment_length;
> -
> - if (integer_zerop (DR_STEP (dr)))
> - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> - else
> - segment_length = size_binop (MULT_EXPR,
> - fold_convert (sizetype, DR_STEP (dr)),
> - fold_convert (sizetype, length_factor));
> -
> - if (vect_supportable_dr_alignment (dr, false)
> - == dr_explicit_realign_optimized)
> - {
> - tree vector_size = TYPE_SIZE_UNIT
> - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
> -
> - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
> - }
> - return segment_length;
> -}
> -
> -
> /* Function vect_create_cond_for_alias_checks.
>
> Create a conditional expression that represents the run-time checks for
> @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference
> *dr, tree length_factor)
>
> Input:
> COND_EXPR - input conditional expression. New conditions will be chained
> - with logical AND operation.
> + with logical AND operation. If it is NULL, then the function
> + is used to return the number of alias checks.
> LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs
> to be checked.
>
> Output:
> COND_EXPR - conditional expression.
>
> - The returned value is the conditional expression to be used in the if
> + The returned COND_EXPR is the conditional expression to be used in the if
> statement that controls which version of the loop gets executed at runtime.
> */
>
> -static void
> +void
> vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
> {
> - vec<ddr_p> may_alias_ddrs =
> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
> - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
> -
> - ddr_p ddr;
> - unsigned int i;
> - tree part_cond_expr, length_factor;
> + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs =
> + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
> + tree part_cond_expr;
>
> /* Create expression
> ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks
> (loop_vec_info loop_vinfo, tree * cond_expr)
> ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
> || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */
>
> - if (may_alias_ddrs.is_empty ())
> + if (comp_alias_ddrs.is_empty ())
> return;
>
> - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
> + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
> {
> - struct data_reference *dr_a, *dr_b;
> - gimple dr_group_first_a, dr_group_first_b;
> - tree addr_base_a, addr_base_b;
> - tree segment_length_a, segment_length_b;
> - gimple stmt_a, stmt_b;
> - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
> -
> - dr_a = DDR_A (ddr);
> - stmt_a = DR_STMT (DDR_A (ddr));
> - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
> - if (dr_group_first_a)
> - {
> - stmt_a = dr_group_first_a;
> - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
> - }
> -
> - dr_b = DDR_B (ddr);
> - stmt_b = DR_STMT (DDR_B (ddr));
> - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
> - if (dr_group_first_b)
> - {
> - stmt_b = dr_group_first_b;
> - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
> - }
> + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
> + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second;
> + tree segment_length_a = dr_a.seg_len;
> + tree segment_length_b = dr_b.seg_len;
>
> - addr_base_a
> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
> - size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
> - DR_INIT (dr_a)));
> - addr_base_b
> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
> - size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
> - DR_INIT (dr_b)));
> -
> - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
> - length_factor = scalar_loop_iters;
> - else
> - length_factor = size_int (vect_factor);
> - segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
> - segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
> + tree addr_base_a
> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
> + tree addr_base_b
> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>
> if (dump_enabled_p ())
> {
> dump_printf_loc (MSG_NOTE, vect_location,
> - "create runtime check for data references ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> + "create runtime check for data references ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
> dump_printf (MSG_NOTE, " and ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> - dump_printf (MSG_NOTE, "\n");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
> + dump_printf (MSG_NOTE, "\n");
> }
>
> - seg_a_min = addr_base_a;
> - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
> + tree seg_a_min = addr_base_a;
> + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
> seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>
> - seg_b_min = addr_base_b;
> - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
> + tree seg_b_min = addr_base_b;
> + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
> seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>
> part_cond_expr =
> @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info
> loop_vinfo, tree * cond_expr)
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "created %u versioning for alias checks.\n",
> - may_alias_ddrs.length ());
> + comp_alias_ddrs.length ());
> +
> + comp_alias_ddrs.release ();
> }
>
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 8b7b345..f231b62 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info
>
>
>
> +/* This struct is used to store the information of a data reference,
> + including the data ref itself, its basic address, the access offset
> + and the segment length for aliasing checks. This is used to generate
> + alias checks. */
> +
> +struct dr_addr_with_seg_len
> +{
> + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
> + : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
> +
> + data_reference *dr;
> + tree basic_addr;
> + tree offset;
> + tree seg_len;
> +};
> +
> +/* This struct contains two dr_addr_with_seg_len objects with aliasing data
> + refs. Two comparisons are generated from them. */
> +
> +struct dr_addr_with_seg_len_pair_t
> +{
> + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1,
> + const dr_addr_with_seg_len& d2)
> + : first (d1), second (d2) {}
> +
> + dr_addr_with_seg_len first;
> + dr_addr_with_seg_len second;
> +};
> +
> +
> typedef struct _vect_peel_info
> {
> int npeel;
> @@ -274,6 +304,10 @@ typedef struct _loop_vec_info {
> for a run-time aliasing check. */
> vec<ddr_p> may_alias_ddrs;
>
> + /* Data Dependence Relations defining address ranges together with segment
> + lengths from which the run-time aliasing check is built. */
> + vec<dr_addr_with_seg_len_pair_t> comp_alias_ddrs;
> +
> /* Statements in the loop that have data references that are candidates for a
> runtime (loop versioning) misalignment check. */
> vec<gimple> may_misalign_stmts;
> @@ -336,6 +370,7 @@ typedef struct _loop_vec_info {
> #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts
> #define LOOP_VINFO_LOC(L) (L)->loop_line_number
> #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs
> +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs
> #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores
> #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
> #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>
>
>
>
>
>
>
>>
>> Thanks,
>> Richard.
>>
>>> /* Statements in the loop that have data references that are candidates for a
>>> runtime (loop versioning) misalignment check. */
>>> vec<gimple> may_misalign_stmts;
>>> @@ -336,6 +370,7 @@ typedef struct _loop_vec_info {
>>> #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts
>>> #define LOOP_VINFO_LOC(L) (L)->loop_line_number
>>> #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs
>>> +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs
>>> #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores
>>> #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
>>> #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> Cong
>>>>>
>>>>>
>>>>> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou <congh@google.com> wrote:
>>>>>>
>>>>>> Forget about this "aux" idea as the segment length for one data ref
>>>>>> can be different in different dr pairs.
>>>>>>
>>>>>> In my patch I created a struct as shown below:
>>>>>>
>>>>>> struct dr_addr_with_seg_len
>>>>>> {
>>>>>> data_reference *dr;
>>>>>> tree basic_addr;
>>>>>> tree offset;
>>>>>> tree seg_len;
>>>>>> };
>>>>>>
>>>>>>
>>>>>> Note that basic_addr and offset can always obtained from dr, but we
>>>>>> need to store two segment lengths for each dr pair. It is improper to
>>>>>> add a field to data_dependence_relation as it is defined outside of
>>>>>> vectorizer. We can change the type (a new one combining
>>>>>> data_dependence_relation and segment length) of may_alias_ddrs in
>>>>>> loop_vec_info to include such information, but we have to add a new
>>>>>> type to tree-vectorizer.h which is only used in two places - still too
>>>>>> much.
>>>>>>
>>>>>> One possible solution is that we create a local struct as shown above
>>>>>> and a new function which returns the merged alias check information.
>>>>>> This function will be called twice: once during analysis phase and
>>>>>> once in transformation phase. Then we don't have to store the merged
>>>>>> alias check information during those two phases. The additional time
>>>>>> cost is minimal as there will not be too many data dependent dr pairs
>>>>>> in a loop.
>>>>>>
>>>>>> Any comment?
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> Cong
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou <congh@google.com> wrote:
>>>>>> > I noticed that there is a "struct dataref_aux" defined in
>>>>>> > tree-vectorizer.h which is specific to the vectorizer pass and is
>>>>>> > stored in (void*)aux in "struct data_reference". Can we add one more
>>>>>> > field "segment_length" to dataref_aux so that we can pass this
>>>>>> > information for merging alias checks? Then we can avoid to modify or
>>>>>> > create other structures.
>>>>>> >
>>>>>> >
>>>>>> > thanks,
>>>>>> > Cong
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou <congh@google.com> wrote:
>>>>>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener <rguenther@suse.de> wrote:
>>>>>> >>> On Tue, 1 Oct 2013, Cong Hou wrote:
>>>>>> >>>
>>>>>> >>>> When alias exists between data refs in a loop, to vectorize it GCC
>>>>>> >>>> does loop versioning and adds runtime alias checks. Basically for each
>>>>>> >>>> pair of data refs with possible data dependence, there will be two
>>>>>> >>>> comparisons generated to make sure there is no aliasing between them
>>>>>> >>>> in each iteration of the vectorized loop. If there are many such data
>>>>>> >>>> refs pairs, the number of comparisons can be very large, which is a
>>>>>> >>>> big overhead.
>>>>>> >>>>
>>>>>> >>>> However, in some cases it is possible to reduce the number of those
>>>>>> >>>> comparisons. For example, for the following loop, we can detect that
>>>>>> >>>> b[0] and b[1] are two consecutive member accesses so that we can
>>>>>> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into
>>>>>> >>>> checking a[0:100]&b[0:2]:
>>>>>> >>>>
>>>>>> >>>> void foo(int*a, int* b)
>>>>>> >>>> {
>>>>>> >>>> for (int i = 0; i < 100; ++i)
>>>>>> >>>> a[i] = b[0] + b[1];
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> Actually, the requirement of consecutive memory accesses is too
>>>>>> >>>> strict. For the following loop, we can still combine the alias checks
>>>>>> >>>> between a[0:100]&b[0] and a[0:100]&b[100]:
>>>>>> >>>>
>>>>>> >>>> void foo(int*a, int* b)
>>>>>> >>>> {
>>>>>> >>>> for (int i = 0; i < 100; ++i)
>>>>>> >>>> a[i] = b[0] + b[100];
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in
>>>>>> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need
>>>>>> >>>> to check a[0:100] and b[0:101] don't overlap.
>>>>>> >>>>
>>>>>> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2).
>>>>>> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2;
>>>>>> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and
>>>>>> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are
>>>>>> >>>> segment length of a, b1, and b2. Then we can combine the two
>>>>>> >>>> comparisons into one if the following condition is satisfied:
>>>>>> >>>>
>>>>>> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> This patch detects those combination opportunities to reduce the
>>>>>> >>>> number of alias checks. It is tested on an x86-64 machine.
>>>>>> >>>
>>>>>> >>> Apart from the other comments you got (to which I agree) the patch
>>>>>> >>> seems to do two things, namely also:
>>>>>> >>>
>>>>>> >>> + /* Extract load and store statements on pointers with zero-stride
>>>>>> >>> + accesses. */
>>>>>> >>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>>>>>> >>> + {
>>>>>> >>>
>>>>>> >>> which I'd rather see in a separate patch (and done also when
>>>>>> >>> the loop doesn't require versioning for alias).
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> My mistake.. I am working on those two patches at the same time and
>>>>>> >> pasted that one also here by mistake. I will send another patch about
>>>>>> >> the "hoist" topic.
>>>>>> >>
>>>>>> >>
>>>>>> >>> Also combining the alias checks in vect_create_cond_for_alias_checks
>>>>>> >>> is nice but doesn't properly fix the use of the
>>>>>> >>> vect-max-version-for-alias-checks param which currently inhibits
>>>>>> >>> vectorization of the HIMENO benchmark by default (and make us look bad
>>>>>> >>> compared to LLVM).
>>>>>> >>>
>>>>>> >>> So I believe this merging should be done incrementally when
>>>>>> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test.
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> I agree that vect-max-version-for-alias-checks param should count the
>>>>>> >> number of checks after the merge. However, the struct
>>>>>> >> data_dependence_relation could not record the new information produced
>>>>>> >> by the merge. The new information I mentioned contains the new segment
>>>>>> >> length for comparisons. This length is calculated right in
>>>>>> >> vect_create_cond_for_alias_checks() function. Since
>>>>>> >> vect-max-version-for-alias-checks is used during analysis phase, shall
>>>>>> >> we move all those (get segment length for each data ref and merge
>>>>>> >> alias checks) from transformation to analysis phase? If we cannot
>>>>>> >> store the result properly (data_dependence_relation is not enough),
>>>>>> >> shall we do it twice in both phases?
>>>>>> >>
>>>>>> >> I also noticed a possible bug in the function vect_same_range_drs()
>>>>>> >> called by vect_prune_runtime_alias_test_list(). For the following code
>>>>>> >> I get two pairs of data refs after
>>>>>> >> vect_prune_runtime_alias_test_list(), but in
>>>>>> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I
>>>>>> >> got two identical pairs of data refs. The consequence is two identical
>>>>>> >> alias checks are produced.
>>>>>> >>
>>>>>> >>
>>>>>> >> void yuv2yuyv_ref (int *d, int *src, int n)
>>>>>> >> {
>>>>>> >> char *dest = (char *)d;
>>>>>> >> int i;
>>>>>> >>
>>>>>> >> for(i=0;i<n/2;i++){
>>>>>> >> dest[i*4 + 0] = (src[i*2 + 0])>>16;
>>>>>> >> dest[i*4 + 1] = (src[i*2 + 1])>>8;
>>>>>> >> dest[i*4 + 2] = (src[i*2 + 0])>>16;
>>>>>> >> dest[i*4 + 3] = (src[i*2 + 0])>>0;
>>>>>> >> }
>>>>>> >> }
>>>>>> >>
>>>>>> >>
>>>>>> >> I think the solution to this problem is changing
>>>>>> >>
>>>>>> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i))
>>>>>> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)
>>>>>> >>
>>>>>> >> into
>>>>>> >>
>>>>>> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>>>>>> >> (vinfo_for_stmt (stmt_i))))
>>>>>> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT
>>>>>> >> (vinfo_for_stmt (stmt_j)))
>>>>>> >>
>>>>>> >>
>>>>>> >> in function vect_same_range_drs(). What do you think about it?
>>>>>> >>
>>>>>> >>
>>>>>> >> thanks,
>>>>>> >> Cong
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>> Thanks for working on this,
>>>>>> >>> Richard.
>>>>>> >>>
>>>>>> >>>>
>>>>>> >>>> thanks,
>>>>>> >>>> Cong
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> Index: gcc/tree-vect-loop-manip.c
>>>>>> >>>> ===================================================================
>>>>>> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662)
>>>>>> >>>> +++ gcc/tree-vect-loop-manip.c (working copy)
>>>>>> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G
>>>>>> >>>> along with GCC; see the file COPYING3. If not see
>>>>>> >>>> <http://www.gnu.org/licenses/>. */
>>>>>> >>>>
>>>>>> >>>> +#include <vector>
>>>>>> >>>> +#include <utility>
>>>>>> >>>> +#include <algorithm>
>>>>>> >>>> +
>>>>>> >>>> #include "config.h"
>>>>>> >>>> #include "system.h"
>>>>>> >>>> #include "coretypes.h"
>>>>>> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer
>>>>>> >>>> return segment_length;
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> +namespace
>>>>>> >>>> +{
>>>>>> >>>> +
>>>>>> >>>> +/* struct dr_addr_with_seg_len
>>>>>> >>>> +
>>>>>> >>>> + A struct storing information of a data reference, including the data
>>>>>> >>>> + ref itself, its basic address, the access offset and the segment length
>>>>>> >>>> + for aliasing checks. */
>>>>>> >>>> +
>>>>>> >>>> +struct dr_addr_with_seg_len
>>>>>> >>>> +{
>>>>>> >>>> + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
>>>>>> >>>> + : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
>>>>>> >>>> +
>>>>>> >>>> + data_reference* dr;
>>>>>> >>>> + tree basic_addr;
>>>>>> >>>> + tree offset;
>>>>>> >>>> + tree seg_len;
>>>>>> >>>> +};
>>>>>> >>>> +
>>>>>> >>>> +/* Operator == between two dr_addr_with_seg_len objects.
>>>>>> >>>> +
>>>>>> >>>> + This equality operator is used to make sure two data refs
>>>>>> >>>> + are the same one so that we will consider to combine the
>>>>>> >>>> + aliasing checks of those two pairs of data dependent data
>>>>>> >>>> + refs. */
>>>>>> >>>> +
>>>>>> >>>> +bool operator == (const dr_addr_with_seg_len& d1,
>>>>>> >>>> + const dr_addr_with_seg_len& d2)
>>>>>> >>>> +{
>>>>>> >>>> + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
>>>>>> >>>> + && operand_equal_p (d1.offset, d2.offset, 0)
>>>>>> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0);
>>>>>> >>>> +}
>>>>>> >>>> +
>>>>>> >>>> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len>
>>>>>> >>>> + dr_addr_with_seg_len_pair_t;
>>>>>> >>>> +
>>>>>> >>>> +
>>>>>> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects.
>>>>>> >>>> +
>>>>>> >>>> + This operator is used to sort objects of dr_addr_with_seg_len_pair_t
>>>>>> >>>> + so that we can combine aliasing checks during one scan. */
>>>>>> >>>> +
>>>>>> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1,
>>>>>> >>>> + const dr_addr_with_seg_len_pair_t& p2)
>>>>>> >>>> +{
>>>>>> >>>> + const dr_addr_with_seg_len& p11 = p1.first;
>>>>>> >>>> + const dr_addr_with_seg_len& p12 = p1.second;
>>>>>> >>>> + const dr_addr_with_seg_len& p21 = p2.first;
>>>>>> >>>> + const dr_addr_with_seg_len& p22 = p2.second;
>>>>>> >>>> +
>>>>>> >>>> + if (p11.basic_addr != p21.basic_addr)
>>>>>> >>>> + return p11.basic_addr < p21.basic_addr;
>>>>>> >>>> + if (p12.basic_addr != p22.basic_addr)
>>>>>> >>>> + return p12.basic_addr < p22.basic_addr;
>>>>>> >>>> + if (TREE_CODE (p11.offset) != INTEGER_CST
>>>>>> >>>> + || TREE_CODE (p21.offset) != INTEGER_CST)
>>>>>> >>>> + return p11.offset < p21.offset;
>>>>>> >>>> + if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
>>>>>> >>>> + return int_cst_value (p11.offset) < int_cst_value (p21.offset);
>>>>>> >>>> + if (TREE_CODE (p12.offset) != INTEGER_CST
>>>>>> >>>> + || TREE_CODE (p22.offset) != INTEGER_CST)
>>>>>> >>>> + return p12.offset < p22.offset;
>>>>>> >>>> + return int_cst_value (p12.offset) < int_cst_value (p22.offset);
>>>>>> >>>> +}
>>>>>> >>>> +
>>>>>> >>>> +}
>>>>>> >>>>
>>>>>> >>>> /* Function vect_create_cond_for_alias_checks.
>>>>>> >>>>
>>>>>> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_
>>>>>> >>>> if (may_alias_ddrs.is_empty ())
>>>>>> >>>> return;
>>>>>> >>>>
>>>>>> >>>> +
>>>>>> >>>> + /* Basically, for each pair of dependent data refs store_ptr_0
>>>>>> >>>> + and load_ptr_0, we create an expression:
>>>>>> >>>> +
>>>>>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>>>> >>>> + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>>>>>> >>>> +
>>>>>> >>>> + for aliasing checks. However, in some cases we can decrease
>>>>>> >>>> + the number of checks by combining two checks into one. For
>>>>>> >>>> + example, suppose we have another pair of data refs store_ptr_0
>>>>>> >>>> + and load_ptr_1, and if the following condition is satisfied:
>>>>>> >>>> +
>>>>>> >>>> + load_ptr_0 < load_ptr_1 &&
>>>>>> >>>> + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
>>>>>> >>>> +
>>>>>> >>>> + (this condition means, in each iteration of vectorized loop,
>>>>>> >>>> + the accessed memory of store_ptr_0 cannot be between the memory
>>>>>> >>>> + of load_ptr_0 and load_ptr_1.)
>>>>>> >>>> +
>>>>>> >>>> + we then can use only the following expression to finish the
>>>>>> >>>> + alising checks between store_ptr_0 & load_ptr_0 and
>>>>>> >>>> + store_ptr_0 & load_ptr_1:
>>>>>> >>>> +
>>>>>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>>>>>> >>>> + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0))
>>>>>> >>>> +
>>>>>> >>>> + Note that we only consider that load_ptr_0 and load_ptr_1 have the
>>>>>> >>>> + same basic address. */
>>>>>> >>>> +
>>>>>> >>>> + std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
>>>>>> >>>> +
>>>>>> >>>> + /* First, we collect all data ref pairs for aliasing checks. */
>>>>>> >>>> +
>>>>>> >>>> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>>>>>> >>>> {
>>>>>> >>>> struct data_reference *dr_a, *dr_b;
>>>>>> >>>> gimple dr_group_first_a, dr_group_first_b;
>>>>>> >>>> - tree addr_base_a, addr_base_b;
>>>>>> >>>> tree segment_length_a, segment_length_b;
>>>>>> >>>> gimple stmt_a, stmt_b;
>>>>>> >>>> - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>>>>>> >>>>
>>>>>> >>>> dr_a = DDR_A (ddr);
>>>>>> >>>> stmt_a = DR_STMT (DDR_A (ddr));
>>>>>> >>>> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>>>>>> >>>> if (dr_group_first_a)
>>>>>> >>>> - {
>>>>>> >>>> + {
>>>>>> >>>> stmt_a = dr_group_first_a;
>>>>>> >>>> dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>>>>>> >>>> }
>>>>>> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_
>>>>>> >>>> stmt_b = DR_STMT (DDR_B (ddr));
>>>>>> >>>> dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>>>>>> >>>> if (dr_group_first_b)
>>>>>> >>>> - {
>>>>>> >>>> + {
>>>>>> >>>> stmt_b = dr_group_first_b;
>>>>>> >>>> dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> - addr_base_a
>>>>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
>>>>>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
>>>>>> >>>> - DR_INIT (dr_a)));
>>>>>> >>>> - addr_base_b
>>>>>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
>>>>>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
>>>>>> >>>> - DR_INIT (dr_b)));
>>>>>> >>>> -
>>>>>> >>>> if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>>>>>> >>>> length_factor = scalar_loop_iters;
>>>>>> >>>> else
>>>>>> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_
>>>>>> >>>> segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>>>>>> >>>> segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>>>>>> >>>>
>>>>>> >>>> + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
>>>>>> >>>> + (dr_addr_with_seg_len
>>>>>> >>>> + (dr_a, DR_BASE_ADDRESS (dr_a),
>>>>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
>>>>>> >>>> + segment_length_a),
>>>>>> >>>> + dr_addr_with_seg_len
>>>>>> >>>> + (dr_b, DR_BASE_ADDRESS (dr_b),
>>>>>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
>>>>>> >>>> + segment_length_b));
>>>>>> >>>> +
>>>>>> >>>> + if (dr_with_seg_len_pair.first.basic_addr >
>>>>>> >>>> + dr_with_seg_len_pair.second.basic_addr)
>>>>>> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
>>>>>> >>>> +
>>>>>> >>>> + ddrs_with_seg_len.push_back (dr_with_seg_len_pair);
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + /* Second, we sort the collected data ref pairs so that we can scan
>>>>>> >>>> + them once to combine all possible aliasing checks. */
>>>>>> >>>> +
>>>>>> >>>> + std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end());
>>>>>> >>>> +
>>>>>> >>>> + /* Remove duplicate data ref pairs. */
>>>>>> >>>> + ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(),
>>>>>> >>>> + ddrs_with_seg_len.end()),
>>>>>> >>>> + ddrs_with_seg_len.end());
>>>>>> >>>> +
>>>>>> >>>> + /* We then scan the sorted dr pairs and check if we can combine
>>>>>> >>>> + alias checks of two neighbouring dr pairs. */
>>>>>> >>>> +
>>>>>> >>>> + for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i)
>>>>>> >>>> + {
>>>>>> >>>> + dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first;
>>>>>> >>>> + dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second;
>>>>>> >>>> + dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first;
>>>>>> >>>> + dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second;
>>>>>> >>>> +
>>>>>> >>>> + if (dr_a1 == dr_a2)
>>>>>> >>>> + {
>>>>>> >>>> + if (dr_b1.basic_addr != dr_b2.basic_addr
>>>>>> >>>> + || TREE_CODE (dr_b1.offset) != INTEGER_CST
>>>>>> >>>> + || TREE_CODE (dr_b2.offset) != INTEGER_CST)
>>>>>> >>>> + continue;
>>>>>> >>>> +
>>>>>> >>>> + int diff = int_cst_value (dr_b2.offset) -
>>>>>> >>>> + int_cst_value (dr_b1.offset);
>>>>>> >>>> +
>>>>>> >>>> + gcc_assert (diff > 0);
>>>>>> >>>> +
>>>>>> >>>> + if (diff <= vect_factor
>>>>>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>> >>>> + && diff - int_cst_value (dr_b1.seg_len) < vect_factor)
>>>>>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>> >>>> + && TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>> >>>> + && diff - int_cst_value (dr_b1.seg_len) <
>>>>>> >>>> + int_cst_value (dr_a1.seg_len)))
>>>>>> >>>> + {
>>>>>> >>>> + if (dump_enabled_p ())
>>>>>> >>>> + {
>>>>>> >>>> + dump_printf_loc
>>>>>> >>>> + (MSG_NOTE, vect_location,
>>>>>> >>>> + "combining two runtime checks for data references ");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr));
>>>>>> >>>> + dump_printf (MSG_NOTE, " and ");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr));
>>>>>> >>>> + dump_printf (MSG_NOTE, "\n");
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + dr_b1.seg_len = size_binop (PLUS_EXPR,
>>>>>> >>>> + dr_b2.seg_len, size_int (diff));
>>>>>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>>>>>> >>>> + --i;
>>>>>> >>>> + }
>>>>>> >>>> + }
>>>>>> >>>> + else if (dr_b1 == dr_b2)
>>>>>> >>>> + {
>>>>>> >>>> + if (dr_a1.basic_addr != dr_a2.basic_addr
>>>>>> >>>> + || TREE_CODE (dr_a1.offset) != INTEGER_CST
>>>>>> >>>> + || TREE_CODE (dr_a2.offset) != INTEGER_CST)
>>>>>> >>>> + continue;
>>>>>> >>>> +
>>>>>> >>>> + int diff = int_cst_value (dr_a2.offset) -
>>>>>> >>>> + int_cst_value (dr_a1.offset);
>>>>>> >>>> +
>>>>>> >>>> + gcc_assert (diff > 0);
>>>>>> >>>> +
>>>>>> >>>> + if (diff <= vect_factor
>>>>>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>> >>>> + && diff - int_cst_value (dr_a1.seg_len) < vect_factor)
>>>>>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
>>>>>> >>>> + && TREE_CODE (dr_b1.seg_len) == INTEGER_CST
>>>>>> >>>> + && diff - int_cst_value (dr_a1.seg_len) <
>>>>>> >>>> + int_cst_value (dr_b1.seg_len)))
>>>>>> >>>> + {
>>>>>> >>>> + if (dump_enabled_p ())
>>>>>> >>>> + {
>>>>>> >>>> + dump_printf_loc
>>>>>> >>>> + (MSG_NOTE, vect_location,
>>>>>> >>>> + "combining two runtime checks for data references ");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr));
>>>>>> >>>> + dump_printf (MSG_NOTE, " and ");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr));
>>>>>> >>>> + dump_printf (MSG_NOTE, "\n");
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + dr_a1.seg_len = size_binop (PLUS_EXPR,
>>>>>> >>>> + dr_a2.seg_len, size_int (diff));
>>>>>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
>>>>>> >>>> + --i;
>>>>>> >>>> + }
>>>>>> >>>> + }
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i)
>>>>>> >>>> + {
>>>>>> >>>> + const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
>>>>>> >>>> + const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
>>>>>> >>>> + tree segment_length_a = dr_a.seg_len;
>>>>>> >>>> + tree segment_length_b = dr_b.seg_len;
>>>>>> >>>> +
>>>>>> >>>> + tree addr_base_a
>>>>>> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
>>>>>> >>>> + tree addr_base_b
>>>>>> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
>>>>>> >>>> +
>>>>>> >>>> if (dump_enabled_p ())
>>>>>> >>>> {
>>>>>> >>>> dump_printf_loc (MSG_NOTE, vect_location,
>>>>>> >>>> - "create runtime check for data references ");
>>>>>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
>>>>>> >>>> + "create runtime check for data references ");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>>>>>> >>>> dump_printf (MSG_NOTE, " and ");
>>>>>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
>>>>>> >>>> - dump_printf (MSG_NOTE, "\n");
>>>>>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
>>>>>> >>>> + dump_printf (MSG_NOTE, "\n");
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> - seg_a_min = addr_base_a;
>>>>>> >>>> - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>>>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
>>>>>> >>>> + tree seg_a_min = addr_base_a;
>>>>>> >>>> + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
>>>>>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>>>>>> >>>> seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>>>>>> >>>>
>>>>>> >>>> - seg_b_min = addr_base_b;
>>>>>> >>>> - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>>>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
>>>>>> >>>> + tree seg_b_min = addr_base_b;
>>>>>> >>>> + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
>>>>>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>>>>>> >>>> seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>>>>>> >>>>
>>>>>> >>>> part_cond_expr =
>>>>>> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop
>>>>>> >>>> adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
>>>>>> >>>> }
>>>>>> >>>>
>>>>>> >>>> + /* Extract load and store statements on pointers with zero-stride
>>>>>> >>>> + accesses. */
>>>>>> >>>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
>>>>>> >>>> + {
>>>>>> >>>> +
>>>>>> >>>> + /* In the loop body, we iterate each statement to check if it is a load
>>>>>> >>>> + or store. Then we check the DR_STEP of the data reference. If
>>>>>> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop
>>>>>> >>>> + preheader, and move the store statement to the loop exit. */
>>>>>> >>>> +
>>>>>> >>>> + for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
>>>>>> >>>> + !gsi_end_p (si); )
>>>>>> >>>> + {
>>>>>> >>>> + gimple stmt = gsi_stmt (si);
>>>>>> >>>> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>>>>> >>>> + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
>>>>>> >>>> +
>>>>>> >>>> +
>>>>>> >>>> + if (dr && integer_zerop (DR_STEP (dr)))
>>>>>> >>>> + {
>>>>>> >>>> + if (DR_IS_READ (dr))
>>>>>> >>>> + {
>>>>>> >>>> + if (dump_file)
>>>>>> >>>> + {
>>>>>> >>>> + fprintf (dump_file,
>>>>>> >>>> + "Hoist the load to outside of the loop:\n");
>>>>>> >>>> + print_gimple_stmt (dump_file, stmt, 0,
>>>>>> >>>> + TDF_VOPS|TDF_MEMSYMS);
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src;
>>>>>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>>>>>> >>>> + gsi_move_after (&si, &si_dst);
>>>>>> >>>> + }
>>>>>> >>>> + else
>>>>>> >>>> + {
>>>>>> >>>> + gimple_stmt_iterator si_dst =
>>>>>> >>>> + gsi_last_bb (single_exit (loop)->dest);
>>>>>> >>>> + gsi_move_after (&si, &si_dst);
>>>>>> >>>> + }
>>>>>> >>>> + continue;
>>>>>> >>>> + }
>>>>>> >>>> + else if (!dr)
>>>>>> >>>> + {
>>>>>> >>>> + bool hoist = true;
>>>>>> >>>> + for (size_t i = 0; i < gimple_num_ops (stmt); i++)
>>>>>> >>>> + {
>>>>>> >>>> + tree op = gimple_op (stmt, i);
>>>>>> >>>> + if (TREE_CODE (op) == INTEGER_CST
>>>>>> >>>> + || TREE_CODE (op) == REAL_CST)
>>>>>> >>>> + continue;
>>>>>> >>>> + if (TREE_CODE (op) == SSA_NAME)
>>>>>> >>>> + {
>>>>>> >>>> + gimple def = SSA_NAME_DEF_STMT (op);
>>>>>> >>>> + if (def == stmt
>>>>>> >>>> + || gimple_nop_p (def)
>>>>>> >>>> + || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
>>>>>> >>>> + continue;
>>>>>> >>>> + }
>>>>>> >>>> + hoist = false;
>>>>>> >>>> + break;
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> + if (hoist)
>>>>>> >>>> + {
>>>>>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src;
>>>>>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
>>>>>> >>>> + gsi_move_after (&si, &si_dst);
>>>>>> >>>> + continue;
>>>>>> >>>> + }
>>>>>> >>>> + }
>>>>>> >>>> + gsi_next (&si);
>>>>>> >>>> + }
>>>>>> >>>> + }
>>>>>> >>>> +
>>>>>> >>>> /* End loop-exit-fixes after versioning. */
>>>>>> >>>>
>>>>>> >>>> if (cond_expr_stmt_list)
>>>>>> >>>> Index: gcc/ChangeLog
>>>>>> >>>> ===================================================================
>>>>>> >>>> --- gcc/ChangeLog (revision 202663)
>>>>>> >>>> +++ gcc/ChangeLog (working copy)
>>>>>> >>>> @@ -1,3 +1,8 @@
>>>>>> >>>> +2013-10-01 Cong Hou <congh@google.com>
>>>>>> >>>> +
>>>>>> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine
>>>>>> >>>> + alias checks if it is possible to amortize the runtime overhead.
>>>>>> >>>> +
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> Richard Biener <rguenther@suse.de>
>>>>>> >>> SUSE / SUSE Labs
>>>>>> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>>> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend