This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Handle data dependence relations with different bases
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: gcc-patches List <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Thu, 4 May 2017 10:23:43 +0100
- Subject: Re: Handle data dependence relations with different bases
- Authentication-results: sourceware.org; auth=none
- References: <87tw52s73r.fsf@linaro.org>
On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch tries to calculate conservatively-correct distance
> vectors for two references whose base addresses are not the same.
> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
> isn't guaranteed to occur.
>
> The motivating example is:
>
> struct s { int x[8]; };
> void
> f (struct s *a, struct s *b)
> {
> for (int i = 0; i < 8; ++i)
> a->x[i] += b->x[i];
> }
>
> in which the "a" and "b" accesses are either independent or have a
> dependence distance of 0 (assuming -fstrict-aliasing). Neither case
> prevents vectorisation, so we can vectorise without an alias check.
>
> I'd originally wanted to do the same thing for arrays as well, e.g.:
>
> void
> f (int a[][8], struct b[][8])
> {
> for (int i = 0; i < 8; ++i)
> a[0][i] += b[0][i];
> }
>
> I think this is valid because C11 6.7.6.2/6 says:
>
> For two array types to be compatible, both shall have compatible
> element types, and if both size specifiers are present, and are
> integer constant expressions, then both size specifiers shall have
> the same constant value.
>
> So if we access an array through an int (*)[8], it must have type X[8]
> or X[], where X is compatible with int. It doesn't seem possible in
> either case for "a[0]" and "b[0]" to overlap when "a != b".
>
> However, Richard B said that (at least in gimple) we support arbitrary
> overlap of arrays and allow arrays to be accessed with different
> dimensionality. There are examples of this in PR50067. I've therefore
> only handled references that end in a structure field access.
>
> There are two ways of handling these dependences in the vectoriser:
> use them to limit VF, or check at runtime as before. I've gone for
> the approach of checking at runtime if we can, to avoid limiting VF
> unnecessarily. We still fall back to a VF cap when runtime checks
> aren't allowed.
>
> The patch tests whether we queued an alias check with a dependence
> distance of X and then picked a VF <= X, in which case it's safe to
> drop the alias check. Since vect_prune_runtime_alias_check_list can
> be called twice with different VF for the same loop, it's no longer
> safe to clear may_alias_ddrs on exit. Instead we should use
> comp_alias_ddrs to check whether versioning is necessary.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Hi Richard,
It's nice to explore more alias opportunity, below are some simple
comments embedded.
>
> Thanks,
> Richard
>
>
> gcc/
> 2017-05-03 Richard Sandiford <richard.sandiford@linaro.org>
>
> * tree-data-ref.h (subscript): Add access_fn field.
> (data_dependence_relation): Add could_be_independent_p.
> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
> (same_access_functions): Move to tree-data-ref.c.
> * tree-data-ref.c (ref_contains_union_access_p): New function.
> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
> DR_ACCESS_FN.
> (constant_access_functions): Likewise.
> (add_other_self_distances): Likewise.
> (same_access_functions): Likewise. (Moved from tree-data-ref.h.)
> (initialize_data_dependence_relation): Use XCNEW and remove
> explicit zeroing of DDR_REVERSED_P. Look for a subsequence
> of access functions that have the same type. Allow the
> subsequence to end with different bases in some circumstances.
> Record the chosen access functions in SUB_ACCESS_FN.
> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with
> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
> (subscript_dependence_tester_1): Likewise dra and drb.
> (build_classic_dist_vector): Update calls accordingly.
> (subscript_dependence_tester): Likewise.
> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
> DDR_COULD_BE_INDEPENDENT_P.
> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
> comp_alias_ddrs instead of may_alias_ddrs.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try
> to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back
> to using the recorded distance vectors if that fails.
> (dependence_distance_ge_vf): New function.
> (vect_prune_runtime_alias_test_list): Use it. Don't clear
> LOOP_VINFO_MAY_ALIAS_DDRS.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-alias-check-3.c: New test.
> * gcc.dg/vect/vect-alias-check-4.c: Likewise.
> * gcc.dg/vect/vect-alias-check-5.c: Likewise.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100
> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100
> @@ -191,6 +191,9 @@ struct conflict_function
>
> struct subscript
> {
> + /* The access functions of the two references. */
> + tree access_fn[2];
Is it better to follow existing code, i.e, name this as
access_fn_a/access_fn_b. Thus we don't need to use const value 0/1 in
various places, which is a little bit confusing.
> +
> /* A description of the iterations for which the elements are
> accessed twice. */
> conflict_function *conflicting_iterations_in_a;
> @@ -209,6 +212,7 @@ struct subscript
>
> typedef struct subscript *subscript_p;
>
> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]
> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a
> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b
> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict
> @@ -264,6 +268,33 @@ struct data_dependence_relation
> /* Set to true when the dependence relation is on the same data
> access. */
> bool self_reference_p;
> +
> + /* True if the dependence described is conservatively correct rather
> + than exact, and if it is still possible for the accesses to be
> + conditionally independent. For example, the a and b references in:
> +
> + struct s *a, *b;
> + for (int i = 0; i < n; ++i)
> + a->f[i] += b->f[i];
> +
> + conservatively have a distance vector of (0), for the case in which
> + a == b, but the accesses are independent if a != b. Similarly,
> + the a and b references in:
> +
> + struct s *a, *b;
> + for (int i = 0; i < n; ++i)
> + a[0].f[i] += b[i].f[i];
> +
> + conservatively have a distance vector of (0), but they are indepenent
> + when a != b + i. In contrast, the references in:
> +
> + struct s *a;
> + for (int i = 0; i < n; ++i)
> + a->f[i] += a->f[i];
> +
> + have the same distance vector of (0), but the accesses can never be
> + independent. */
> + bool could_be_independent_p;
> };
>
> typedef struct data_dependence_relation *ddr_p;
> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \
> #define DDR_DIST_VECT(DDR, I) \
> DDR_DIST_VECTS (DDR)[I]
> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p
>
>
> bool dr_analyze_innermost (struct data_reference *, struct loop *);
> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data
> return false;
>
> return true;
> -}
> -
> -/* Return true when the DDR contains two data references that have the
> - same access functions. */
> -
> -static inline bool
> -same_access_functions (const struct data_dependence_relation *ddr)
> -{
> - unsigned i;
> -
> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i),
> - DR_ACCESS_FN (DDR_B (ddr), i)))
> - return false;
> -
> - return true;
> }
>
> /* Returns true when all the dependences are computable. */
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000
> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100
> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
> } dependence_stats;
>
> static bool subscript_dependence_tester_1 (struct data_dependence_relation *,
> - struct data_reference *,
> - struct data_reference *,
> + unsigned int, unsigned int,
> struct loop *);
As mentioned, how about passing access_fn directly, rather than less
meaningful 0/1 values?
> /* Returns true iff A divides B. */
>
> @@ -144,6 +143,21 @@ int_divides_p (int a, int b)
> return ((b % a) == 0);
> }
>
> +/* Return true if reference REF contains a union access. */
> +
> +static bool
> +ref_contains_union_access_p (tree ref)
> +{
> + while (handled_component_p (ref))
> + {
> + ref = TREE_OPERAND (ref, 0);
> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE
> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE)
> + return true;
> + }
> + return false;
> +}
> +
>
>
> /* Dump into FILE all the data references from DATAREFS. */
> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out
> unsigned int i;
> struct loop *loopi;
>
> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> + subscript *sub;
> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> {
> fprintf (outf, " access_fn_A: ");
> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0);
> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0);
> fprintf (outf, " access_fn_B: ");
> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0);
> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i));
> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0);
> + dump_subscript (outf, sub);
> }
>
> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr));
> @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str
> struct data_dependence_relation *res;
> unsigned int i;
>
> - res = XNEW (struct data_dependence_relation);
> + res = XCNEW (struct data_dependence_relation);
> DDR_A (res) = a;
> DDR_B (res) = b;
> DDR_LOOP_NEST (res).create (0);
> - DDR_REVERSED_P (res) = false;
> DDR_SUBSCRIPTS (res).create (0);
> DDR_DIR_VECTS (res).create (0);
> DDR_DIST_VECTS (res).create (0);
> @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str
> return res;
> }
>
> - /* The case where the references are exactly the same. */
> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> + if (num_dimensions_a == 0 || num_dimensions_b == 0)
> {
> - if ((loop_nest.exists ()
> - && !object_address_invariant_in_loop_p (loop_nest[0],
> - DR_BASE_OBJECT (a)))
> - || DR_NUM_DIMENSIONS (a) == 0)
> - {
> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> - return res;
> - }
> - DDR_AFFINE_P (res) = true;
> - DDR_ARE_DEPENDENT (res) = NULL_TREE;
> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
> - DDR_LOOP_NEST (res) = loop_nest;
> - DDR_INNER_LOOP (res) = 0;
> - DDR_SELF_REFERENCE (res) = true;
> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> - {
> - struct subscript *subscript;
> + DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> + return res;
> + }
> +
> + /* For unconstrained bases, the outer (highest-index) subscript
> + describes a variation in the base of the original DR_REF rather
> + than a component access. We have no type that accurately describes
> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after*
> + applying the outer subscript) so limit the search to the last real
> + component access.
> +
> + E.g. for:
>
> - subscript = XNEW (struct subscript);
> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> - SUB_DISTANCE (subscript) = chrec_dont_know;
> - DDR_SUBSCRIPTS (res).safe_push (subscript);
> + void
> + f (int a[][8], int b[][8])
> + {
> + for (int i = 0; i < 8; ++i)
> + a[i * 2][0] = b[i][0];
> }
> - return res;
> +
> + the a and b accesses have a single ARRAY_REF component reference [0]
> + but have two subscripts. */
> + if (DR_UNCONSTRAINED_BASE (a))
> + num_dimensions_a -= 1;
> + if (DR_UNCONSTRAINED_BASE (b))
> + num_dimensions_b -= 1;
> +
> + /* Now look for two sequences of component references that have the same
> + type in both A and B. The first sequence includes an arbitrary mixture
> + of array and structure references while the second always ends on a
> + structure reference.
> +
> + The former (arbitrary) sequence uses access functions:
> +
> + [START_A, START_A + NUM_DIMENSIONS) of A
> + [START_B, START_B + NUM_DIMENSIONS) of B
> +
> + The latter sequence uses access functions:
> +
> + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A
> + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B
> +
> + STRUCT_REF_A and STRUCT_REF_B are the outer references for the
IIUC, A and B always share the same latter sequence, and the common
latter sequence ends at a structure reference providing alias
information. Is it possible to record the the former arbitrary
references instead of simple flag DDR_COULD_BE_INDEPENDENT_P. With
this information, alias check can be simplified by stripping away
address computation for the shared common sub-sequence. I doubt
vect_create_cond_for_alias_checks could detect this kind CSE for now.
Ah, I see you changed alias check code generation in order to handle
this.
> + latter sequence. */
> + unsigned int start_a = 0;
> + unsigned int start_b = 0;
> + unsigned int num_dimensions = 0;
> + unsigned int struct_start_a = 0;
> + unsigned int struct_start_b = 0;
> + unsigned int struct_num_dimensions = 0;
> + unsigned int index_a = 0;
> + unsigned int index_b = 0;
> + tree next_ref_a = DR_REF (a);
> + tree next_ref_b = DR_REF (b);
> + tree struct_ref_a = NULL_TREE;
> + tree struct_ref_b = NULL_TREE;
> + while (index_a < num_dimensions_a && index_b < num_dimensions_b)
> + {
> + gcc_checking_assert (handled_component_p (next_ref_a));
> + gcc_checking_assert (handled_component_p (next_ref_b));
> + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0);
> + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0);
> + tree type_a = TREE_TYPE (outer_ref_a);
> + tree type_b = TREE_TYPE (outer_ref_b);
> + if (types_compatible_p (type_a, type_b))
> + {
> + /* This pair of accesses belong to a suitable sequence. */
> + if (start_a + num_dimensions != index_a
> + || start_b + num_dimensions != index_b)
> + {
> + /* Start a new sequence here. */
> + start_a = index_a;
> + start_b = index_b;
> + num_dimensions = 0;
> + }
> + num_dimensions += 1;
> + if (TREE_CODE (type_a) == RECORD_TYPE)
> + {
> + struct_start_a = start_a;
> + struct_start_b = start_b;
> + struct_num_dimensions = num_dimensions;
> + struct_ref_a = outer_ref_a;
> + struct_ref_b = outer_ref_b;
> + }
> + next_ref_a = outer_ref_a;
> + next_ref_b = outer_ref_b;
> + index_a += 1;
> + index_b += 1;
> + continue;
> + }
> + /* Try to approach equal type sizes. */
> + if (!COMPLETE_TYPE_P (type_a)
> + || !COMPLETE_TYPE_P (type_b)
> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
> + break;
> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a));
> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b));
> + if (size_a <= size_b)
> + {
> + index_a += 1;
> + next_ref_a = outer_ref_a;
> + }
> + if (size_b <= size_a)
> + {
> + index_b += 1;
> + next_ref_b = outer_ref_b;
> + }
> }
>
> - /* If the references do not access the same object, we do not know
> - whether they alias or not. We do not care about TBAA or alignment
> - info so we can use OEP_ADDRESS_OF to avoid false negatives.
> - But the accesses have to use compatible types as otherwise the
> - built indices would not match. */
> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF)
> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
> - TREE_TYPE (DR_BASE_OBJECT (b))))
> + /* See whether the sequence ends at the base and whether the two bases
> + are equal. We do not care about TBAA or alignment info so we can use
> + OEP_ADDRESS_OF to avoid false negatives. */
> + tree base_a = DR_BASE_OBJECT (a);
> + tree base_b = DR_BASE_OBJECT (b);
> + bool same_base_p = (start_a + num_dimensions == num_dimensions_a
> + && start_b + num_dimensions == num_dimensions_b
> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
> + && types_compatible_p (TREE_TYPE (base_a),
> + TREE_TYPE (base_b))
> + && (!loop_nest.exists ()
> + || (object_address_invariant_in_loop_p
> + (loop_nest[0], base_a))));
Major change is in function initialize_data_dependence_relation in
order to detect partial alias opportunity. The original equality
check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a
new parameter to compute_data_reference_for_loop etc., indicating
whether we want to handle partial alias opportunity or not. After
all, such computation is unnecessary for predcom/prefetch/parloop.
It's only a waste of time computing it.
> +
> + /* If the bases are the same, we can include the base variation too.
> + E.g. the b accesses in:
> +
> + for (int i = 0; i < n; ++i)
> + b[i + 4][0] = b[i][0];
> +
> + have a definite dependence distance of 4, while for:
> +
> + for (int i = 0; i < n; ++i)
> + a[i + 4][0] = b[i][0];
> +
> + the dependence distance depends on the gap between a and b.
> +
> + If the bases are different then we can only rely on the sequence
> + rooted at a structure access, since arrays are allowed to overlap
> + arbitrarily and change shape arbitrarily. E.g. we treat this as
> + valid code:
> +
> + int a[256];
> + ...
> + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0];
> +
> + where two lvalues with the same int[4][3] type overlap, and where
> + both lvalues are distinct from the object's declared type. */
> + if (same_base_p)
> {
> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> - return res;
> + if (DR_UNCONSTRAINED_BASE (a))
> + num_dimensions += 1;
> + }
> + else
> + {
> + start_a = struct_start_a;
> + start_b = struct_start_b;
> + num_dimensions = struct_num_dimensions;
> }
>
> - /* If the base of the object is not invariant in the loop nest, we cannot
> - analyze it. TODO -- in fact, it would suffice to record that there may
> - be arbitrary dependences in the loops where the base object varies. */
> - if ((loop_nest.exists ()
> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a)))
> - || DR_NUM_DIMENSIONS (a) == 0)
> + /* Punt if we didn't find a suitable sequence. */
> + if (num_dimensions == 0)
> {
> DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> return res;
> }
>
> - /* If the number of dimensions of the access to not agree we can have
> - a pointer access to a component of the array element type and an
> - array access while the base-objects are still the same. Punt. */
> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b))
> + if (!same_base_p)
> {
> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> - return res;
> + /* Partial overlap is possible for different bases when strict aliasing
> + is not in effect. It's also possible if either base involves a union
> + access; e.g. for:
> +
> + struct s1 { int a[2]; };
> + struct s2 { struct s1 b; int c; };
> + struct s3 { int d; struct s1 e; };
> + union u { struct s2 f; struct s3 g; } *p, *q;
> +
> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at
> + "p->g.e" (base "p->g") and might partially overlap the s1 at
> + "q->g.e" (base "q->g"). */
> + if (!flag_strict_aliasing
> + || ref_contains_union_access_p (struct_ref_a)
> + || ref_contains_union_access_p (struct_ref_b))
> + {
> + DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> + return res;
> + }
> +
> + DDR_COULD_BE_INDEPENDENT_P (res) = true;
> }
>
> DDR_AFFINE_P (res) = true;
> DDR_ARE_DEPENDENT (res) = NULL_TREE;
> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
> + DDR_SUBSCRIPTS (res).create (num_dimensions);
> DDR_LOOP_NEST (res) = loop_nest;
> DDR_INNER_LOOP (res) = 0;
> DDR_SELF_REFERENCE (res) = false;
>
> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> + for (i = 0; i < num_dimensions; ++i)
> {
> struct subscript *subscript;
>
> subscript = XNEW (struct subscript);
> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i);
> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i);
> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
> SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende
> }
>
> /* Return false when fail to represent the data dependence as a
> - distance vector. INIT_B is set to true when a component has been
> + distance vector. A_INDEX is the index of the first reference
> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the
> + second reference. INIT_B is set to true when a component has been
> added to the distance vector DIST_V. INDEX_CARRY is then set to
> the index in DIST_V that carries the dependence. */
>
> static bool
> build_classic_dist_vector_1 (struct data_dependence_relation *ddr,
> - struct data_reference *ddr_a,
> - struct data_reference *ddr_b,
> + unsigned int a_index, unsigned int b_index,
> lambda_vector dist_v, bool *init_b,
> int *index_carry)
> {
> @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data
> return false;
> }
>
> - access_fn_a = DR_ACCESS_FN (ddr_a, i);
> - access_fn_b = DR_ACCESS_FN (ddr_b, i);
> + access_fn_a = SUB_ACCESS_FN (subscript, a_index);
> + access_fn_b = SUB_ACCESS_FN (subscript, b_index);
>
> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC
> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC)
> @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data
> constant_access_functions (const struct data_dependence_relation *ddr)
> {
> unsigned i;
> + subscript *sub;
>
> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i))
> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i)))
> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0))
> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1)))
> return false;
>
> return true;
> @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de
> lambda_vector dist_v;
> unsigned i;
> int index_carry = DDR_NB_LOOPS (ddr);
> + subscript *sub;
>
> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> {
> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i);
> + tree access_fun = SUB_ACCESS_FN (sub, 0);
>
> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC)
> {
> @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de
> return;
> }
>
> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0);
>
> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC)
> add_multivariate_self_dist (ddr, access_fun);
> @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d
> }
> }
>
> +/* Return true when the DDR contains two data references that have the
> + same access functions. */
> +
> +static inline bool
> +same_access_functions (const struct data_dependence_relation *ddr)
> +{
> + unsigned i;
> + subscript *sub;
> +
> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0),
> + SUB_ACCESS_FN (sub, 1)))
> + return false;
> +
> + return true;
> +}
> +
> /* Compute the classic per loop distance vector. DDR is the data
> dependence relation to build a vector from. Return false when fail
> to represent the data dependence as a distance vector. */
> @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d
> }
>
> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr),
> - dist_v, &init_b, &index_carry))
> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry))
> return false;
>
> /* Save the distance vector if we initialized one. */
> @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d
> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
> {
> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> - loop_nest))
> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
> return false;
> compute_subscript_distance (ddr);
> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> - save_v, &init_b, &index_carry))
> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
> + &index_carry))
> return false;
> save_dist_v (ddr, save_v);
> DDR_REVERSED_P (ddr) = true;
> @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d
> {
> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>
> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr),
> - DDR_A (ddr), loop_nest))
> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
> return false;
> compute_subscript_distance (ddr);
> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> - opposite_v, &init_b,
> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b,
> &index_carry))
> return false;
>
> @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de
> }
> }
>
> -/* Helper function. Returns true when there is a dependence between
> - data references DRA and DRB. */
> +/* Helper function. Returns true when there is a dependence between the
> + data references. A_INDEX is the index of the first reference (0 for
> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */
>
> static bool
> subscript_dependence_tester_1 (struct data_dependence_relation *ddr,
> - struct data_reference *dra,
> - struct data_reference *drb,
> + unsigned int a_index, unsigned int b_index,
> struct loop *loop_nest)
> {
> unsigned int i;
> @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da
> {
> conflict_function *overlaps_a, *overlaps_b;
>
> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i),
> - DR_ACCESS_FN (drb, i),
> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index),
> + SUB_ACCESS_FN (subscript, b_index),
> &overlaps_a, &overlaps_b,
> &last_conflicts, loop_nest);
>
> @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da
> subscript_dependence_tester (struct data_dependence_relation *ddr,
> struct loop *loop_nest)
> {
> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest))
> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest))
> dependence_stats.num_dependence_dependent++;
>
> compute_subscript_distance (ddr);
> Index: gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100
> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100
> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop *
> refb = (struct mem_ref *) DDR_B (dep)->aux;
>
> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know
> + || DDR_COULD_BE_INDEPENDENT_P (dep)
> || DDR_NUM_DIST_VECTS (dep) == 0)
> {
> /* If the dependence cannot be analyzed, assume that there might be
As said, we could avoid computing such information in the first place.
I can see predcom ingores it by explicitly checking DR_BASE_OBJECT,
what about tree-parloops.c?
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100
> +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100
> @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L)
> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
> ((L)->may_misalign_stmts.length () > 0)
> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \
> - ((L)->may_alias_ddrs.length () > 0)
> + ((L)->comp_alias_ddrs.length () > 0)
> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \
> (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
> #define LOOP_REQUIRES_VERSIONING(L) \
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100
> +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100
> @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct
> }
>
> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
> +
> + if (DDR_COULD_BE_INDEPENDENT_P (ddr))
> + /* For dependence distances of 2 or more, we have the option of
> + limiting VF or checking for an alias at runtime. Prefer to check
> + at runtime if we can, to avoid limiting the VF unnecessarily when
> + the bases are in fact independent.
> +
> + Note that the alias checks will be removed if the VF ends up
> + being small enough. */
> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> + {
> + int dist = dist_v[loop_depth];
> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr)))
> + {
> + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo))
> + return false;
> + break;
> + }
> + }
In theory, we could end up with DDR pushed more than once for alias
test? Flag it in FOR loop and only mark it later?
Thanks,
bin
> +
> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> {
> int dist = dist_v[loop_depth];
> @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference *
> return false;
> }
>
> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH
> + in DDR is >= VF. */
> +
> +static bool
> +dependence_distance_ge_vf (data_dependence_relation *ddr,
> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf)
> +{
> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE
> + || DDR_NUM_DIST_VECTS (ddr) == 0)
> + return false;
> +
> + /* If the dependence is exact, we should have limited the VF instead. */
> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr));
> +
> + unsigned int i;
> + lambda_vector dist_v;
> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> + {
> + HOST_WIDE_INT dist = dist_v[loop_depth];
> + if (dist != 0
> + && !(dist > 0 && DDR_REVERSED_P (ddr))
> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf)
> + return false;
> + }
> +
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "dependence distance between ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr)));
> + dump_printf (MSG_NOTE, " and ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr)));
> + dump_printf (MSG_NOTE, " is >= VF\n");
> + }
> +
> + return true;
> +}
> +
> /* Function vect_prune_runtime_alias_test_list.
>
> Prune a list of ddrs to be tested at run-time by versioning for alias.
> @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop
>
> comp_alias_ddrs.create (may_alias_ddrs.length ());
>
> + unsigned int loop_depth
> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num,
> + LOOP_VINFO_LOOP_NEST (loop_vinfo));
> +
> /* First, we collect all data ref pairs for aliasing checks. */
> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
> {
> @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop
> tree segment_length_a, segment_length_b;
> gimple *stmt_a, *stmt_b;
>
> + /* Ignore the alias if the VF we chose ended up being no greater
> + than the dependence distance. */
> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor))
> + continue;
> +
> dr_a = DDR_A (ddr);
> stmt_a = DR_STMT (DDR_A (ddr));
> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
> @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop
> return false;
> }
>
> - /* All alias checks have been resolved at compilation time. */
> - if (!comp_alias_ddrs.length ())
> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
> -
> return true;
> }
>
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c
> ===================================================================
> --- /dev/null 2017-05-03 08:16:26.972699664 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100
> @@ -0,0 +1,104 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
> +
> +/* Intended to be larger than any VF. */
> +#define GAP 128
> +#define N (GAP * 3)
> +
> +struct s { int x[N + 1]; };
> +struct t { struct s x[N + 1]; };
> +struct u { int x[N + 1]; int y; };
> +
> +void
> +f1 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->x[i] += b->x[i];
> +}
> +
> +void
> +f2 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[1].x[i] += b[2].x[i];
> +}
> +
> +void
> +f3 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[1].x[i] += b[i].x[i];
> +}
> +
> +void
> +f4 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[i].x[i] += b[i].x[i];
> +}
> +
> +void
> +f5 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->x[i] += b->x[i + 1];
> +}
> +
> +void
> +f6 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[1].x[i] += b[2].x[i + 1];
> +}
> +
> +void
> +f7 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[1].x[i] += b[i].x[i + 1];
> +}
> +
> +void
> +f8 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a[i].x[i] += b[i].x[i + 1];
> +}
> +
> +void
> +f9 (struct s *a, struct t *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->x[i] += b->x[1].x[i];
> +}
> +
> +void
> +f10 (struct s *a, struct t *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->x[i] += b->x[i].x[i];
> +}
> +
> +void
> +f11 (struct u *a, struct u *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->x[i] += b->x[i] + b[i].y;
> +}
> +
> +void
> +f12 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < GAP; ++i)
> + a->x[i + GAP] += b->x[i];
> +}
> +
> +void
> +f13 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < GAP * 2; ++i)
> + a->x[i + GAP] += b->x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
> ===================================================================
> --- /dev/null 2017-05-03 08:16:26.972699664 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
> +
> +#define N 16
> +
> +struct s1 { int a[N]; };
> +struct s2 { struct s1 b; int c; };
> +struct s3 { int d; struct s1 e; };
> +union u { struct s2 f; struct s3 g; };
> +
> +/* We allow a and b to overlap arbitrarily. */
> +
> +void
> +f1 (int a[][N], int b[][N])
> +{
> + for (int i = 0; i < N; ++i)
> + a[0][i] += b[0][i];
> +}
> +
> +void
> +f2 (union u *a, union u *b)
> +{
> + for (int i = 0; i < N; ++i)
> + a->f.b.a[i] += b->g.e.a[i];
> +}
> +
> +void
> +f3 (struct s1 *a, struct s1 *b)
> +{
> + for (int i = 0; i < N - 1; ++i)
> + a->a[i + 1] += b->a[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c
> ===================================================================
> --- /dev/null 2017-05-03 08:16:26.972699664 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* Intended to be larger than any VF. */
> +#define GAP 128
> +#define N (GAP * 3)
> +
> +struct s { int x[N]; };
> +
> +void
> +f1 (struct s *a, struct s *b)
> +{
> + for (int i = 0; i < GAP * 2; ++i)
> + a->x[i + GAP] += b->x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */