This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Handle data dependence relations with different bases
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Wed, 7 Jun 2017 12:16:56 +0200
- Subject: Re: Handle data dependence relations with different bases
- Authentication-results: sourceware.org; auth=none
- References: <87tw52s73r.fsf@linaro.org> <CAFiYyc0XFciO8BEitmMSuDQbVTc60FXiaCKtQxoiSQ_RjKM4dQ@mail.gmail.com> <CAFiYyc0y66Rc9OU_ocZrvNZ9ogjTdzc3VhcGAWhBUdOdKg4H5g@mail.gmail.com> <87a86sfsgg.fsf@linaro.org> <CAFiYyc3jfLvP7t06mJoWHHuQxKRQ1XdaMru+x1Q3rLxVizjPFg@mail.gmail.com> <87o9uqfvw4.fsf@linaro.org> <8760gho6ox.fsf@linaro.org>
On Wed, May 31, 2017 at 8:56 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Ping
>
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>> This patch tries to calculate conservatively-correct distance
>>>>>>> vectors for two references whose base addresses are not the same.
>>>>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>>>>>> isn't guaranteed to occur.
>>>>>>>
>>>>>>> The motivating example is:
>>>>>>>
>>>>>>> struct s { int x[8]; };
>>>>>>> void
>>>>>>> f (struct s *a, struct s *b)
>>>>>>> {
>>>>>>> for (int i = 0; i < 8; ++i)
>>>>>>> a->x[i] += b->x[i];
>>>>>>> }
>>>>>>>
>>>>>>> in which the "a" and "b" accesses are either independent or have a
>>>>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case
>>>>>>> prevents vectorisation, so we can vectorise without an alias check.
>>>>>>>
>>>>>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>>>>>
>>>>>>> void
>>>>>>> f (int a[][8], struct b[][8])
>>>>>>> {
>>>>>>> for (int i = 0; i < 8; ++i)
>>>>>>> a[0][i] += b[0][i];
>>>>>>> }
>>>>>>>
>>>>>>> I think this is valid because C11 6.7.6.2/6 says:
>>>>>>>
>>>>>>> For two array types to be compatible, both shall have compatible
>>>>>>> element types, and if both size specifiers are present, and are
>>>>>>> integer constant expressions, then both size specifiers shall have
>>>>>>> the same constant value.
>>>>>>>
>>>>>>> So if we access an array through an int (*)[8], it must have type X[8]
>>>>>>> or X[], where X is compatible with int. It doesn't seem possible in
>>>>>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>>>>>
>>>>>>> However, Richard B said that (at least in gimple) we support arbitrary
>>>>>>> overlap of arrays and allow arrays to be accessed with different
>>>>>>> dimensionality. There are examples of this in PR50067. I've therefore
>>>>>>> only handled references that end in a structure field access.
>>>>>>>
>>>>>>> There are two ways of handling these dependences in the vectoriser:
>>>>>>> use them to limit VF, or check at runtime as before. I've gone for
>>>>>>> the approach of checking at runtime if we can, to avoid limiting VF
>>>>>>> unnecessarily. We still fall back to a VF cap when runtime checks
>>>>>>> aren't allowed.
>>>>>>>
>>>>>>> The patch tests whether we queued an alias check with a dependence
>>>>>>> distance of X and then picked a VF <= X, in which case it's safe to
>>>>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can
>>>>>>> be called twice with different VF for the same loop, it's no longer
>>>>>>> safe to clear may_alias_ddrs on exit. Instead we should use
>>>>>>> comp_alias_ddrs to check whether versioning is necessary.
>>>>>>>
>>>>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>>>>>
>>>>>> You seem to do your "fancy" thing but also later compute the old
>>>>>> base equality anyway (for same_base_p). It looks to me for this
>>>>>> case the new fancy code can be simply skipped, keeping num_dimensions
>>>>>> as before?
>>>>>>
>>>>>> + /* Try to approach equal type sizes. */
>>>>>> + if (!COMPLETE_TYPE_P (type_a)
>>>>>> + || !COMPLETE_TYPE_P (type_b)
>>>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>>>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>>>>>> + break;
>>>>>>
>>>>>> ah, interesting idea to avoid a quadratic search. Note that you should
>>>>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>>>>>> as they are used for type-punning.
>>>>
>>>> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
>>>> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
>>>> dr_analyze_indices allows, so I think we safe in terms of the tree codes.
>>>
>>> Yeah. I think we need to document that we should have a 1:1 match here.
>>
>> OK, I added that to the comments and also added an access_fn_component_p
>> that we can assert on.
>>
>>>>>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>>>>>> ARRAY_REFs - but I also see there:
>>>>>>
>>>>>> /* ??? We cannot simply use the type of operand #0 of the refs here
>>>>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>>>> for common blocks instead of using unions like everyone else. */
>>>>>> tree type1 = DECL_CONTEXT (field1);
>>>>>> tree type2 = DECL_CONTEXT (field2);
>>>>>>
>>>>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
>>>>>> You also may not use types_compatible_p here as for LTO that is _way_ too
>>>>>> lax for aggregates. The above uses
>>>>>>
>>>>>> /* We cannot disambiguate fields in a union or qualified union. */
>>>>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>>>> return false;
>>>>>>
>>>>>> so you should also bail out on unions here, rather than the check you do later.
>>>>
>>>> The loop stops before we get to a union, so I think "only" the RECORD_TYPE
>>>> COMPONENT_REF handling is a potential problem. Does this mean that
>>>> I should use the nonoverlapping_component_refs_of_decl_p code:
>>>>
>>>> tree field1 = TREE_OPERAND (ref1, 1);
>>>> tree field2 = TREE_OPERAND (ref2, 1);
>>>>
>>>> /* ??? We cannot simply use the type of operand #0 of the refs here
>>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>> for common blocks instead of using unions like everyone else. */
>>>> tree type1 = DECL_CONTEXT (field1);
>>>> tree type2 = DECL_CONTEXT (field2);
>>>>
>>>> /* We cannot disambiguate fields in a union or qualified union. */
>>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>> return false;
>>>>
>>>> if (field1 != field2)
>>>> {
>>>> /* A field and its representative need to be considered the
>>>> same. */
>>>> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
>>>> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
>>>> return false;
>>>> /* Different fields of the same record type cannot overlap.
>>>> ??? Bitfields can overlap at RTL level so punt on them. */
>>>> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>>>> return false;
>>>> return true;
>>>> }
>>>>
>>>> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
>>>> during the new loop?
>>>
>>> Yes. OTOH you want to "match" while the above disambiguates. So it means
>>> you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL
>>> equality (which should be the same in the end). The RTL concern
>>> should not matter
>>> here.
>>
>> The attached patch adds an access_fn_components_comparable_p helper
>> function that checks whether the DECL_CONTEXTs are the same.
>>
>>>> And test for this as well as unions in the outer
>>>> references?
>>>
>>> So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT,
>>> and you (should) stop the ref walk at DR_BASE_OBJECT.
>>
>> I was just thinking that if the Fortran front-end has cases in which
>> TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1))
>> for a COMPONENT_REF, should we treat that as equivalent to a union
>> access in ref_contains_union_access_p? But I'm not sure that's
>> necessary after all.
>>
>>> The dr_analyze_indices code is also somewhat fishy in that it simply
>>> ignores everything below unhandled component-refs even if there are
>>> indices involved (and it gets away with this because dependence
>>> analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test
>>> in case it is sth like a[i].union for example ... hopefully ...).
>>
>> I think this is what you meant, but: I don't think the base object
>> itself can be a union, because we need at least one component reference
>> for the DR, and don't accept COMPONENT_REFs for unions as access functions.
>> So if the base involves a union, the base would also need to have a
>> COMPONENT_REF that selects a particular member of that union.
>>
>> And yeah, before the patch we did allow a dependence distance to be
>> calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do
>> after the patch), on the basis that a[i].union.f refers to the same
>> object in both cases.
>>
>>>>>> You seem to rely on getting an access_fn entry for each handled_component_p.
>>>>>> It looks like this is the case -- we even seem to stop at unions
>>>>>> (with the same
>>>>>> fortran "issue"). I'm not sure that's the best thing to do but you
>>>>>> rely on that.
>>>>
>>>> Yeah, the loop is deliberately limited to the components associated with
>>>> an access_fn. I did wonder at first whether dr_analyze_indices should
>>>> store the original component reference trees for each access function.
>>>> That would make things simpler and more explicit, but would also eat up
>>>> more memory. Things like object_address_invariant_in_loop_p rely on the
>>>> access_fns in the same way that the loop in the patch does.
>>>
>>> in fact it fails to handle ARRAY_RANGE_REFs ...
>>
>> Yeah, the whole file seems to ignore those. What kind of code would
>> benefit?
This is currently only used by the Ada frontend so I'm not sure. It
would be also
non-trivial to handle them as they do not represent an independent dimension
but just adjust the index domain (and size, but that doesn't matter).
>>>>>> I don't understand the looping, it needs more comments. You seem to be
>>>>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>>>>>> is how many compatible refs you found on the way (with incompatible ones
>>>>>> not counting?!). What about an inner varying array of structs?
>>>>>> This seems to
>>>>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real
>>>>>> b[i].s.b[i].j?
>>>>
>>>> I'll try to improve the comments. But the idea is that both sequences are
>>>> as long as possible, while that still gives compatible types. If there is
>>>> more than one such sequence, we pick the one nearest the base.
>>>>
>>>> So in your example, the access functions would be:
>>>>
>>>> 0 1 2 3 4
>>>> a: .j [i] .b .s [i]
>>>>
>>>> 0 1 2 3 4 5
>>>> b: __real .j [i] .b .s [i]
>>>>
>>>> If a and b are pointers, the final access functions would be
>>>> unconstrained base accesses, so we'd end up with:
>>>>
>>>> a: [0, 3]
>>>> b: [1, 4]
>>>>
>>>> for both sequences.
>>>>
>>>>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>>>>>> conveniently start from the other
>>>>>> end of the ref here.
>>>>>
>>>>> That said, for the motivational cases we either have one ref having
>>>>> more dimensions than the other (the __real vs. full complex access) or
>>>>> they have the same number of dimensions (and no access fn for the
>>>>> base).
>>>>>
>>>>> For the first case we should simply "drop" access_fns of the larger
>>>>> dimensional ref (from the start, plus outer component refs) up to the
>>>>> point the number of dimensions are equal.
>>>>
>>>> Yeah, that's what happens for your example. But if we had:
>>>>
>>>> a[i].s.c.d
>>>> __real b[i].s.b[i].j
>>>>
>>>> (where d is the same type as the real component) then the access
>>>> functions would be:
>>>>
>>>> 0 1 2 3
>>>> a: .d .c .s [i]
>>>>
>>>> 0 1 2 3 4 5
>>>> b: __real .j [i] .b .s [i]
>>>>
>>>> Comparing the a0/b2 column doesn't make sense, because one's an array
>>>> and the other is a structure. In this case the sequence we care about is:
>>>>
>>>> a: [1, 3]
>>>> b: [3, 5]
>>>>
>>>> which is what the loop gives. The a1/b3 column is the one that proves
>>>> there's no dependence.
>>>>
>>>>> Then we have the case of
>>>>>
>>>>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>>>>>
>>>>> where we have to punt.
>>>>>
>>>>> Then we have the case of
>>>>>
>>>>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>>>>>
>>>>> which is where the new code should kick in to see if we can drop access_fns
>>>>> from the other end (as unanalyzable but either having distance zero or not
>>>>> aliased because of TBAA).
>>>>>
>>>>> At least your testcases suggest you do not want to handle
>>>>>
>>>>> struct s { int x[N]; };
>>>>> struct r { struct s s; };
>>>>> f (struct s *a, struct r *b)
>>>>> {
>>>>> for (i = 0; i < N; ++i)
>>>>> a->s.x[i] = b->x[i];
>>>>> }
>>>>>
>>>>> ?
>>>>>
>>>>> With this example your loop which seems to search for a "common"
>>>>> sequence in (different) midst of the reference trees makes more sense
>>>>> (still that loop is awkward to understand).
>>>>
>>>> Yeah, I want to handle that too, just hadn't thought of it as a specific
>>>> testcase. The code does give the expected dependence distance of 0.
>>>
>>> Ok.
>>>
>>> I think the patch is reasonable, maybe the loop can be restructured /
>>> simplified a bit and handling of the union case for example be done
>>> first (by looking at DR_BASE_OBJECT).
>>
>> I still prefer doing the loop first and keeping the "same base" check
>> together as a single condition, since it means that we're analysing the
>> reference in a single direction (DR_REF to base) rather than jumping
>> around. And unequal bases should be more common that equal ones.
>>
>> I think both orders involve doing potentially redundant work. The
>> current order tends towards doing redundant work for union accesses
>> and !flag_strict_aliasing, but they should be the less common cases.
>>
>> How does this look? Changes since v1:
>>
>> - Added access_fn_component_p to check for valid access function components.
>>
>> - Added access_fn_components_comparable_p instead of using
>> types_compatibloe_p directly.
>>
>> - Added more commentary.
>>
>> - Added local structures to represent the sequence, so that it's
>> more obvious which variables are temporaries and which aren't.
>>
>> - Added the test above to vect-alias-check-3.c.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
This is ok.
Thanks,
Richard.
>> Thanks,
>> Richard
>>
>>
>> 2017-05-18 Richard Sandiford <richard.sandiford@linaro.org>
>>
>> gcc/
>>
>> * tree-data-ref.h (subscript): Add access_fn field.
>> (data_dependence_relation): Add could_be_independent_p.
>> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
>> (same_access_functions): Move to tree-data-ref.c.
>> * tree-data-ref.c (ref_contains_union_access_p): New function.
>> (access_fn_component_p): Likewise.
>> (access_fn_components_comparable_p): Likewise.
>> (dr_analyze_indices): Add a comment that this code needs to be
>> kept in sync with access_fn_component_p.
>> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
>> DR_ACCESS_FN.
>> (constant_access_functions): Likewise.
>> (add_other_self_distances): Likewise.
>> (same_access_functions): Likewise. (Moved from tree-data-ref.h.)
>> (initialize_data_dependence_relation): Use XCNEW and remove
>> explicit zeroing of DDR_REVERSED_P. Look for a subsequence
>> of access functions that have the same type. Allow the
>> subsequence to end with different bases in some circumstances.
>> Record the chosen access functions in SUB_ACCESS_FN.
>> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with
>> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
>> (subscript_dependence_tester_1): Likewise dra and drb.
>> (build_classic_dist_vector): Update calls accordingly.
>> (subscript_dependence_tester): Likewise.
>> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
>> DDR_COULD_BE_INDEPENDENT_P.
>> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
>> comp_alias_ddrs instead of may_alias_ddrs.
>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try
>> to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back
>> to using the recorded distance vectors if that fails.
>> (dependence_distance_ge_vf): New function.
>> (vect_prune_runtime_alias_test_list): Use it. Don't clear
>> LOOP_VINFO_MAY_ALIAS_DDRS.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/vect-alias-check-3.c: New test.
>> * gcc.dg/vect/vect-alias-check-4.c: Likewise.
>> * gcc.dg/vect/vect-alias-check-5.c: Likewise.
>>
>> Index: gcc/tree-data-ref.h
>> ===================================================================
>> --- gcc/tree-data-ref.h 2017-05-04 11:36:51.157328631 +0100
>> +++ gcc/tree-data-ref.h 2017-05-18 07:51:50.871904726 +0100
>> @@ -191,6 +191,9 @@ struct conflict_function
>>
>> struct subscript
>> {
>> + /* The access functions of the two references. */
>> + tree access_fn[2];
>> +
>> /* A description of the iterations for which the elements are
>> accessed twice. */
>> conflict_function *conflicting_iterations_in_a;
>> @@ -209,6 +212,7 @@ struct subscript
>>
>> typedef struct subscript *subscript_p;
>>
>> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]
>> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a
>> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b
>> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict
>> @@ -264,6 +268,33 @@ struct data_dependence_relation
>> /* Set to true when the dependence relation is on the same data
>> access. */
>> bool self_reference_p;
>> +
>> + /* True if the dependence described is conservatively correct rather
>> + than exact, and if it is still possible for the accesses to be
>> + conditionally independent. For example, the a and b references in:
>> +
>> + struct s *a, *b;
>> + for (int i = 0; i < n; ++i)
>> + a->f[i] += b->f[i];
>> +
>> + conservatively have a distance vector of (0), for the case in which
>> + a == b, but the accesses are independent if a != b. Similarly,
>> + the a and b references in:
>> +
>> + struct s *a, *b;
>> + for (int i = 0; i < n; ++i)
>> + a[0].f[i] += b[i].f[i];
>> +
>> + conservatively have a distance vector of (0), but they are indepenent
>> + when a != b + i. In contrast, the references in:
>> +
>> + struct s *a;
>> + for (int i = 0; i < n; ++i)
>> + a->f[i] += a->f[i];
>> +
>> + have the same distance vector of (0), but the accesses can never be
>> + independent. */
>> + bool could_be_independent_p;
>> };
>>
>> typedef struct data_dependence_relation *ddr_p;
>> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \
>> #define DDR_DIST_VECT(DDR, I) \
>> DDR_DIST_VECTS (DDR)[I]
>> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p
>>
>>
>> bool dr_analyze_innermost (struct data_reference *, struct loop *);
>> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data
>> return false;
>>
>> return true;
>> -}
>> -
>> -/* Return true when the DDR contains two data references that have the
>> - same access functions. */
>> -
>> -static inline bool
>> -same_access_functions (const struct data_dependence_relation *ddr)
>> -{
>> - unsigned i;
>> -
>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i),
>> - DR_ACCESS_FN (DDR_B (ddr), i)))
>> - return false;
>> -
>> - return true;
>> }
>>
>> /* Returns true when all the dependences are computable. */
>> Index: gcc/tree-data-ref.c
>> ===================================================================
>> --- gcc/tree-data-ref.c 2017-05-18 07:51:26.126377691 +0100
>> +++ gcc/tree-data-ref.c 2017-05-18 07:51:50.871904726 +0100
>> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
>> } dependence_stats;
>>
>> static bool subscript_dependence_tester_1 (struct data_dependence_relation *,
>> - struct data_reference *,
>> - struct data_reference *,
>> + unsigned int, unsigned int,
>> struct loop *);
>> /* Returns true iff A divides B. */
>>
>> @@ -144,6 +143,21 @@ int_divides_p (int a, int b)
>> return ((b % a) == 0);
>> }
>>
>> +/* Return true if reference REF contains a union access. */
>> +
>> +static bool
>> +ref_contains_union_access_p (tree ref)
>> +{
>> + while (handled_component_p (ref))
>> + {
>> + ref = TREE_OPERAND (ref, 0);
>> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE
>> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE)
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>>
>>
>> /* Dump into FILE all the data references from DATAREFS. */
>> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out
>> unsigned int i;
>> struct loop *loopi;
>>
>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>> + subscript *sub;
>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>> {
>> fprintf (outf, " access_fn_A: ");
>> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i));
>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0));
>> fprintf (outf, " access_fn_B: ");
>> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i));
>> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i));
>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1));
>> + dump_subscript (outf, sub);
>> }
>>
>> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr));
>> @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere
>> return true;
>> }
>>
>> +/* Return true if OP is a valid component reference for a DR access
>> + function. This accepts a subset of what handled_component_p accepts. */
>> +
>> +static bool
>> +access_fn_component_p (tree op)
>> +{
>> + switch (TREE_CODE (op))
>> + {
>> + case REALPART_EXPR:
>> + case IMAGPART_EXPR:
>> + case ARRAY_REF:
>> + return true;
>> +
>> + case COMPONENT_REF:
>> + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE;
>> +
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> /* Determines the base object and the list of indices of memory reference
>> DR, analyzed in LOOP and instantiated in loop nest NEST. */
>>
>> @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc
>> access_fns.safe_push (integer_one_node);
>> }
>>
>> - /* Analyze access functions of dimensions we know to be independent. */
>> + /* Analyze access functions of dimensions we know to be independent.
>> + The list of component references handled here should be kept in
>> + sync with access_fn_component_p. */
>> while (handled_component_p (ref))
>> {
>> if (TREE_CODE (ref) == ARRAY_REF)
>> @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere
>> return refs_may_alias_p (addr_a, addr_b);
>> }
>>
>> +/* REF_A and REF_B both satisfy access_fns_comparable_p. Return true
>> + if it is meaningful to compare their associated access functions
>> + when checking for dependencies. */
>> +
>> +static bool
>> +access_fn_components_comparable_p (tree ref_a, tree ref_b)
>> +{
>> + if (TREE_CODE (ref_a) != TREE_CODE (ref_b))
>> + return false;
>> +
>> + if (TREE_CODE (ref_a) == COMPONENT_REF)
>> + /* ??? We cannot simply use the type of operand #0 of the refs here as
>> + the Fortran compiler smuggles type punning into COMPONENT_REFs.
>> + Use the DECL_CONTEXT of the FIELD_DECLs instead. */
>> + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1))
>> + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1)));
>> +
>> + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)),
>> + TREE_TYPE (TREE_OPERAND (ref_b, 0)));
>> +}
>> +
>> /* Initialize a data dependence relation between data accesses A and
>> B. NB_LOOPS is the number of loops surrounding the references: the
>> size of the classic distance/direction vectors. */
>> @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str
>> struct data_dependence_relation *res;
>> unsigned int i;
>>
>> - res = XNEW (struct data_dependence_relation);
>> + res = XCNEW (struct data_dependence_relation);
>> DDR_A (res) = a;
>> DDR_B (res) = b;
>> DDR_LOOP_NEST (res).create (0);
>> - DDR_REVERSED_P (res) = false;
>> DDR_SUBSCRIPTS (res).create (0);
>> DDR_DIR_VECTS (res).create (0);
>> DDR_DIST_VECTS (res).create (0);
>> @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str
>> return res;
>> }
>>
>> - /* The case where the references are exactly the same. */
>> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
>> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
>> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
>> + if (num_dimensions_a == 0 || num_dimensions_b == 0)
>> {
>> - if ((loop_nest.exists ()
>> - && !object_address_invariant_in_loop_p (loop_nest[0],
>> - DR_BASE_OBJECT (a)))
>> - || DR_NUM_DIMENSIONS (a) == 0)
>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> + return res;
>> + }
>> +
>> + /* For unconstrained bases, the root (highest-indexed) subscript
>> + describes a variation in the base of the original DR_REF rather
>> + than a component access. We have no type that accurately describes
>> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after*
>> + applying this subscript) so limit the search to the last real
>> + component access.
>> +
>> + E.g. for:
>> +
>> + void
>> + f (int a[][8], int b[][8])
>> + {
>> + for (int i = 0; i < 8; ++i)
>> + a[i * 2][0] = b[i][0];
>> + }
>> +
>> + the a and b accesses have a single ARRAY_REF component reference [0]
>> + but have two subscripts. */
>> + if (DR_UNCONSTRAINED_BASE (a))
>> + num_dimensions_a -= 1;
>> + if (DR_UNCONSTRAINED_BASE (b))
>> + num_dimensions_b -= 1;
>> +
>> + /* These structures describe sequences of component references in
>> + DR_REF (A) and DR_REF (B). Each component reference is tied to a
>> + specific access function. */
>> + struct {
>> + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and
>> + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher
>> + indices. In C notation, these are the indices of the rightmost
>> + component references; e.g. for a sequence .b.c.d, the start
>> + index is for .d. */
>> + unsigned int start_a;
>> + unsigned int start_b;
>> +
>> + /* The sequence contains LENGTH consecutive access functions from
>> + each DR. */
>> + unsigned int length;
>> +
>> + /* The enclosing objects for the A and B sequences respectively,
>> + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1)
>> + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */
>> + tree object_a;
>> + tree object_b;
>> + } full_seq = {}, struct_seq = {};
>> +
>> + /* Before each iteration of the loop:
>> +
>> + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and
>> + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */
>> + unsigned int index_a = 0;
>> + unsigned int index_b = 0;
>> + tree ref_a = DR_REF (a);
>> + tree ref_b = DR_REF (b);
>> +
>> + /* Now walk the component references from the final DR_REFs back up to
>> + the enclosing base objects. Each component reference corresponds
>> + to one access function in the DR, with access function 0 being for
>> + the final DR_REF and the highest-indexed access function being the
>> + one that is applied to the base of the DR.
>> +
>> + Look for a sequence of component references whose access functions
>> + are comparable (see access_fn_components_comparable_p). If more
>> + than one such sequence exists, pick the one nearest the base
>> + (which is the leftmost sequence in C notation). Store this sequence
>> + in FULL_SEQ.
>> +
>> + For example, if we have:
>> +
>> + struct foo { struct bar s; ... } (*a)[10], (*b)[10];
>> +
>> + A: a[0][i].s.c.d
>> + B: __real b[0][i].s.e[i].f
>> +
>> + (where d is the same type as the real component of f) then the access
>> + functions would be:
>> +
>> + 0 1 2 3
>> + A: .d .c .s [i]
>> +
>> + 0 1 2 3 4 5
>> + B: __real .f [i] .e .s [i]
>> +
>> + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF
>> + and [i] is an ARRAY_REF. However, the A1/B3 column contains two
>> + COMPONENT_REF accesses for struct bar, so is comparable. Likewise
>> + the A2/B4 column contains two COMPONENT_REF accesses for struct foo,
>> + so is comparable. The A3/B5 column contains two ARRAY_REFs that
>> + index foo[10] arrays, so is again comparable. The sequence is
>> + therefore:
>> +
>> + A: [1, 3] (i.e. [i].s.c)
>> + B: [3, 5] (i.e. [i].s.e)
>> +
>> + Also look for sequences of component references whose access
>> + functions are comparable and whose enclosing objects have the same
>> + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above
>> + example, STRUCT_SEQ would be:
>> +
>> + A: [1, 2] (i.e. s.c)
>> + B: [3, 4] (i.e. s.e) */
>> + while (index_a < num_dimensions_a && index_b < num_dimensions_b)
>> + {
>> + /* REF_A and REF_B must be one of the component access types
>> + allowed by dr_analyze_indices. */
>> + gcc_checking_assert (access_fn_component_p (ref_a));
>> + gcc_checking_assert (access_fn_component_p (ref_b));
>> +
>> + /* Get the immediately-enclosing objects for REF_A and REF_B,
>> + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A)
>> + and DR_ACCESS_FN (B, INDEX_B). */
>> + tree object_a = TREE_OPERAND (ref_a, 0);
>> + tree object_b = TREE_OPERAND (ref_b, 0);
>> +
>> + tree type_a = TREE_TYPE (object_a);
>> + tree type_b = TREE_TYPE (object_b);
>> + if (access_fn_components_comparable_p (ref_a, ref_b))
>> + {
>> + /* This pair of component accesses is comparable for dependence
>> + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and
>> + DR_ACCESS_FN (B, INDEX_B) in the sequence. */
>> + if (full_seq.start_a + full_seq.length != index_a
>> + || full_seq.start_b + full_seq.length != index_b)
>> + {
>> + /* The accesses don't extend the current sequence,
>> + so start a new one here. */
>> + full_seq.start_a = index_a;
>> + full_seq.start_b = index_b;
>> + full_seq.length = 0;
>> + }
>> +
>> + /* Add this pair of references to the sequence. */
>> + full_seq.length += 1;
>> + full_seq.object_a = object_a;
>> + full_seq.object_b = object_b;
>> +
>> + /* If the enclosing objects are structures (and thus have the
>> + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */
>> + if (TREE_CODE (type_a) == RECORD_TYPE)
>> + struct_seq = full_seq;
>> +
>> + /* Move to the next containing reference for both A and B. */
>> + ref_a = object_a;
>> + ref_b = object_b;
>> + index_a += 1;
>> + index_b += 1;
>> + continue;
>> + }
>> +
>> + /* Try to approach equal type sizes. */
>> + if (!COMPLETE_TYPE_P (type_a)
>> + || !COMPLETE_TYPE_P (type_b)
>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>> + break;
>> +
>> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a));
>> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b));
>> + if (size_a <= size_b)
>> {
>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> - return res;
>> + index_a += 1;
>> + ref_a = object_a;
>> + }
>> + if (size_b <= size_a)
>> + {
>> + index_b += 1;
>> + ref_b = object_b;
>> }
>> - DDR_AFFINE_P (res) = true;
>> - DDR_ARE_DEPENDENT (res) = NULL_TREE;
>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
>> - DDR_LOOP_NEST (res) = loop_nest;
>> - DDR_INNER_LOOP (res) = 0;
>> - DDR_SELF_REFERENCE (res) = true;
>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
>> - {
>> - struct subscript *subscript;
>> -
>> - subscript = XNEW (struct subscript);
>> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
>> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
>> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
>> - SUB_DISTANCE (subscript) = chrec_dont_know;
>> - DDR_SUBSCRIPTS (res).safe_push (subscript);
>> - }
>> - return res;
>> }
>>
>> - /* If the references do not access the same object, we do not know
>> - whether they alias or not. We do not care about TBAA or alignment
>> - info so we can use OEP_ADDRESS_OF to avoid false negatives.
>> - But the accesses have to use compatible types as otherwise the
>> - built indices would not match. */
>> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF)
>> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
>> - TREE_TYPE (DR_BASE_OBJECT (b))))
>> + /* See whether FULL_SEQ ends at the base and whether the two bases
>> + are equal. We do not care about TBAA or alignment info so we can
>> + use OEP_ADDRESS_OF to avoid false negatives. */
>> + tree base_a = DR_BASE_OBJECT (a);
>> + tree base_b = DR_BASE_OBJECT (b);
>> + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
>> + && full_seq.start_b + full_seq.length == num_dimensions_b
>> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
>> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>> + && types_compatible_p (TREE_TYPE (base_a),
>> + TREE_TYPE (base_b))
>> + && (!loop_nest.exists ()
>> + || (object_address_invariant_in_loop_p
>> + (loop_nest[0], base_a))));
>> +
>> + /* If the bases are the same, we can include the base variation too.
>> + E.g. the b accesses in:
>> +
>> + for (int i = 0; i < n; ++i)
>> + b[i + 4][0] = b[i][0];
>> +
>> + have a definite dependence distance of 4, while for:
>> +
>> + for (int i = 0; i < n; ++i)
>> + a[i + 4][0] = b[i][0];
>> +
>> + the dependence distance depends on the gap between a and b.
>> +
>> + If the bases are different then we can only rely on the sequence
>> + rooted at a structure access, since arrays are allowed to overlap
>> + arbitrarily and change shape arbitrarily. E.g. we treat this as
>> + valid code:
>> +
>> + int a[256];
>> + ...
>> + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0];
>> +
>> + where two lvalues with the same int[4][3] type overlap, and where
>> + both lvalues are distinct from the object's declared type. */
>> + if (same_base_p)
>> {
>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> - return res;
>> + if (DR_UNCONSTRAINED_BASE (a))
>> + full_seq.length += 1;
>> }
>> + else
>> + full_seq = struct_seq;
>>
>> - /* If the base of the object is not invariant in the loop nest, we cannot
>> - analyze it. TODO -- in fact, it would suffice to record that there may
>> - be arbitrary dependences in the loops where the base object varies. */
>> - if ((loop_nest.exists ()
>> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a)))
>> - || DR_NUM_DIMENSIONS (a) == 0)
>> + /* Punt if we didn't find a suitable sequence. */
>> + if (full_seq.length == 0)
>> {
>> DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> return res;
>> }
>>
>> - /* If the number of dimensions of the access to not agree we can have
>> - a pointer access to a component of the array element type and an
>> - array access while the base-objects are still the same. Punt. */
>> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b))
>> + if (!same_base_p)
>> {
>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> - return res;
>> + /* Partial overlap is possible for different bases when strict aliasing
>> + is not in effect. It's also possible if either base involves a union
>> + access; e.g. for:
>> +
>> + struct s1 { int a[2]; };
>> + struct s2 { struct s1 b; int c; };
>> + struct s3 { int d; struct s1 e; };
>> + union u { struct s2 f; struct s3 g; } *p, *q;
>> +
>> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at
>> + "p->g.e" (base "p->g") and might partially overlap the s1 at
>> + "q->g.e" (base "q->g"). */
>> + if (!flag_strict_aliasing
>> + || ref_contains_union_access_p (full_seq.object_a)
>> + || ref_contains_union_access_p (full_seq.object_b))
>> + {
>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>> + return res;
>> + }
>> +
>> + DDR_COULD_BE_INDEPENDENT_P (res) = true;
>> }
>>
>> DDR_AFFINE_P (res) = true;
>> DDR_ARE_DEPENDENT (res) = NULL_TREE;
>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
>> + DDR_SUBSCRIPTS (res).create (full_seq.length);
>> DDR_LOOP_NEST (res) = loop_nest;
>> DDR_INNER_LOOP (res) = 0;
>> DDR_SELF_REFERENCE (res) = false;
>>
>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
>> + for (i = 0; i < full_seq.length; ++i)
>> {
>> struct subscript *subscript;
>>
>> subscript = XNEW (struct subscript);
>> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
>> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
>> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
>> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
>> SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
>> @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende
>> }
>>
>> /* Return false when fail to represent the data dependence as a
>> - distance vector. INIT_B is set to true when a component has been
>> + distance vector. A_INDEX is the index of the first reference
>> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the
>> + second reference. INIT_B is set to true when a component has been
>> added to the distance vector DIST_V. INDEX_CARRY is then set to
>> the index in DIST_V that carries the dependence. */
>>
>> static bool
>> build_classic_dist_vector_1 (struct data_dependence_relation *ddr,
>> - struct data_reference *ddr_a,
>> - struct data_reference *ddr_b,
>> + unsigned int a_index, unsigned int b_index,
>> lambda_vector dist_v, bool *init_b,
>> int *index_carry)
>> {
>> @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data
>> return false;
>> }
>>
>> - access_fn_a = DR_ACCESS_FN (ddr_a, i);
>> - access_fn_b = DR_ACCESS_FN (ddr_b, i);
>> + access_fn_a = SUB_ACCESS_FN (subscript, a_index);
>> + access_fn_b = SUB_ACCESS_FN (subscript, b_index);
>>
>> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC
>> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC)
>> @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data
>> constant_access_functions (const struct data_dependence_relation *ddr)
>> {
>> unsigned i;
>> + subscript *sub;
>>
>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i))
>> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i)))
>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0))
>> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1)))
>> return false;
>>
>> return true;
>> @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de
>> lambda_vector dist_v;
>> unsigned i;
>> int index_carry = DDR_NB_LOOPS (ddr);
>> + subscript *sub;
>>
>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>> {
>> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i);
>> + tree access_fun = SUB_ACCESS_FN (sub, 0);
>>
>> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC)
>> {
>> @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de
>> return;
>> }
>>
>> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
>> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0);
>>
>> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC)
>> add_multivariate_self_dist (ddr, access_fun);
>> @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d
>> }
>> }
>>
>> +/* Return true when the DDR contains two data references that have the
>> + same access functions. */
>> +
>> +static inline bool
>> +same_access_functions (const struct data_dependence_relation *ddr)
>> +{
>> + unsigned i;
>> + subscript *sub;
>> +
>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0),
>> + SUB_ACCESS_FN (sub, 1)))
>> + return false;
>> +
>> + return true;
>> +}
>> +
>> /* Compute the classic per loop distance vector. DDR is the data
>> dependence relation to build a vector from. Return false when fail
>> to represent the data dependence as a distance vector. */
>> @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d
>> }
>>
>> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr),
>> - dist_v, &init_b, &index_carry))
>> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry))
>> return false;
>>
>> /* Save the distance vector if we initialized one. */
>> @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d
>> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>> {
>> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr),
>> - loop_nest))
>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>> return false;
>> compute_subscript_distance (ddr);
>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
>> - save_v, &init_b, &index_carry))
>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
>> + &index_carry))
>> return false;
>> save_dist_v (ddr, save_v);
>> DDR_REVERSED_P (ddr) = true;
>> @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d
>> {
>> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>>
>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr),
>> - DDR_A (ddr), loop_nest))
>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>> return false;
>> compute_subscript_distance (ddr);
>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
>> - opposite_v, &init_b,
>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b,
>> &index_carry))
>> return false;
>>
>> @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de
>> }
>> }
>>
>> -/* Helper function. Returns true when there is a dependence between
>> - data references DRA and DRB. */
>> +/* Helper function. Returns true when there is a dependence between the
>> + data references. A_INDEX is the index of the first reference (0 for
>> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */
>>
>> static bool
>> subscript_dependence_tester_1 (struct data_dependence_relation *ddr,
>> - struct data_reference *dra,
>> - struct data_reference *drb,
>> + unsigned int a_index, unsigned int b_index,
>> struct loop *loop_nest)
>> {
>> unsigned int i;
>> @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da
>> {
>> conflict_function *overlaps_a, *overlaps_b;
>>
>> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i),
>> - DR_ACCESS_FN (drb, i),
>> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index),
>> + SUB_ACCESS_FN (subscript, b_index),
>> &overlaps_a, &overlaps_b,
>> &last_conflicts, loop_nest);
>>
>> @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da
>> subscript_dependence_tester (struct data_dependence_relation *ddr,
>> struct loop *loop_nest)
>> {
>> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest))
>> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest))
>> dependence_stats.num_dependence_dependent++;
>>
>> compute_subscript_distance (ddr);
>> Index: gcc/tree-ssa-loop-prefetch.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:26.127377591 +0100
>> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:50.871904726 +0100
>> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop *
>> refb = (struct mem_ref *) DDR_B (dep)->aux;
>>
>> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know
>> + || DDR_COULD_BE_INDEPENDENT_P (dep)
>> || DDR_NUM_DIST_VECTS (dep) == 0)
>> {
>> /* If the dependence cannot be analyzed, assume that there might be
>> Index: gcc/tree-vectorizer.h
>> ===================================================================
>> --- gcc/tree-vectorizer.h 2017-05-18 07:51:26.128377491 +0100
>> +++ gcc/tree-vectorizer.h 2017-05-18 07:51:50.872904626 +0100
>> @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L)
>> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
>> ((L)->may_misalign_stmts.length () > 0)
>> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \
>> - ((L)->may_alias_ddrs.length () > 0)
>> + ((L)->comp_alias_ddrs.length () > 0)
>> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \
>> (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
>> #define LOOP_REQUIRES_VERSIONING(L) \
>> Index: gcc/tree-vect-data-refs.c
>> ===================================================================
>> --- gcc/tree-vect-data-refs.c 2017-05-18 07:51:23.307659382 +0100
>> +++ gcc/tree-vect-data-refs.c 2017-05-18 07:51:50.872904626 +0100
>> @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct
>> }
>>
>> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>> +
>> + if (DDR_COULD_BE_INDEPENDENT_P (ddr))
>> + /* For dependence distances of 2 or more, we have the option of
>> + limiting VF or checking for an alias at runtime. Prefer to check
>> + at runtime if we can, to avoid limiting the VF unnecessarily when
>> + the bases are in fact independent.
>> +
>> + Note that the alias checks will be removed if the VF ends up
>> + being small enough. */
>> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>> + {
>> + int dist = dist_v[loop_depth];
>> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr)))
>> + {
>> + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo))
>> + return false;
>> + break;
>> + }
>> + }
>> +
>> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>> {
>> int dist = dist_v[loop_depth];
>> @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference *
>> return false;
>> }
>>
>> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH
>> + in DDR is >= VF. */
>> +
>> +static bool
>> +dependence_distance_ge_vf (data_dependence_relation *ddr,
>> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf)
>> +{
>> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE
>> + || DDR_NUM_DIST_VECTS (ddr) == 0)
>> + return false;
>> +
>> + /* If the dependence is exact, we should have limited the VF instead. */
>> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr));
>> +
>> + unsigned int i;
>> + lambda_vector dist_v;
>> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>> + {
>> + HOST_WIDE_INT dist = dist_v[loop_depth];
>> + if (dist != 0
>> + && !(dist > 0 && DDR_REVERSED_P (ddr))
>> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf)
>> + return false;
>> + }
>> +
>> + if (dump_enabled_p ())
>> + {
>> + dump_printf_loc (MSG_NOTE, vect_location,
>> + "dependence distance between ");
>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr)));
>> + dump_printf (MSG_NOTE, " and ");
>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr)));
>> + dump_printf (MSG_NOTE, " is >= VF\n");
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Function vect_prune_runtime_alias_test_list.
>>
>> Prune a list of ddrs to be tested at run-time by versioning for alias.
>> @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop
>>
>> comp_alias_ddrs.create (may_alias_ddrs.length ());
>>
>> + unsigned int loop_depth
>> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num,
>> + LOOP_VINFO_LOOP_NEST (loop_vinfo));
>> +
>> /* First, we collect all data ref pairs for aliasing checks. */
>> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>> {
>> @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop
>> tree segment_length_a, segment_length_b;
>> gimple *stmt_a, *stmt_b;
>>
>> + /* Ignore the alias if the VF we chose ended up being no greater
>> + than the dependence distance. */
>> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor))
>> + continue;
>> +
>> dr_a = DDR_A (ddr);
>> stmt_a = DR_STMT (DDR_A (ddr));
>> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>> @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop
>> return false;
>> }
>>
>> - /* All alias checks have been resolved at compilation time. */
>> - if (!comp_alias_ddrs.length ())
>> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
>> -
>> return true;
>> }
>>
>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c
>> ===================================================================
>> --- /dev/null 2017-05-17 17:16:48.996861112 +0100
>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-18 07:51:50.870904826 +0100
>> @@ -0,0 +1,112 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
>> +
>> +/* Intended to be larger than any VF. */
>> +#define GAP 128
>> +#define N (GAP * 3)
>> +
>> +struct s { int x[N + 1]; };
>> +struct t { struct s x[N + 1]; };
>> +struct u { int x[N + 1]; int y; };
>> +struct v { struct s s; };
>> +
>> +void
>> +f1 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->x[i] += b->x[i];
>> +}
>> +
>> +void
>> +f2 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[1].x[i] += b[2].x[i];
>> +}
>> +
>> +void
>> +f3 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[1].x[i] += b[i].x[i];
>> +}
>> +
>> +void
>> +f4 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[i].x[i] += b[i].x[i];
>> +}
>> +
>> +void
>> +f5 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->x[i] += b->x[i + 1];
>> +}
>> +
>> +void
>> +f6 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[1].x[i] += b[2].x[i + 1];
>> +}
>> +
>> +void
>> +f7 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[1].x[i] += b[i].x[i + 1];
>> +}
>> +
>> +void
>> +f8 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[i].x[i] += b[i].x[i + 1];
>> +}
>> +
>> +void
>> +f9 (struct s *a, struct t *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->x[i] += b->x[1].x[i];
>> +}
>> +
>> +void
>> +f10 (struct s *a, struct t *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->x[i] += b->x[i].x[i];
>> +}
>> +
>> +void
>> +f11 (struct u *a, struct u *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->x[i] += b->x[i] + b[i].y;
>> +}
>> +
>> +void
>> +f12 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < GAP; ++i)
>> + a->x[i + GAP] += b->x[i];
>> +}
>> +
>> +void
>> +f13 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < GAP * 2; ++i)
>> + a->x[i + GAP] += b->x[i];
>> +}
>> +
>> +void
>> +f14 (struct v *a, struct s *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->s.x[i] = b->x[i];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */
>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
>> ===================================================================
>> --- /dev/null 2017-05-17 17:16:48.996861112 +0100
>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-18 07:51:50.870904826 +0100
>> @@ -0,0 +1,35 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
>> +
>> +#define N 16
>> +
>> +struct s1 { int a[N]; };
>> +struct s2 { struct s1 b; int c; };
>> +struct s3 { int d; struct s1 e; };
>> +union u { struct s2 f; struct s3 g; };
>> +
>> +/* We allow a and b to overlap arbitrarily. */
>> +
>> +void
>> +f1 (int a[][N], int b[][N])
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a[0][i] += b[0][i];
>> +}
>> +
>> +void
>> +f2 (union u *a, union u *b)
>> +{
>> + for (int i = 0; i < N; ++i)
>> + a->f.b.a[i] += b->g.e.a[i];
>> +}
>> +
>> +void
>> +f3 (struct s1 *a, struct s1 *b)
>> +{
>> + for (int i = 0; i < N - 1; ++i)
>> + a->a[i + 1] += b->a[i];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c
>> ===================================================================
>> --- /dev/null 2017-05-17 17:16:48.996861112 +0100
>> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-18 07:51:50.870904826 +0100
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* Intended to be larger than any VF. */
>> +#define GAP 128
>> +#define N (GAP * 3)
>> +
>> +struct s { int x[N]; };
>> +
>> +void
>> +f1 (struct s *a, struct s *b)
>> +{
>> + for (int i = 0; i < GAP * 2; ++i)
>> + a->x[i + GAP] += b->x[i];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */
>> +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */
>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */