This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Handle data dependence relations with different bases


Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch tries to calculate conservatively-correct distance
>>>>>> vectors for two references whose base addresses are not the same.
>>>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>>>>> isn't guaranteed to occur.
>>>>>>
>>>>>> The motivating example is:
>>>>>>
>>>>>>   struct s { int x[8]; };
>>>>>>   void
>>>>>>   f (struct s *a, struct s *b)
>>>>>>   {
>>>>>>     for (int i = 0; i < 8; ++i)
>>>>>>       a->x[i] += b->x[i];
>>>>>>   }
>>>>>>
>>>>>> in which the "a" and "b" accesses are either independent or have a
>>>>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>>>>> prevents vectorisation, so we can vectorise without an alias check.
>>>>>>
>>>>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>>>>
>>>>>>   void
>>>>>>   f (int a[][8], struct b[][8])
>>>>>>   {
>>>>>>     for (int i = 0; i < 8; ++i)
>>>>>>       a[0][i] += b[0][i];
>>>>>>   }
>>>>>>
>>>>>> I think this is valid because C11 6.7.6.2/6 says:
>>>>>>
>>>>>>   For two array types to be compatible, both shall have compatible
>>>>>>   element types, and if both size specifiers are present, and are
>>>>>>   integer constant expressions, then both size specifiers shall have
>>>>>>   the same constant value.
>>>>>>
>>>>>> So if we access an array through an int (*)[8], it must have type X[8]
>>>>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>>>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>>>>
>>>>>> However, Richard B said that (at least in gimple) we support arbitrary
>>>>>> overlap of arrays and allow arrays to be accessed with different
>>>>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>>>>> only handled references that end in a structure field access.
>>>>>>
>>>>>> There are two ways of handling these dependences in the vectoriser:
>>>>>> use them to limit VF, or check at runtime as before.  I've gone for
>>>>>> the approach of checking at runtime if we can, to avoid limiting VF
>>>>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>>>>> aren't allowed.
>>>>>>
>>>>>> The patch tests whether we queued an alias check with a dependence
>>>>>> distance of X and then picked a VF <= X, in which case it's safe to
>>>>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>>>>> be called twice with different VF for the same loop, it's no longer
>>>>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>>>>> comp_alias_ddrs to check whether versioning is necessary.
>>>>>>
>>>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>>>
>>>>> You seem to do your "fancy" thing but also later compute the old
>>>>> base equality anyway (for same_base_p).  It looks to me for this
>>>>> case the new fancy code can be simply skipped, keeping num_dimensions
>>>>> as before?
>>>>>
>>>>> +      /* Try to approach equal type sizes.  */
>>>>> +      if (!COMPLETE_TYPE_P (type_a)
>>>>> +         || !COMPLETE_TYPE_P (type_b)
>>>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>>>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>>>>> +       break;
>>>>>
>>>>> ah, interesting idea to avoid a quadratic search.  Note that you should
>>>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>>>>> as they are used for type-punning.
>>>
>>> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
>>> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
>>> dr_analyze_indices allows, so I think we safe in terms of the tree codes.
>>
>> Yeah.  I think we need to document that we should have a 1:1 match here.
>
> OK, I added that to the comments and also added an access_fn_component_p
> that we can assert on.
>
>>>>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>>>>> ARRAY_REFs - but I also see there:
>>>>>
>>>>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>>>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>>>          for common blocks instead of using unions like everyone else.  */
>>>>>       tree type1 = DECL_CONTEXT (field1);
>>>>>       tree type2 = DECL_CONTEXT (field2);
>>>>>
>>>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
>>>>> You also may not use types_compatible_p here as for LTO that is _way_ too
>>>>> lax for aggregates.  The above uses
>>>>>
>>>>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>>>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>>>          return false;
>>>>>
>>>>> so you should also bail out on unions here, rather than the check you do later.
>>>
>>> The loop stops before we get to a union, so I think "only" the RECORD_TYPE
>>> COMPONENT_REF handling is a potential problem.  Does this mean that
>>> I should use the nonoverlapping_component_refs_of_decl_p code:
>>>
>>>       tree field1 = TREE_OPERAND (ref1, 1);
>>>       tree field2 = TREE_OPERAND (ref2, 1);
>>>
>>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>          for common blocks instead of using unions like everyone else.  */
>>>       tree type1 = DECL_CONTEXT (field1);
>>>       tree type2 = DECL_CONTEXT (field2);
>>>
>>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>          return false;
>>>
>>>       if (field1 != field2)
>>>         {
>>>           /* A field and its representative need to be considered the
>>>              same.  */
>>>           if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
>>>               || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
>>>             return false;
>>>           /* Different fields of the same record type cannot overlap.
>>>              ??? Bitfields can overlap at RTL level so punt on them.  */
>>>           if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>>>             return false;
>>>           return true;
>>>         }
>>>
>>> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
>>> during the new loop?
>>
>> Yes.  OTOH you want to "match" while the above disambiguates.  So it means
>> you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL
>> equality (which should be the same in the end).  The RTL concern
>> should not matter
>> here.
>
> The attached patch adds an access_fn_components_comparable_p helper
> function that checks whether the DECL_CONTEXTs are the same.
>
>>>  And test for this as well as unions in the outer
>>> references?
>>
>> So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT,
>> and you (should) stop the ref walk at DR_BASE_OBJECT.
>
> I was just thinking that if the Fortran front-end has cases in which
> TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1))
> for a COMPONENT_REF, should we treat that as equivalent to a union
> access in ref_contains_union_access_p?  But I'm not sure that's
> necessary after all.
>
>> The dr_analyze_indices code is also somewhat fishy in that it simply
>> ignores everything below unhandled component-refs even if there are
>> indices involved (and it gets away with this because dependence
>> analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test
>> in case it is sth like a[i].union for example ... hopefully ...).
>
> I think this is what you meant, but: I don't think the base object
> itself can be a union, because we need at least one component reference
> for the DR, and don't accept COMPONENT_REFs for unions as access functions.
> So if the base involves a union, the base would also need to have a
> COMPONENT_REF that selects a particular member of that union.
>
> And yeah, before the patch we did allow a dependence distance to be
> calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do
> after the patch), on the basis that a[i].union.f refers to the same
> object in both cases.
>
>>>>> You seem to rely on getting an access_fn entry for each handled_component_p.
>>>>> It looks like this is the case -- we even seem to stop at unions
>>>>> (with the same
>>>>> fortran "issue").  I'm not sure that's the best thing to do but you
>>>>> rely on that.
>>>
>>> Yeah, the loop is deliberately limited to the components associated with
>>> an access_fn.  I did wonder at first whether dr_analyze_indices should
>>> store the original component reference trees for each access function.
>>> That would make things simpler and more explicit, but would also eat up
>>> more memory.  Things like object_address_invariant_in_loop_p rely on the
>>> access_fns in the same way that the loop in the patch does.
>>
>> in fact it fails to handle ARRAY_RANGE_REFs ...
>
> Yeah, the whole file seems to ignore those.  What kind of code would
> benefit?
>
>>>>> I don't understand the looping, it needs more comments.  You seem to be
>>>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>>>>> is how many compatible refs you found on the way (with incompatible ones
>>>>> not counting?!).  What about an inner varying array of structs?
>>>>> This seems to
>>>>> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
>>>>> b[i].s.b[i].j?
>>>
>>> I'll try to improve the comments.  But the idea is that both sequences are
>>> as long as possible, while that still gives compatible types.  If there is
>>> more than one such sequence, we pick the one nearest the base.
>>>
>>> So in your example, the access functions would be:
>>>
>>>                0   1   2   3   4
>>>   a:          .j [i]  .b  .s [i]
>>>
>>>            0   1   2   3   4   5
>>>   b:  __real  .j [i]  .b  .s [i]
>>>
>>> If a and b are pointers, the final access functions would be
>>> unconstrained base accesses, so we'd end up with:
>>>
>>>   a: [0, 3]
>>>   b: [1, 4]
>>>
>>> for both sequences.
>>>
>>>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>>>>> conveniently start from the other
>>>>> end of the ref here.
>>>>
>>>> That said, for the motivational cases we either have one ref having
>>>> more dimensions than the other (the __real vs. full complex access) or
>>>> they have the same number of dimensions (and no access fn for the
>>>> base).
>>>>
>>>> For the first case we should simply "drop" access_fns of the larger
>>>> dimensional ref (from the start, plus outer component refs) up to the
>>>> point the number of dimensions are equal.
>>>
>>> Yeah, that's what happens for your example.  But if we had:
>>>
>>>     a[i].s.c.d
>>>     __real b[i].s.b[i].j
>>>
>>> (where d is the same type as the real component) then the access
>>> functions would be:
>>>
>>>                    0   1   2   3
>>>   a:              .d  .c  .s [i]
>>>
>>>            0   1   2   3   4   5
>>>   b:  __real  .j [i]  .b  .s [i]
>>>
>>> Comparing the a0/b2 column doesn't make sense, because one's an array
>>> and the other is a structure.  In this case the sequence we care about is:
>>>
>>>   a: [1, 3]
>>>   b: [3, 5]
>>>
>>> which is what the loop gives.  The a1/b3 column is the one that proves
>>> there's no dependence.
>>>
>>>> Then we have the case of
>>>>
>>>>   ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>>>>
>>>> where we have to punt.
>>>>
>>>> Then we have the case of
>>>>
>>>>   ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>>>>
>>>> which is where the new code should kick in to see if we can drop access_fns
>>>> from the other end (as unanalyzable but either having distance zero or not
>>>> aliased because of TBAA).
>>>>
>>>> At least your testcases suggest you do not want to handle
>>>>
>>>>  struct s { int x[N]; };
>>>>  struct r { struct s s; };
>>>>  f (struct s *a, struct r *b)
>>>>  {
>>>>     for (i = 0; i < N; ++i)
>>>>       a->s.x[i] = b->x[i];
>>>>  }
>>>>
>>>> ?
>>>>
>>>> With this example your loop which seems to search for a "common"
>>>> sequence in (different) midst of the reference trees makes more sense
>>>> (still that loop is awkward to understand).
>>>
>>> Yeah, I want to handle that too, just hadn't thought of it as a specific
>>> testcase.  The code does give the expected dependence distance of 0.
>>
>> Ok.
>>
>> I think the patch is reasonable, maybe the loop can be restructured /
>> simplified a bit and handling of the union case for example be done
>> first (by looking at DR_BASE_OBJECT).
>
> I still prefer doing the loop first and keeping the "same base" check
> together as a single condition, since it means that we're analysing the
> reference in a single direction (DR_REF to base) rather than jumping
> around.  And unequal bases should be more common that equal ones.
>
> I think both orders involve doing potentially redundant work.  The
> current order tends towards doing redundant work for union accesses
> and !flag_strict_aliasing, but they should be the less common cases.
>
> How does this look?  Changes since v1:
>
> - Added access_fn_component_p to check for valid access function components.
>
> - Added access_fn_components_comparable_p instead of using
>   types_compatibloe_p directly.
>
> - Added more commentary.
>
> - Added local structures to represent the sequence, so that it's
>   more obvious which variables are temporaries and which aren't.
>
> - Added the test above to vect-alias-check-3.c.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Richard
>
>
> 2017-05-18  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>
> 	* tree-data-ref.h (subscript): Add access_fn field.
> 	(data_dependence_relation): Add could_be_independent_p.
> 	(SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
> 	(same_access_functions): Move to tree-data-ref.c.
> 	* tree-data-ref.c (ref_contains_union_access_p): New function.
> 	(access_fn_component_p): Likewise.
> 	(access_fn_components_comparable_p): Likewise.
> 	(dr_analyze_indices): Add a comment that this code needs to be
> 	kept in sync with access_fn_component_p.
> 	(dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
> 	DR_ACCESS_FN.
> 	(constant_access_functions): Likewise.
> 	(add_other_self_distances): Likewise.
> 	(same_access_functions): Likewise.  (Moved from tree-data-ref.h.)
> 	(initialize_data_dependence_relation): Use XCNEW and remove
> 	explicit zeroing of DDR_REVERSED_P.  Look for a subsequence
> 	of access functions that have the same type.  Allow the
> 	subsequence to end with different bases in some circumstances.
> 	Record the chosen access functions in SUB_ACCESS_FN.
> 	(build_classic_dist_vector_1): Replace ddr_a and ddr_b with
> 	a_index and b_index.  Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
> 	(subscript_dependence_tester_1): Likewise dra and drb.
> 	(build_classic_dist_vector): Update calls accordingly.
> 	(subscript_dependence_tester): Likewise.
> 	* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
> 	DDR_COULD_BE_INDEPENDENT_P.
> 	* tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
> 	comp_alias_ddrs instead of may_alias_ddrs.
> 	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try
> 	to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back
> 	to using the recorded distance vectors if that fails.
> 	(dependence_distance_ge_vf): New function.
> 	(vect_prune_runtime_alias_test_list): Use it.  Don't clear
> 	LOOP_VINFO_MAY_ALIAS_DDRS.
>
> gcc/testsuite/
> 	* gcc.dg/vect/vect-alias-check-3.c: New test.
> 	* gcc.dg/vect/vect-alias-check-4.c: Likewise.
> 	* gcc.dg/vect/vect-alias-check-5.c: Likewise.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h	2017-05-04 11:36:51.157328631 +0100
> +++ gcc/tree-data-ref.h	2017-05-18 07:51:50.871904726 +0100
> @@ -191,6 +191,9 @@ struct conflict_function
>  
>  struct subscript
>  {
> +  /* The access functions of the two references.  */
> +  tree access_fn[2];
> +
>    /* A description of the iterations for which the elements are
>       accessed twice.  */
>    conflict_function *conflicting_iterations_in_a;
> @@ -209,6 +212,7 @@ struct subscript
>  
>  typedef struct subscript *subscript_p;
>  
> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]
>  #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a
>  #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b
>  #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict
> @@ -264,6 +268,33 @@ struct data_dependence_relation
>    /* Set to true when the dependence relation is on the same data
>       access.  */
>    bool self_reference_p;
> +
> +  /* True if the dependence described is conservatively correct rather
> +     than exact, and if it is still possible for the accesses to be
> +     conditionally independent.  For example, the a and b references in:
> +
> +       struct s *a, *b;
> +       for (int i = 0; i < n; ++i)
> +         a->f[i] += b->f[i];
> +
> +     conservatively have a distance vector of (0), for the case in which
> +     a == b, but the accesses are independent if a != b.  Similarly,
> +     the a and b references in:
> +
> +       struct s *a, *b;
> +       for (int i = 0; i < n; ++i)
> +         a[0].f[i] += b[i].f[i];
> +
> +     conservatively have a distance vector of (0), but they are indepenent
> +     when a != b + i.  In contrast, the references in:
> +
> +       struct s *a;
> +       for (int i = 0; i < n; ++i)
> +         a->f[i] += a->f[i];
> +
> +     have the same distance vector of (0), but the accesses can never be
> +     independent.  */
> +  bool could_be_independent_p;
>  };
>  
>  typedef struct data_dependence_relation *ddr_p;
> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \
>  #define DDR_DIST_VECT(DDR, I) \
>    DDR_DIST_VECTS (DDR)[I]
>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p
>  
>  
>  bool dr_analyze_innermost (struct data_reference *, struct loop *);
> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data
>        return false;
>  
>    return true;
> -}
> -
> -/* Return true when the DDR contains two data references that have the
> -   same access functions.  */
> -
> -static inline bool
> -same_access_functions (const struct data_dependence_relation *ddr)
> -{
> -  unsigned i;
> -
> -  for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> -    if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i),
> -			  DR_ACCESS_FN (DDR_B (ddr), i)))
> -      return false;
> -
> -  return true;
>  }
>  
>  /* Returns true when all the dependences are computable.  */
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c	2017-05-18 07:51:26.126377691 +0100
> +++ gcc/tree-data-ref.c	2017-05-18 07:51:50.871904726 +0100
> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
>  } dependence_stats;
>  
>  static bool subscript_dependence_tester_1 (struct data_dependence_relation *,
> -					   struct data_reference *,
> -					   struct data_reference *,
> +					   unsigned int, unsigned int,
>  					   struct loop *);
>  /* Returns true iff A divides B.  */
>  
> @@ -144,6 +143,21 @@ int_divides_p (int a, int b)
>    return ((b % a) == 0);
>  }
>  
> +/* Return true if reference REF contains a union access.  */
> +
> +static bool
> +ref_contains_union_access_p (tree ref)
> +{
> +  while (handled_component_p (ref))
> +    {
> +      ref = TREE_OPERAND (ref, 0);
> +      if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE
> +	  || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE)
> +	return true;
> +    }
> +  return false;
> +}
> +
>  
>  
>  /* Dump into FILE all the data references from DATAREFS.  */
> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out
>        unsigned int i;
>        struct loop *loopi;
>  
> -      for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> +      subscript *sub;
> +      FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>  	{
>  	  fprintf (outf, "  access_fn_A: ");
> -	  print_generic_stmt (outf, DR_ACCESS_FN (dra, i));
> +	  print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0));
>  	  fprintf (outf, "  access_fn_B: ");
> -	  print_generic_stmt (outf, DR_ACCESS_FN (drb, i));
> -	  dump_subscript (outf, DDR_SUBSCRIPT (ddr, i));
> +	  print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1));
> +	  dump_subscript (outf, sub);
>  	}
>  
>        fprintf (outf, "  inner loop index: %d\n", DDR_INNER_LOOP (ddr));
> @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere
>    return true;
>  }
>  
> +/* Return true if OP is a valid component reference for a DR access
> +   function.  This accepts a subset of what handled_component_p accepts.  */
> +
> +static bool
> +access_fn_component_p (tree op)
> +{
> +  switch (TREE_CODE (op))
> +    {
> +    case REALPART_EXPR:
> +    case IMAGPART_EXPR:
> +    case ARRAY_REF:
> +      return true;
> +
> +    case COMPONENT_REF:
> +      return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE;
> +
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Determines the base object and the list of indices of memory reference
>     DR, analyzed in LOOP and instantiated in loop nest NEST.  */
>  
> @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc
>        access_fns.safe_push (integer_one_node);
>      }
>  
> -  /* Analyze access functions of dimensions we know to be independent.  */
> +  /* Analyze access functions of dimensions we know to be independent.
> +     The list of component references handled here should be kept in
> +     sync with access_fn_component_p.  */
>    while (handled_component_p (ref))
>      {
>        if (TREE_CODE (ref) == ARRAY_REF)
> @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere
>    return refs_may_alias_p (addr_a, addr_b);
>  }
>  
> +/* REF_A and REF_B both satisfy access_fns_comparable_p.  Return true
> +   if it is meaningful to compare their associated access functions
> +   when checking for dependencies.  */
> +
> +static bool
> +access_fn_components_comparable_p (tree ref_a, tree ref_b)
> +{
> +  if (TREE_CODE (ref_a) != TREE_CODE (ref_b))
> +    return false;
> +
> +  if (TREE_CODE (ref_a) == COMPONENT_REF)
> +    /* ??? We cannot simply use the type of operand #0 of the refs here as
> +       the Fortran compiler smuggles type punning into COMPONENT_REFs.
> +       Use the DECL_CONTEXT of the FIELD_DECLs instead.  */
> +    return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1))
> +	    == DECL_CONTEXT (TREE_OPERAND (ref_b, 1)));
> +
> +  return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)),
> +			     TREE_TYPE (TREE_OPERAND (ref_b, 0)));
> +}
> +
>  /* Initialize a data dependence relation between data accesses A and
>     B.  NB_LOOPS is the number of loops surrounding the references: the
>     size of the classic distance/direction vectors.  */
> @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str
>    struct data_dependence_relation *res;
>    unsigned int i;
>  
> -  res = XNEW (struct data_dependence_relation);
> +  res = XCNEW (struct data_dependence_relation);
>    DDR_A (res) = a;
>    DDR_B (res) = b;
>    DDR_LOOP_NEST (res).create (0);
> -  DDR_REVERSED_P (res) = false;
>    DDR_SUBSCRIPTS (res).create (0);
>    DDR_DIR_VECTS (res).create (0);
>    DDR_DIST_VECTS (res).create (0);
> @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str
>        return res;
>      }
>  
> -  /* The case where the references are exactly the same.  */
> -  if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
> +  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> +  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> +  if (num_dimensions_a == 0 || num_dimensions_b == 0)
>      {
> -      if ((loop_nest.exists ()
> -	   && !object_address_invariant_in_loop_p (loop_nest[0],
> -						   DR_BASE_OBJECT (a)))
> -	  || DR_NUM_DIMENSIONS (a) == 0)
> +      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +      return res;
> +    }
> +
> +  /* For unconstrained bases, the root (highest-indexed) subscript
> +     describes a variation in the base of the original DR_REF rather
> +     than a component access.  We have no type that accurately describes
> +     the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after*
> +     applying this subscript) so limit the search to the last real
> +     component access.
> +
> +     E.g. for:
> +
> +	void
> +	f (int a[][8], int b[][8])
> +	{
> +	  for (int i = 0; i < 8; ++i)
> +	    a[i * 2][0] = b[i][0];
> +	}
> +
> +     the a and b accesses have a single ARRAY_REF component reference [0]
> +     but have two subscripts.  */
> +  if (DR_UNCONSTRAINED_BASE (a))
> +    num_dimensions_a -= 1;
> +  if (DR_UNCONSTRAINED_BASE (b))
> +    num_dimensions_b -= 1;
> +
> +  /* These structures describe sequences of component references in
> +     DR_REF (A) and DR_REF (B).  Each component reference is tied to a
> +     specific access function.  */
> +  struct {
> +    /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and
> +       DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher
> +       indices.  In C notation, these are the indices of the rightmost
> +       component references; e.g. for a sequence .b.c.d, the start
> +       index is for .d.  */
> +    unsigned int start_a;
> +    unsigned int start_b;
> +
> +    /* The sequence contains LENGTH consecutive access functions from
> +       each DR.  */
> +    unsigned int length;
> +
> +    /* The enclosing objects for the A and B sequences respectively,
> +       i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1)
> +       and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied.  */
> +    tree object_a;
> +    tree object_b;
> +  } full_seq = {}, struct_seq = {};
> +
> +  /* Before each iteration of the loop:
> +
> +     - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and
> +     - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B).  */
> +  unsigned int index_a = 0;
> +  unsigned int index_b = 0;
> +  tree ref_a = DR_REF (a);
> +  tree ref_b = DR_REF (b);
> +
> +  /* Now walk the component references from the final DR_REFs back up to
> +     the enclosing base objects.  Each component reference corresponds
> +     to one access function in the DR, with access function 0 being for
> +     the final DR_REF and the highest-indexed access function being the
> +     one that is applied to the base of the DR.
> +
> +     Look for a sequence of component references whose access functions
> +     are comparable (see access_fn_components_comparable_p).  If more
> +     than one such sequence exists, pick the one nearest the base
> +     (which is the leftmost sequence in C notation).  Store this sequence
> +     in FULL_SEQ.
> +
> +     For example, if we have:
> +
> +	struct foo { struct bar s; ... } (*a)[10], (*b)[10];
> +
> +	A: a[0][i].s.c.d
> +	B: __real b[0][i].s.e[i].f
> +
> +     (where d is the same type as the real component of f) then the access
> +     functions would be:
> +
> +			 0   1   2   3
> +	A:              .d  .c  .s [i]
> +
> +		 0   1   2   3   4   5
> +	B:  __real  .f [i]  .e  .s [i]
> +
> +     The A0/B2 column isn't comparable, since .d is a COMPONENT_REF
> +     and [i] is an ARRAY_REF.  However, the A1/B3 column contains two
> +     COMPONENT_REF accesses for struct bar, so is comparable.  Likewise
> +     the A2/B4 column contains two COMPONENT_REF accesses for struct foo,
> +     so is comparable.  The A3/B5 column contains two ARRAY_REFs that
> +     index foo[10] arrays, so is again comparable.  The sequence is
> +     therefore:
> +
> +        A: [1, 3]  (i.e. [i].s.c)
> +        B: [3, 5]  (i.e. [i].s.e)
> +
> +     Also look for sequences of component references whose access
> +     functions are comparable and whose enclosing objects have the same
> +     RECORD_TYPE.  Store this sequence in STRUCT_SEQ.  In the above
> +     example, STRUCT_SEQ would be:
> +
> +        A: [1, 2]  (i.e. s.c)
> +        B: [3, 4]  (i.e. s.e)  */
> +  while (index_a < num_dimensions_a && index_b < num_dimensions_b)
> +    {
> +      /* REF_A and REF_B must be one of the component access types
> +	 allowed by dr_analyze_indices.  */
> +      gcc_checking_assert (access_fn_component_p (ref_a));
> +      gcc_checking_assert (access_fn_component_p (ref_b));
> +
> +      /* Get the immediately-enclosing objects for REF_A and REF_B,
> +	 i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A)
> +	 and DR_ACCESS_FN (B, INDEX_B).  */
> +      tree object_a = TREE_OPERAND (ref_a, 0);
> +      tree object_b = TREE_OPERAND (ref_b, 0);
> +
> +      tree type_a = TREE_TYPE (object_a);
> +      tree type_b = TREE_TYPE (object_b);
> +      if (access_fn_components_comparable_p (ref_a, ref_b))
> +	{
> +	  /* This pair of component accesses is comparable for dependence
> +	     analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and
> +	     DR_ACCESS_FN (B, INDEX_B) in the sequence.  */
> +	  if (full_seq.start_a + full_seq.length != index_a
> +	      || full_seq.start_b + full_seq.length != index_b)
> +	    {
> +	      /* The accesses don't extend the current sequence,
> +		 so start a new one here.  */
> +	      full_seq.start_a = index_a;
> +	      full_seq.start_b = index_b;
> +	      full_seq.length = 0;
> +	    }
> +
> +	  /* Add this pair of references to the sequence.  */
> +	  full_seq.length += 1;
> +	  full_seq.object_a = object_a;
> +	  full_seq.object_b = object_b;
> +
> +	  /* If the enclosing objects are structures (and thus have the
> +	     same RECORD_TYPE), record the new sequence in STRUCT_SEQ.  */
> +	  if (TREE_CODE (type_a) == RECORD_TYPE)
> +	    struct_seq = full_seq;
> +
> +	  /* Move to the next containing reference for both A and B.  */
> +	  ref_a = object_a;
> +	  ref_b = object_b;
> +	  index_a += 1;
> +	  index_b += 1;
> +	  continue;
> +	}
> +
> +      /* Try to approach equal type sizes.  */
> +      if (!COMPLETE_TYPE_P (type_a)
> +	  || !COMPLETE_TYPE_P (type_b)
> +	  || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
> +	  || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
> +	break;
> +
> +      unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a));
> +      unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b));
> +      if (size_a <= size_b)
>  	{
> -	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -	  return res;
> +	  index_a += 1;
> +	  ref_a = object_a;
> +	}
> +      if (size_b <= size_a)
> +	{
> +	  index_b += 1;
> +	  ref_b = object_b;
>  	}
> -      DDR_AFFINE_P (res) = true;
> -      DDR_ARE_DEPENDENT (res) = NULL_TREE;
> -      DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
> -      DDR_LOOP_NEST (res) = loop_nest;
> -      DDR_INNER_LOOP (res) = 0;
> -      DDR_SELF_REFERENCE (res) = true;
> -      for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> -       {
> -         struct subscript *subscript;
> -
> -         subscript = XNEW (struct subscript);
> -         SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
> -         SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
> -         SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> -         SUB_DISTANCE (subscript) = chrec_dont_know;
> -         DDR_SUBSCRIPTS (res).safe_push (subscript);
> -       }
> -      return res;
>      }
>  
> -  /* If the references do not access the same object, we do not know
> -     whether they alias or not.  We do not care about TBAA or alignment
> -     info so we can use OEP_ADDRESS_OF to avoid false negatives.
> -     But the accesses have to use compatible types as otherwise the
> -     built indices would not match.  */
> -  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF)
> -      || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
> -			      TREE_TYPE (DR_BASE_OBJECT (b))))
> +  /* See whether FULL_SEQ ends at the base and whether the two bases
> +     are equal.  We do not care about TBAA or alignment info so we can
> +     use OEP_ADDRESS_OF to avoid false negatives.  */
> +  tree base_a = DR_BASE_OBJECT (a);
> +  tree base_b = DR_BASE_OBJECT (b);
> +  bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
> +		      && full_seq.start_b + full_seq.length == num_dimensions_b
> +		      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> +		      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
> +		      && types_compatible_p (TREE_TYPE (base_a),
> +					     TREE_TYPE (base_b))
> +		      && (!loop_nest.exists ()
> +			  || (object_address_invariant_in_loop_p
> +			      (loop_nest[0], base_a))));
> +
> +  /* If the bases are the same, we can include the base variation too.
> +     E.g. the b accesses in:
> +
> +       for (int i = 0; i < n; ++i)
> +         b[i + 4][0] = b[i][0];
> +
> +     have a definite dependence distance of 4, while for:
> +
> +       for (int i = 0; i < n; ++i)
> +         a[i + 4][0] = b[i][0];
> +
> +     the dependence distance depends on the gap between a and b.
> +
> +     If the bases are different then we can only rely on the sequence
> +     rooted at a structure access, since arrays are allowed to overlap
> +     arbitrarily and change shape arbitrarily.  E.g. we treat this as
> +     valid code:
> +
> +       int a[256];
> +       ...
> +       ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0];
> +
> +     where two lvalues with the same int[4][3] type overlap, and where
> +     both lvalues are distinct from the object's declared type.  */
> +  if (same_base_p)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      if (DR_UNCONSTRAINED_BASE (a))
> +	full_seq.length += 1;
>      }
> +  else
> +    full_seq = struct_seq;
>  
> -  /* If the base of the object is not invariant in the loop nest, we cannot
> -     analyze it.  TODO -- in fact, it would suffice to record that there may
> -     be arbitrary dependences in the loops where the base object varies.  */
> -  if ((loop_nest.exists ()
> -       && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a)))
> -      || DR_NUM_DIMENSIONS (a) == 0)
> +  /* Punt if we didn't find a suitable sequence.  */
> +  if (full_seq.length == 0)
>      {
>        DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>        return res;
>      }
>  
> -  /* If the number of dimensions of the access to not agree we can have
> -     a pointer access to a component of the array element type and an
> -     array access while the base-objects are still the same.  Punt.  */
> -  if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b))
> +  if (!same_base_p)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      /* Partial overlap is possible for different bases when strict aliasing
> +	 is not in effect.  It's also possible if either base involves a union
> +	 access; e.g. for:
> +
> +	   struct s1 { int a[2]; };
> +	   struct s2 { struct s1 b; int c; };
> +	   struct s3 { int d; struct s1 e; };
> +	   union u { struct s2 f; struct s3 g; } *p, *q;
> +
> +	 the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at
> +	 "p->g.e" (base "p->g") and might partially overlap the s1 at
> +	 "q->g.e" (base "q->g").  */
> +      if (!flag_strict_aliasing
> +	  || ref_contains_union_access_p (full_seq.object_a)
> +	  || ref_contains_union_access_p (full_seq.object_b))
> +	{
> +	  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +	  return res;
> +	}
> +
> +      DDR_COULD_BE_INDEPENDENT_P (res) = true;
>      }
>  
>    DDR_AFFINE_P (res) = true;
>    DDR_ARE_DEPENDENT (res) = NULL_TREE;
> -  DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a));
> +  DDR_SUBSCRIPTS (res).create (full_seq.length);
>    DDR_LOOP_NEST (res) = loop_nest;
>    DDR_INNER_LOOP (res) = 0;
>    DDR_SELF_REFERENCE (res) = false;
>  
> -  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> +  for (i = 0; i < full_seq.length; ++i)
>      {
>        struct subscript *subscript;
>  
>        subscript = XNEW (struct subscript);
> +      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
> +      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
>        SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
>        SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
>        SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende
>  }
>  
>  /* Return false when fail to represent the data dependence as a
> -   distance vector.  INIT_B is set to true when a component has been
> +   distance vector.  A_INDEX is the index of the first reference
> +   (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the
> +   second reference.  INIT_B is set to true when a component has been
>     added to the distance vector DIST_V.  INDEX_CARRY is then set to
>     the index in DIST_V that carries the dependence.  */
>  
>  static bool
>  build_classic_dist_vector_1 (struct data_dependence_relation *ddr,
> -			     struct data_reference *ddr_a,
> -			     struct data_reference *ddr_b,
> +			     unsigned int a_index, unsigned int b_index,
>  			     lambda_vector dist_v, bool *init_b,
>  			     int *index_carry)
>  {
> @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data
>  	  return false;
>  	}
>  
> -      access_fn_a = DR_ACCESS_FN (ddr_a, i);
> -      access_fn_b = DR_ACCESS_FN (ddr_b, i);
> +      access_fn_a = SUB_ACCESS_FN (subscript, a_index);
> +      access_fn_b = SUB_ACCESS_FN (subscript, b_index);
>  
>        if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC
>  	  && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC)
> @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data
>  constant_access_functions (const struct data_dependence_relation *ddr)
>  {
>    unsigned i;
> +  subscript *sub;
>  
> -  for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> -    if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i))
> -	|| !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i)))
> +  FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> +    if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0))
> +	|| !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1)))
>        return false;
>  
>    return true;
> @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de
>    lambda_vector dist_v;
>    unsigned i;
>    int index_carry = DDR_NB_LOOPS (ddr);
> +  subscript *sub;
>  
> -  for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
> +  FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
>      {
> -      tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i);
> +      tree access_fun = SUB_ACCESS_FN (sub, 0);
>  
>        if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC)
>  	{
> @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de
>  		  return;
>  		}
>  
> -	      access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
> +	      access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0);
>  
>  	      if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC)
>  		add_multivariate_self_dist (ddr, access_fun);
> @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d
>      }
>  }
>  
> +/* Return true when the DDR contains two data references that have the
> +   same access functions.  */
> +
> +static inline bool
> +same_access_functions (const struct data_dependence_relation *ddr)
> +{
> +  unsigned i;
> +  subscript *sub;
> +
> +  FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub)
> +    if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0),
> +			  SUB_ACCESS_FN (sub, 1)))
> +      return false;
> +
> +  return true;
> +}
> +
>  /* Compute the classic per loop distance vector.  DDR is the data
>     dependence relation to build a vector from.  Return false when fail
>     to represent the data dependence as a distance vector.  */
> @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d
>      }
>  
>    dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
> -  if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr),
> -				    dist_v, &init_b, &index_carry))
> +  if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry))
>      return false;
>  
>    /* Save the distance vector if we initialized one.  */
> @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d
>        if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>  	{
>  	  lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
> -	  if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> -					      loop_nest))
> +	  if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>  	    return false;
>  	  compute_subscript_distance (ddr);
> -	  if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> -					    save_v, &init_b, &index_carry))
> +	  if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
> +					    &index_carry))
>  	    return false;
>  	  save_dist_v (ddr, save_v);
>  	  DDR_REVERSED_P (ddr) = true;
> @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d
>  	    {
>  	      lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>  
> -	      if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr),
> -						  DDR_A (ddr), loop_nest))
> +	      if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>  		return false;
>  	      compute_subscript_distance (ddr);
> -	      if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr),
> -						opposite_v, &init_b,
> +	      if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b,
>  						&index_carry))
>  		return false;
>  
> @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de
>      }
>  }
>  
> -/* Helper function.  Returns true when there is a dependence between
> -   data references DRA and DRB.  */
> +/* Helper function.  Returns true when there is a dependence between the
> +   data references.  A_INDEX is the index of the first reference (0 for
> +   DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference.  */
>  
>  static bool
>  subscript_dependence_tester_1 (struct data_dependence_relation *ddr,
> -			       struct data_reference *dra,
> -			       struct data_reference *drb,
> +			       unsigned int a_index, unsigned int b_index,
>  			       struct loop *loop_nest)
>  {
>    unsigned int i;
> @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da
>      {
>        conflict_function *overlaps_a, *overlaps_b;
>  
> -      analyze_overlapping_iterations (DR_ACCESS_FN (dra, i),
> -				      DR_ACCESS_FN (drb, i),
> +      analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index),
> +				      SUB_ACCESS_FN (subscript, b_index),
>  				      &overlaps_a, &overlaps_b,
>  				      &last_conflicts, loop_nest);
>  
> @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da
>  subscript_dependence_tester (struct data_dependence_relation *ddr,
>  			     struct loop *loop_nest)
>  {
> -  if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest))
> +  if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest))
>      dependence_stats.num_dependence_dependent++;
>  
>    compute_subscript_distance (ddr);
> Index: gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> --- gcc/tree-ssa-loop-prefetch.c	2017-05-18 07:51:26.127377591 +0100
> +++ gcc/tree-ssa-loop-prefetch.c	2017-05-18 07:51:50.871904726 +0100
> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop *
>        refb = (struct mem_ref *) DDR_B (dep)->aux;
>  
>        if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know
> +	  || DDR_COULD_BE_INDEPENDENT_P (dep)
>  	  || DDR_NUM_DIST_VECTS (dep) == 0)
>  	{
>  	  /* If the dependence cannot be analyzed, assume that there might be
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h	2017-05-18 07:51:26.128377491 +0100
> +++ gcc/tree-vectorizer.h	2017-05-18 07:51:50.872904626 +0100
> @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L)
>  #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L)	\
>    ((L)->may_misalign_stmts.length () > 0)
>  #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L)		\
> -  ((L)->may_alias_ddrs.length () > 0)
> +  ((L)->comp_alias_ddrs.length () > 0)
>  #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L)		\
>    (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
>  #define LOOP_REQUIRES_VERSIONING(L)			\
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c	2017-05-18 07:51:23.307659382 +0100
> +++ gcc/tree-vect-data-refs.c	2017-05-18 07:51:50.872904626 +0100
> @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct
>      }
>  
>    loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
> +
> +  if (DDR_COULD_BE_INDEPENDENT_P (ddr))
> +    /* For dependence distances of 2 or more, we have the option of
> +       limiting VF or checking for an alias at runtime.  Prefer to check
> +       at runtime if we can, to avoid limiting the VF unnecessarily when
> +       the bases are in fact independent.
> +
> +       Note that the alias checks will be removed if the VF ends up
> +       being small enough.  */
> +    FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> +      {
> +	int dist = dist_v[loop_depth];
> +	if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr)))
> +	  {
> +	    if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo))
> +	      return false;
> +	    break;
> +	  }
> +      }
> +
>    FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>      {
>        int dist = dist_v[loop_depth];
> @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference *
>    return false;
>  }
>  
> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH
> +   in DDR is >= VF.  */
> +
> +static bool
> +dependence_distance_ge_vf (data_dependence_relation *ddr,
> +			   unsigned int loop_depth, unsigned HOST_WIDE_INT vf)
> +{
> +  if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE
> +      || DDR_NUM_DIST_VECTS (ddr) == 0)
> +    return false;
> +
> +  /* If the dependence is exact, we should have limited the VF instead.  */
> +  gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr));
> +
> +  unsigned int i;
> +  lambda_vector dist_v;
> +  FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> +    {
> +      HOST_WIDE_INT dist = dist_v[loop_depth];
> +      if (dist != 0
> +	  && !(dist > 0 && DDR_REVERSED_P (ddr))
> +	  && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf)
> +	return false;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +		       "dependence distance between ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr)));
> +      dump_printf (MSG_NOTE,  " and ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr)));
> +      dump_printf (MSG_NOTE,  " is >= VF\n");
> +    }
> +
> +  return true;
> +}
> +
>  /* Function vect_prune_runtime_alias_test_list.
>  
>     Prune a list of ddrs to be tested at run-time by versioning for alias.
> @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop
>  
>    comp_alias_ddrs.create (may_alias_ddrs.length ());
>  
> +  unsigned int loop_depth
> +    = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num,
> +			  LOOP_VINFO_LOOP_NEST (loop_vinfo));
> +
>    /* First, we collect all data ref pairs for aliasing checks.  */
>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>      {
> @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop
>        tree segment_length_a, segment_length_b;
>        gimple *stmt_a, *stmt_b;
>  
> +      /* Ignore the alias if the VF we chose ended up being no greater
> +	 than the dependence distance.  */
> +      if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor))
> +	continue;
> +
>        dr_a = DDR_A (ddr);
>        stmt_a = DR_STMT (DDR_A (ddr));
>        dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
> @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop
>        return false;
>      }
>  
> -  /* All alias checks have been resolved at compilation time.  */
> -  if (!comp_alias_ddrs.length ())
> -    LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0);
> -
>    return true;
>  }
>  
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c
> ===================================================================
> --- /dev/null	2017-05-17 17:16:48.996861112 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c	2017-05-18 07:51:50.870904826 +0100
> @@ -0,0 +1,112 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
> +
> +/* Intended to be larger than any VF.  */
> +#define GAP 128
> +#define N (GAP * 3)
> +
> +struct s { int x[N + 1]; };
> +struct t { struct s x[N + 1]; };
> +struct u { int x[N + 1]; int y; };
> +struct v { struct s s; };
> +
> +void
> +f1 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->x[i] += b->x[i];
> +}
> +
> +void
> +f2 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[1].x[i] += b[2].x[i];
> +}
> +
> +void
> +f3 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[1].x[i] += b[i].x[i];
> +}
> +
> +void
> +f4 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[i].x[i] += b[i].x[i];
> +}
> +
> +void
> +f5 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->x[i] += b->x[i + 1];
> +}
> +
> +void
> +f6 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[1].x[i] += b[2].x[i + 1];
> +}
> +
> +void
> +f7 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[1].x[i] += b[i].x[i + 1];
> +}
> +
> +void
> +f8 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[i].x[i] += b[i].x[i + 1];
> +}
> +
> +void
> +f9 (struct s *a, struct t *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->x[i] += b->x[1].x[i];
> +}
> +
> +void
> +f10 (struct s *a, struct t *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->x[i] += b->x[i].x[i];
> +}
> +
> +void
> +f11 (struct u *a, struct u *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->x[i] += b->x[i] + b[i].y;
> +}
> +
> +void
> +f12 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < GAP; ++i)
> +    a->x[i + GAP] += b->x[i];
> +}
> +
> +void
> +f13 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < GAP * 2; ++i)
> +    a->x[i + GAP] += b->x[i];
> +}
> +
> +void
> +f14 (struct v *a, struct s *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->s.x[i] = b->x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
> ===================================================================
> --- /dev/null	2017-05-17 17:16:48.996861112 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c	2017-05-18 07:51:50.870904826 +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
> +
> +#define N 16
> +
> +struct s1 { int a[N]; };
> +struct s2 { struct s1 b; int c; };
> +struct s3 { int d; struct s1 e; };
> +union u { struct s2 f; struct s3 g; };
> +
> +/* We allow a and b to overlap arbitrarily.  */
> +
> +void
> +f1 (int a[][N], int b[][N])
> +{
> +  for (int i = 0; i < N; ++i)
> +    a[0][i] += b[0][i];
> +}
> +
> +void
> +f2 (union u *a, union u *b)
> +{
> +  for (int i = 0; i < N; ++i)
> +    a->f.b.a[i] += b->g.e.a[i];
> +}
> +
> +void
> +f3 (struct s1 *a, struct s1 *b)
> +{
> +  for (int i = 0; i < N - 1; ++i)
> +    a->a[i + 1] += b->a[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c
> ===================================================================
> --- /dev/null	2017-05-17 17:16:48.996861112 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c	2017-05-18 07:51:50.870904826 +0100
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* Intended to be larger than any VF.  */
> +#define GAP 128
> +#define N (GAP * 3)
> +
> +struct s { int x[N]; };
> +
> +void
> +f1 (struct s *a, struct s *b)
> +{
> +  for (int i = 0; i < GAP * 2; ++i)
> +    a->x[i + GAP] += b->x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]