PR81635: Use chrecs to help find related data refs

Richard Sandiford richard.sandiford@linaro.org
Wed Aug 16 16:50:00 GMT 2017


"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>    DR_BASE_ADDRESS: p or q
>>    DR_OFFSET: 0
>>    DR_INIT: 0 or 4
>>    DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>
>>    DR_BASE_ADDRESS: p or q
>>    DR_OFFSET: (intptr_t) i
>>    DR_INIT: 0 or 4
>>    DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood the issue.
> I made changes in scev to better fold type conversion by various sources
> of information, for example, vrp, niters, undefined overflow behavior etc.
> In theory these information should be available for other optimizers without
> querying scev.  For the mentioned test, vrp should compute accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) it without worrying
> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
> an interface, we changed tree-affine and made it do aggressive fold.  I am
> curious if it's possible to use aff_tree to implement strip_constant_offset
> here since aggressive folding is wanted.  After all, using additional chrec
> looks like a little heavy wrto the simple test.

Yeah, using aff_tree does work here when the bounds are constant.
It doesn't look like it works for things like:

    double p[1000];
    double q[1000];

    void
    f4 (unsigned int n)
    {
      for (unsigned int i = 0; i < n; i += 4)
        {
          double a = q[i] + p[i];
          double b = q[i + 1] + p[i + 1];
          q[i] = a;
          q[i + 1] = b;
        }
    }

though, where the bounds on the global arrays guarantee that [i + 1] can't
overflow, even though "n" is unconstrained.  The patch as posted handles
this case too.

Thanks,
Richard

>
> Thanks,
> bin
>>
>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>> references, with the distance between their start addresses being
>> the difference between their DR_INITs.  For this second type, the
>> exact form of DR_OFFSET doesn't really matter, and the more it is
>> able to canonicalise the non-constant offset, the better.
>>
>> The patch fixes the PR by adding another offset/init pair to the
>> innermost loop behaviour for this second group.  The new pair use chrecs
>> rather than generic exprs for the offset part, with the chrec being
>> analysed in the innermost loop for which the offset isn't invariant.
>> This allows us to vectorise the second function in the testcase
>> as well, which wasn't possible before the earlier patch.
>>
>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>>         PR tree-optimization/81635
>>         * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>>         chrec_init.
>>         (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>>         (dr_equal_offsets_p): Delete.
>>         (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>>         * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>>         (split_constant_offset): Handle POLYNOMIAL_CHREC.
>>         (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>>         (operator ==): Use dr_chrec_offsets_equal_p and compare the
>>         DR_CHREC_INITs.
>>         (prune_runtime_alias_test_list): Likewise.
>>         (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>>         the DR_CHREC_INITs.
>>         (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>>         (analyze_offset_scev): New function.
>>         (compute_offset_chrecs): Likewise.
>>         (dr_chrec_offsets_equal_p): Likewise.
>>         (dr_chrec_offsets_compare): Likewise.
>>         * tree-loop-distribution.c (compute_alias_check_pairs): Use
>>         dr_chrec_offsets_compare.
>>         * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>>         dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>>         (dr_group_sort_cmp): Likewise.
>>         (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
>>         (vect_no_alias_p): Likewise.
>>         (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
>>         compare the DR_CHREC_INITs.
>>         (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
>>         * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
>>         of DR_INIT.
>>
>> gcc/testsuite/
>>         PR tree-optimization/81635
>>         * gcc.dg/vect/bb-slp-pr81635.c: New test.
>>



More information about the Gcc-patches mailing list