This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PR81635: Use chrecs to help find related data refs


On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>    DR_BASE_ADDRESS: p or q
>>>    DR_OFFSET: 0
>>>    DR_INIT: 0 or 4
>>>    DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>
>>>    DR_BASE_ADDRESS: p or q
>>>    DR_OFFSET: (intptr_t) i
>>>    DR_INIT: 0 or 4
>>>    DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>> I made changes in scev to better fold type conversion by various sources
>> of information, for example, vrp, niters, undefined overflow behavior etc.
>> In theory these information should be available for other optimizers without
>> querying scev.  For the mentioned test, vrp should compute accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without worrying
>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>> curious if it's possible to use aff_tree to implement strip_constant_offset
>> here since aggressive folding is wanted.  After all, using additional chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
>     double p[1000];
>     double q[1000];
>
>     void
>     f4 (unsigned int n)
>     {
>       for (unsigned int i = 0; i < n; i += 4)
>         {
>           double a = q[i] + p[i];
>           double b = q[i + 1] + p[i + 1];
>           q[i] = a;
>           q[i + 1] = b;
>         }
>     }
>
> though, where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted handles
> this case too.
BTW is this a missed optimization in value range analysis?  The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.
I think that's what niters/scev do in analysis.

Thanks,
bin
>
> Thanks,
> Richard
>
>>
>> Thanks,
>> bin
>>>
>>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>>> references, with the distance between their start addresses being
>>> the difference between their DR_INITs.  For this second type, the
>>> exact form of DR_OFFSET doesn't really matter, and the more it is
>>> able to canonicalise the non-constant offset, the better.
>>>
>>> The patch fixes the PR by adding another offset/init pair to the
>>> innermost loop behaviour for this second group.  The new pair use chrecs
>>> rather than generic exprs for the offset part, with the chrec being
>>> analysed in the innermost loop for which the offset isn't invariant.
>>> This allows us to vectorise the second function in the testcase
>>> as well, which wasn't possible before the earlier patch.
>>>
>>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>>         PR tree-optimization/81635
>>>         * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>>>         chrec_init.
>>>         (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>>>         (dr_equal_offsets_p): Delete.
>>>         (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>>>         * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>>>         (split_constant_offset): Handle POLYNOMIAL_CHREC.
>>>         (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>>>         (operator ==): Use dr_chrec_offsets_equal_p and compare the
>>>         DR_CHREC_INITs.
>>>         (prune_runtime_alias_test_list): Likewise.
>>>         (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>>>         the DR_CHREC_INITs.
>>>         (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>>>         (analyze_offset_scev): New function.
>>>         (compute_offset_chrecs): Likewise.
>>>         (dr_chrec_offsets_equal_p): Likewise.
>>>         (dr_chrec_offsets_compare): Likewise.
>>>         * tree-loop-distribution.c (compute_alias_check_pairs): Use
>>>         dr_chrec_offsets_compare.
>>>         * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>>>         dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>>>         (dr_group_sort_cmp): Likewise.
>>>         (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
>>>         (vect_no_alias_p): Likewise.
>>>         (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
>>>         compare the DR_CHREC_INITs.
>>>         (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
>>>         * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
>>>         of DR_INIT.
>>>
>>> gcc/testsuite/
>>>         PR tree-optimization/81635
>>>         * gcc.dg/vect/bb-slp-pr81635.c: New test.
>>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]