This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PR81635: Use chrecs to help find related data refs
On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost. Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>> DR_BASE_ADDRESS: p or q
>>> DR_OFFSET: 0
>>> DR_INIT: 0 or 4
>>> DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>
>>> DR_BASE_ADDRESS: p or q
>>> DR_OFFSET: (intptr_t) i
>>> DR_INIT: 0 or 4
>>> DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>> I made changes in scev to better fold type conversion by various sources
>> of information, for example, vrp, niters, undefined overflow behavior etc.
>> In theory these information should be available for other optimizers without
>> querying scev. For the mentioned test, vrp should compute accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without worrying
>> overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs. To provide such
>> an interface, we changed tree-affine and made it do aggressive fold. I am
>> curious if it's possible to use aff_tree to implement strip_constant_offset
>> here since aggressive folding is wanted. After all, using additional chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
> for (unsigned int i = 0; i < n; i += 4)
> {
> double a = q[i] + p[i];
> double b = q[i + 1] + p[i + 1];
> q[i] = a;
> q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained. The patch as posted handles
> this case too.
BTW is this a missed optimization in value range analysis? The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.
I think that's what niters/scev do in analysis.
Thanks,
bin
>
> Thanks,
> Richard
>
>>
>> Thanks,
>> bin
>>>
>>> There seem to be two main uses of DR_OFFSET and DR_INIT. One type
>>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>>> gives the initial offset from DR_BASE_ADDRESS. The other type uses
>>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>>> references, with the distance between their start addresses being
>>> the difference between their DR_INITs. For this second type, the
>>> exact form of DR_OFFSET doesn't really matter, and the more it is
>>> able to canonicalise the non-constant offset, the better.
>>>
>>> The patch fixes the PR by adding another offset/init pair to the
>>> innermost loop behaviour for this second group. The new pair use chrecs
>>> rather than generic exprs for the offset part, with the chrec being
>>> analysed in the innermost loop for which the offset isn't invariant.
>>> This allows us to vectorise the second function in the testcase
>>> as well, which wasn't possible before the earlier patch.
>>>
>>> Tested on x86_64-linux-gnu and aarch64-linux-gnu. OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>> PR tree-optimization/81635
>>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>>> chrec_init.
>>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>>> (dr_equal_offsets_p): Delete.
>>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>>> (split_constant_offset): Handle POLYNOMIAL_CHREC.
>>> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>>> (operator ==): Use dr_chrec_offsets_equal_p and compare the
>>> DR_CHREC_INITs.
>>> (prune_runtime_alias_test_list): Likewise.
>>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>>> the DR_CHREC_INITs.
>>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>>> (analyze_offset_scev): New function.
>>> (compute_offset_chrecs): Likewise.
>>> (dr_chrec_offsets_equal_p): Likewise.
>>> (dr_chrec_offsets_compare): Likewise.
>>> * tree-loop-distribution.c (compute_alias_check_pairs): Use
>>> dr_chrec_offsets_compare.
>>> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>>> dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>>> (dr_group_sort_cmp): Likewise.
>>> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
>>> (vect_no_alias_p): Likewise.
>>> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
>>> compare the DR_CHREC_INITs.
>>> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
>>> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
>>> of DR_INIT.
>>>
>>> gcc/testsuite/
>>> PR tree-optimization/81635
>>> * gcc.dg/vect/bb-slp-pr81635.c: New test.
>>>