This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PR81635: Use chrecs to help find related data refs


On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> The first loop in the testcase regressed after my recent changes to
>>>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>>>> for bb analysis and end up with:
>>>>>
>>>>>    DR_BASE_ADDRESS: p or q
>>>>>    DR_OFFSET: 0
>>>>>    DR_INIT: 0 or 4
>>>>>    DR_STEP: 16
>>>>>
>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>>>
>>>>>    DR_BASE_ADDRESS: p or q
>>>>>    DR_OFFSET: (intptr_t) i
>>>>>    DR_INIT: 0 or 4
>>>>>    DR_STEP: 0
>>>>>
>>>>> This is also what we'd like to have for the unsigned "i", but the
>>>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>>>> longer seem to be related to the [i] ones.
>>>>
>>>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>>>> I made changes in scev to better fold type conversion by various sources
>>>> of information, for example, vrp, niters, undefined overflow behavior etc.
>>>> In theory these information should be available for other optimizers without
>>>> querying scev.  For the mentioned test, vrp should compute accurate range
>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>>>> worrying
>>>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>>>> could be more expensive (especially in case of (T)(i + j)), or because the
>>>> CST part is in bigger precision after conversion.
>>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>>>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>>>> curious if it's possible to use aff_tree to implement strip_constant_offset
>>>> here since aggressive folding is wanted.  After all, using additional chrec
>>>> looks like a little heavy wrto the simple test.
>>>
>>> Yeah, using aff_tree does work here when the bounds are constant.
>>> It doesn't look like it works for things like:
>>>
>>>     double p[1000];
>>>     double q[1000];
>>>
>>>     void
>>>     f4 (unsigned int n)
>>>     {
>>>       for (unsigned int i = 0; i < n; i += 4)
>>>         {
>>>           double a = q[i] + p[i];
>>>           double b = q[i + 1] + p[i + 1];
>>>           q[i] = a;
>>>           q[i + 1] = b;
>>>         }
>>>     }
>>>
>>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>>> overflow, even though "n" is unconstrained.  The patch as posted handles
>>> this case too.
>> BTW is this a missed optimization in value range analysis?  The range
>> information for i should flow in a way like: array boundary -> niters
>> -> scev/vrp.
>> I think that's what niters/scev do in analysis.
>
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with n == 0 case.  Thus we get:
Ah, there are several passes in between vrp and pass_ch, not sure if
any such pass depends on vrp intensively.  I would suggestion reorder
the two passes, or standalone VRP interface updating information for
loop region after header copied?   This is a non-trivial issue that
needs to be fixed.  Niters analyzer rely on
simplify_using_initial_conditions heavily to get the same information,
which in my opinion should be provided by VRP.  Though this won't be
able to obsolete simplify_using_initial_conditions because VRP is weak
in symbolic range...

>
>   Visiting statement:
>   i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
>   Intersecting
>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
>     [0, 0]
>   to
>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   Intersecting
>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   and
>     [0, 1000]
>   to
>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   Found new range for i_15: [0, 0]
>
>   Visiting statement:
>   _3 = i_15 + 1;
>   Match-and-simplified i_15 + 1 to 1
>   Intersecting
>     [1, 1]
>   and
>     [0, +INF]
>   to
>     [1, 1]
>   Found new range for _3: [1, 1]
>
> (where _3 is the index we care about), followed by:
>
>   Visiting statement:
>   i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
>   Intersectings/aarch64-linux/trunk-orig/debug/gcc'
>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
>     [0, 4]
>   to
>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   Intersecting
>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
>     [0, 1000]
>   to
>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   Found new range for i_15: [0, n_9(D) + 4294967295]
>
>   Visiting statement:
>   _3 = i_15 + 1;
>   Intersecting
>     [0, +INF]
>   and
>     [0, +INF]
>   to
>     [0, +INF]
>   Found new range for _3: [0, +INF]
>
> I guess in this case it would be better to intersect the i_15 ranges
> to [0, 1000] rather than [0, n_9(D) + 4294967295].
>
> It does work if another VRP pass runs after CH.  But even then,
> is it a good idea to rely on the range info being kept up-to-date
> all the way through to SLP?  A lot happens inbetween.
To some extend yes.  Now I understand that SCEV uses niters
information to prove no_overflow.  Niters analysis does infer better
information from array boundary, while VRP fails to do that.  I don't
worry much about gap between vrp pass and slp, it's basically the same
as niters.  Both information are analyzed at one point and meant to be
used by following passes.  It's also not common for vrp information
become invalid given we are on SSA?
Now that data address is not analyzed against loop, VRP would be the
only information we can use unless adding back scev analysis.  IMHO,
the patch is doing so in another way than before.  It requires
additional chrec data structure.  I remember the previous patch
enables more slp vectorization, is it because of "step" information
from scev?  In this patch, step information is simply discarded.  I am
wondering if possible to always analyze scev within innermost loop for
slp while discards step information.

Thanks,
bin
>
> FWIW, the old simple_iv check that I removed for bb data-ref analysis
> relies on SCEV analysis too, so I don't think this is more expensive
> than what we had before.
>
> Thanks,
> Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]