This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Tweak BB analysis for dr_analyze_innermost
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Thu, 29 Jun 2017 12:40:23 +0200
- Subject: Re: Tweak BB analysis for dr_analyze_innermost
- Authentication-results: sourceware.org; auth=none
- References: <87r2y46xpp.fsf@linaro.org> <CAFiYyc2TZOn=ERAPfWOJyAhybLCoHLG9ET=wBb7d+BFOaZo39g@mail.gmail.com> <87r2y39ja1.fsf@linaro.org>
On Thu, Jun 29, 2017 at 12:32 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> dr_analyze_innermost had a "struct loop *nest" parameter that acted
>>> like a boolean. This was added in r179161, with the idea that a
>>> null nest selected BB-level analysis rather than loop analysis.
>>>
>>> The handling seemed strange though. If the DR was part of a loop,
>>> we still tried to express the base and offset values as IVs, potentially
>>> giving a nonzero step. If that failed for any reason, we'd revert to
>>> using the original base and offset, just as we would if we hadn't asked
>>> for an IV in the first place.
>>>
>>> It seems more natural to use the !in_loop handling whenever nest is null
>>> and always set the step to zero. This actually enables one more SLP
>>> opportunity in bb-slp-pr65935.c.
>>>
>>> I checked out r179161 and tried the patch there. The test case added
>>> in that revision still passes, so I don't think there was any particular
>>> need to check simple_iv.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>
>> I have a few additional comments for consideration. I remembered code
>> in vect_compute_data_ref_alignment explicitely looking at DR_STEP in
>> BB mode:
>>
>> /* Similarly we can only use base and misalignment information relative to
>> an innermost loop if the misalignment stays the same throughout the
>> execution of the loop. As above, this is the case if the stride of
>> the dataref evenly divides by the vector size. */
>> else
>> {
>> tree step = DR_STEP (dr);
>> unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
>>
>> if (tree_fits_shwi_p (step)
>> && ((tree_to_shwi (step) * vf)
>> % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> "step doesn't divide the vector-size.\n");
>> misalign = NULL_TREE;
>> }
>> }
>>
>> so I guess with the change we may end up with worse (misalign is
>> always NULL?) or bogus (with DR_STEP == 0 the test is always false)
>> alignment analysis results for BB vectorization? I guess worse
>> or we had the issue before if DR_STEP was just not analyzable.
>
> DR_STEP will always be zero for bb vectorisation, so the results
> shouldn't get worse. But the value that was previously a nonzero
> step is now part of the base or offset instead (the choice between
> the two being the same as it would be for get_inner_reference).
> We still take the alignments of those into account, so I think
> we should be safe (at least after DR_BASE_ALIGNMENT).
I think for
DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
we have a harder time extracting alignment from sth like (sizetype)i * 4 + 8
than if the base is zero. But yes, in principle we can do that and hopefully
SCEV analysis is not much more powerful than highest_pow2_factor.
> Like you say, we previously had the same situation for bases
> that weren't simple IVs, or for bases that were simple IVs but
> had an invariant rather than constant step.
Yes.
>> The testcase that now gets vectorized tried to use the asm to prevent
>> vectorization but that of course doesn't work for BB vectorization.
>> Using volatile stores might. I guess the chance that we miscompile
>> both loops in a way that we still pass the testcase (even the compare
>> loop could be BB vectorized I guess) is unlikely.
>
> Putting an extra volatile asm between the real and imag stores stops
> the vectorisation, if you'd prefer that.
No, your patch is fine.
>> The patch is ok but I guess we need to keep an eye on BB vectorization
>> results for targets w/o unaligned vector loads/stores for the above
>> alignment issue?
Still keep an eye on SCEV vs. highest_pow2_factor from testsuite fallout.
Patch still ok,
Richard.
> Thanks,
> Richard