This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Use SCEV information when aligning for vectorisation (PR 84005)
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Index: gcc/tree-data-ref.c
>>>> ===================================================================
>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000
>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000
>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d
>>>> return alignment;
>>>> }
>>>>
>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it
>>>> + is based on. Return this object X on success and store the alignment
>>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */
>>>> +
>>>> +static tree
>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
>>>> +{
>>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
>>>> + return NULL_TREE;
>>>> +
>>>> + gimple *def = SSA_NAME_DEF_STMT (base);
>>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base);
>>>
>>> I think it is an error to use the def stmt of base here. Instead you
>>> need to pass down the point/loop of the use. For the same reason you
>>> also want to instantiate parameters after analyzing the evolution.
>>>
>>> In the end, if the BB to be vectorized is contained in a loop nest we want to
>>> instantiate up to the point where (eventually) a DECL we can re-align
>>> appears,
>>> but using the containing loop for now looks OK.
>>
>> Why's that necessary/better though? We're not interested in the
>> evolution of the value at the point it*s used by the potential vector
>> code, but in how we get from the ultimate base (if there is one) to the
>> definition of the SSA name.
>
> Hmm, ok.
>
>> I don't think instantiating the SCEV would give any new information,
>> but it could lose some. E.g. if we have:
>>
>> x = &foo;
>> do
>> x += 8;
>> while (*y++ < 10)
>> ...potential vector use of *x...
>>
>> we wouldn't be able to express the address of x after the do-while
>> loop, because there's nothing that counts the number of iterations.
>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for
>> some (unknown) N.
>
> Not sure that it works that way. I'm not 100% sure what kind of
> parameters are left symbolic, and if analysis loop and instantiation
> "loop" are the same I guess it doesn't make any difference...
>
>> (OK, so that doesn't actually work unless we take the effort
>> to look through loop-closed SSA form, but still...)
>>
>>>> + /* Peel chrecs and record the minimum alignment preserved by
>>>> + all steps. */
>>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
>>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC)
>>>> + {
>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT
>>>> (base));
>>>> + alignment = MIN (alignment, step_alignment);
>>>> + base = CHREC_LEFT (base);
>>>> + }
>>>> +
>>>> + /* Punt if the expression is too complicated to handle. */
>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE
>>>> (base)))
>>>> + return NULL_TREE;
>>>> +
>>>> + /* Analyze the base to which the steps we peeled were applied. */
>>>> + poly_int64 bitsize, bitpos, bytepos;
>>>> + machine_mode mode;
>>>> + int unsignedp, reversep, volatilep;
>>>> + tree offset;
>>>> + base = get_inner_reference (build_fold_indirect_ref (base),
>>>
>>> ick :/
>>>
>>> what happens if you simply use get_pointer_alignment here and
>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically
>>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1
>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].
>>
>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost
>> already, which should have stripped off both the constant and variable
>> parts of the offset. So I think SSA names are the only interesting
>> input. The problem is that once we follow the definitions of an SSA
>> name through CHREC_LEFTs, we get a general address again.
>>
>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want
>> because they take the alignment of the existing object into account,
>> whereas here we explicitly want to ignore that and only calculate the
>> alignment of the offset.
>
> Ah, indeed - I missed that fact.
>
>> I guess we could pass a flag down to ignore the current alignment though?
>
> But we need the base anyway... so, given we can only ever re-align decls
> and never plain pointers instead of using build_fold_indirect_ref do
>
> if (TREE_CODE (base) != ADDR_EXPR)
> return NULL_TREE;
>
> else use TREE_OPERAND (base, 0)?
The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR,
which is what we get for things like the do-while above (e.g.
{ &a + 64, +, 64 }_n if x points to a double *.)
I guess everything we care about would be handled by fold_indirect_ref_1
though, so would that be OK instead?
Thanks,
Richard