This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
Andrew Stubbs <ams@codesourcery.com> writes:
> On 17/09/18 10:14, Richard Sandiford wrote:
>> <ams@codesourcery.com> writes:
>>> If the autovectorizer tries to load a GCN 64-lane vector elementwise then it
>>> blows away the register file and produces horrible code.
>>
>> Do all the registers really need to be live at once, or is it "just" bad
>> scheduling? I'd have expected the initial rtl to load each element and
>> then insert it immediately, so that the number of insertions doesn't
>> directly affect register pressure.
>
> They don't need to be live at once, architecturally speaking, but that's
> the way it happened. No doubt there is another solution to fix it, but
> it's not a use case I believe we want to spend time optimizing.
>
> Actually, I've not tested what happens without this in GCC 9, so that's
> probably worth checking, but I'd still be concerned about it blowing up
> on real code somewhere.
>
>>> This patch simply disallows elementwise loads for such large vectors.
>>> Is there a better way to disable this in the middle-end?
>>
>> Do you ever want elementwise accesses for GCN? If not, it might be
>> better to disable them in the target's cost model.
>
> The hardware is perfectly capable of extracting or setting vector
> elements, but given that it can do full gather/scatter from arbitrary
> addresses it's not something we want to do in general.
>
> A normal scalar load will use a vector register (lane 0). The value then
> has to be moved to a scalar register, and only then can v_writelane
> insert it into the final destination.

OK, sounds like the cost of vec_construct is too low then.  But looking
at the port, I see you have:

/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */

int
gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
                        tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
{
  /* Always vectorize.  */
  return 1;
}

which short-circuits the cost-model altogether.  Isn't that part
of the problem?

Richard

>
> Alternatively you could use a mask_load to load the value directly to
> the correct lane, but I don't believe that's something GCC does.
>
> Andrew
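
Editor's sketch, not part of the thread: Richard's point about the cost hook may be easier to see with a toy model. The standalone C below is not GCC source and not a proposed patch. The enum is only loosely named after a few members of GCC's vect_cost_for_stmt, and sketch_vectorization_cost, its nlanes parameter, and the figure of three operations per lane are all hypothetical. The per-lane figure is an assumption taken from Andrew's description above (load into lane 0 of a vector register, move to a scalar register, then v_writelane).

```c
#include <assert.h>

/* Hypothetical stand-in for a few members of GCC's vect_cost_for_stmt.  */
enum sketch_cost_kind
{
  SKETCH_SCALAR_STMT,
  SKETCH_VECTOR_STMT,
  SKETCH_VEC_TO_SCALAR,
  SKETCH_VEC_CONSTRUCT
};

/* Toy cost hook.  NLANES stands in for the number of elements in the
   vector type; a real hook would derive it from its vectype argument.  */
static int
sketch_vectorization_cost (enum sketch_cost_kind kind, int nlanes)
{
  assert (nlanes > 0);

  switch (kind)
    {
    case SKETCH_VEC_CONSTRUCT:
      /* Assumed cost: per lane, one load, one move to a scalar
         register and one lane insert.  */
      return 3 * nlanes;

    default:
      /* Everything else keeps the flat cost of 1 used by the hook
         quoted above.  */
      return 1;
    }
}
```

Under these assumptions, building a 64-lane vector elementwise would be costed at 192 rather than 1, so a cost-model comparison would have a reason to reject it, whereas a hook that always returns 1 gives it none.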