[0/7] Tweak vector load/store code
Richard Sandiford
richard.sandiford@arm.com
Wed Jun 15 08:48:00 GMT 2016
This patch series adds a new enum and routines for classifying a vector
load or store implementation. Originally there were three motivations:
(1) Reduce cut-&-paste
(2) Make the chosen vectorisation strategy more obvious. At the
moment this is derived implicitly from various other bits of
state (GROUPED, STRIDED, SLP, etc.)
(3) Decouple the vectorisation strategy from those other bits of state,
so that there can be a choice of implementation for a given scalar
statement. The specific problem here is that we class:
    for (...)
      {
        ... = a[i * x];
        ... = a[i * x + 1];
      }
as "strided and grouped" but:
    for (...)
      {
        ... = a[i * 7];
        ... = a[i * 7 + 1];
      }
as "non-strided and grouped". Before the patches, "strided and
grouped" loads would always try to use separate scalar loads
while "non-strided and grouped" loads would always try to use
load-and-permute. But load-and-permute is never supported for
a group size of 7, so the effect was that the first loop was
vectorisable and the second wasn't. It seemed odd that not
knowing x (but accepting it could be 7) would allow more
optimisation opportunities than knowing x is 7.
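For anyone wanting to try the two cases, here is a minimal compilable sketch of the loops above. The function names and the summing reduction are illustrative assumptions, not taken from the patches or the testsuite; only the access patterns come from the description.

```c
#include <stddef.h>

/* Stride x is only known at run time.  Per the description above, this
   access pattern is classed as "strided and grouped".  */
long
sum_pairs_variable_stride (const int *a, size_t n, size_t x)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      sum += a[i * x];
      sum += a[i * x + 1];
    }
  return sum;
}

/* Same accesses with the stride fixed at 7.  Per the description above,
   this is classed as "non-strided and grouped".  */
long
sum_pairs_stride_7 (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      sum += a[i * 7];
      sum += a[i * 7 + 1];
    }
  return sum;
}
```

Calling the first function with x == 7 performs exactly the same accesses as the second, which is what makes the difference in treatment look odd.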
Unfortunately, it looks like we underestimate the cost of separate
scalar accesses, at least on aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type in patch 6.
I think the series still does (1) and (2) though, so that's the
justification for it in its current form. It also means that (3)
is now simply a case of removing the FIXME code, once the cost model
problems have been sorted out. (I did wonder about adding a --param,
but that seems like overkill. I hope to get back to this during GCC 7 stage 1.)
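As a rough model of what (3) is meant to change once the FIXME code goes, the sketch below contrasts the coupled and decoupled decisions. It is a toy under stated assumptions: the enum, its values and both function names are hypothetical and are not the enum or routines the series adds, and permute_ok_p stands in for whatever target query decides whether load-and-permute is available for the group.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only.  */
enum toy_access_type
{
  TOY_ELEMENTWISE,      /* separate scalar accesses */
  TOY_LOAD_AND_PERMUTE, /* contiguous vector loads followed by permutes */
  TOY_UNSUPPORTED       /* statement is not vectorised */
};

/* Behaviour described for the code before the patches: the strategy
   follows directly from whether the access is strided.  */
enum toy_access_type
toy_classify_coupled (bool strided_p, bool permute_ok_p)
{
  if (strided_p)
    return TOY_ELEMENTWISE;
  return permute_ok_p ? TOY_LOAD_AND_PERMUTE : TOY_UNSUPPORTED;
}

/* Assumed shape of (3): a non-strided group may fall back to separate
   scalar accesses when load-and-permute is unavailable.  */
enum toy_access_type
toy_classify_decoupled (bool strided_p, bool permute_ok_p)
{
  if (!strided_p && permute_ok_p)
    return TOY_LOAD_AND_PERMUTE;
  return TOY_ELEMENTWISE;
}
```

In this model the a[i * 7] loop corresponds to strided_p == false with permute_ok_p == false: the coupled version gives up, while the decoupled version picks separate scalar accesses, which is the choice whose cost currently looks underestimated.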
Thanks,
Richard