This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR


On Thu, 13 Nov 2014, Alan Lawrence wrote:

> Hmmm, I see. Yes, that would be another way. It has me wondering however
> exactly where to draw the line, as to what permutations
> TARGET_VECTORIZE_VEC_PERM_CONST_OK should allow / how much effort it should
> go to. For example, should said hook return true for any combination of two
> supported permutations, i.e. of the form (perm1(vec1) & mask) |
> (perm2(vec2) & ~mask) ?

Well, if it can expand them then yes ;)  Not sure how you get to
two permutations though - you can't ask that with the current
interface of TARGET_VECTORIZE_VEC_PERM_CONST_OK.  Thus the second
permutation has to be the identity.

Also (perm1(vec) & mask) | (perm2(vec2) & ~mask) is equal to
perm(vec, vec2, perm1 & mask | (perm2 + nelements) & ~mask)

that is, doing a single two-vector permute with a properly combined
mask.

> I would probably argue not, but on the other hand, if the hook returns
> false for that - will the generic vec-perm machinery consider the
> possibility, i.e. individually checking both perm1 and perm2? What if
> vec1==vec2, we can't really search to find if any appropriate perm1+perm2
> exist...

?  I was only asking for special-casing the case where the 2nd
vector is used unpermuted which would match shifts if it is zero.

> Looking ahead to a proper solution, a few options struck me...any of which
> would take quite a bit more fleshing out before they could be
> implemented....
> (1) provide a hook, whereby targets can specify a vec_perm mask that they
> do implement, and that is usable for performing
> reductions-via-shift(-like-operation)s. For example, I note MIPS implements
> direct reductions via a scheme that is very similar to
> reductions-via-shifts, but the first step is not a shift but a different
> operation - IIUC, because like many platforms they can't do a shift on such
> a big vector, but can do after that.
> (2) extend the various vec_perm_const hooks, to allow specifying a "don't
> care" in the mask, i.e. the platform can output any result in there.
> Theoretically, this strikes me as the "correct" solution, however from an
> engineering perspective I can see (possibly fatal) issues: it almost seems
> to invite code to come to depend on particular values of the unspecified
> elements (e.g., what does constant-folding do).
> (3) providing known-constant values to the vectorization hooks....this
> seems far too messy, but there is a possibly-useful case even with
> non-constant values, i.e. where multiple elements are known to be the same
> (==> like a special form of the previous, "dont-care-which" of a subset).

I don't see why we would need any of this special things.  What could
be useful is asking for zero/all-ones elements explicitely as this
is what hardware usually allows.  This would mean adding special
values for the mask, like for example using -1 for "zero this element"
and -2 for "put all-ones into this element".

> For the moment, however: powerpc64-unknown-linux-gnu tests come back
> all-clear (with previously posted vec_shr fix which David Edelsohn OK'd "if
> no regressions"), and I've done a bootstrap+check-gcc on
> arm-none-linux-gnueabihf too. I'll commit the first three patches this
> afternoon, the fourth....soon after ;)

Nice.

Thanks again for working on this,
Richard.

> Cheers, Alan
> 
> 
> On 13 November 2014 09:07, Richard Biener <rguenther@suse.de> wrote:
> 
> > On Wed, 12 Nov 2014, Alan Lawrence wrote:
> >
> > > In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html,
> > this
> > > series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a
> > > second argument full of constant zeroes) to represent the shift.
> > >
> > > I've kept the use of vec_shr optab for platforms that define it, as even
> > on
> > > platforms with a whole-vector-shift operation, this typically does not
> > work as
> > > a vec-perm on arbitrary vectors (the shift will pull in zeroes from the
> > end,
> > > whereas TARGET_VECTORIZE_VEC_PERM_CONST_OK and related machinery allows
> > only
> > > to check for a shift-like permutation that will work for two arbitrary
> > > vectors).
> >
> > That's reasonable for the moment though I expected to use
> >
> >  VEC_PERM <v4si, { 0, 0, 0, 0 }, { 4, 5, 0, 1 }>
> >
> > for the shift - thus the shifted in vector elements should map
> > 1:1 from the 2nd vector.  This means that the target can
> > answer "yes" to vec_perm_const_ok (v4si, ...) which such
> > a permute if it can shift in zeros as it then can do
> >
> >   tem = shift-in-zeros
> >   tem2 = vec2 & ~<mask to clear not wanted stuff>
> >   perm_result = tem | tem2;
> >
> > that is, simply OR in the wanted parts of the 2nd vector.  Of
> > course the actual expansion can special-case a constant or
> > zero 2nd vector.
> >
> > Usually targets provide a way of setting vector elements to
> > all ones or zero with their permute instructions as well.
> >
> > Of course the above requires adjustments to all targets vec_perm_const_ok
> > hooks and vec_perm_const expanders so for now asking for vec_shr
> > is ok, but long-term it shouldn't be needed, even without changes
> > to the vec_perm_const_ok interface.
> >
> > Thanks,
> > Richard.
> >
> > > I've also changed from the endianness-dependent shift direction of the
> > old
> > > VEC_RSHIFT_EXPR, to an endian-neutral direction (VEC_PERM_EXPR is
> > inherently
> > > endian-neutral), changing the meaning of vec_shr_optab to match (as I
> > did in
> > > https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html). As
> > previously, this
> > > will break any *bigendian* platform defining vec_shr; I see MIPS and
> > RS6000,
> > > but candidate fixes for both of these have already been posted:
> > >
> > > (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html,
> > although
> > > I have not been able to test this as there doesn't seem to be any working
> > > MIPS/Loongson hardware in the Compile Farm;
> > >
> > > (for PowerPC) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html,
> > > testing in progress.
> > >
> > > ARM defines vec_shr only for little-endian; AArch64 does not yet,
> > although in
> > > previous patch series I both added a vec_shr and made it endian-neutral
> > (I
> > > will post revised patches for both of these shortly).
> > >
> > > Bootstrapped and check-gcc on x86-none-linux-gnu and arm-none-linux-gnu;
> > > cross-tested on aarch64{,_be}-none-elf (FWIW, both with and without
> > previous
> > > patches adding a vec_shr pattern)
> > >
> > > Ok for trunk if no regressions on PowerPC?
> > >
> > > Thanks, Alan
> >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]