[PATCH 2/2][RFC] Add loop masking support for x86
Richard Biener
rguenther@suse.de
Wed Jul 21 06:17:54 GMT 2021
On Tue, 20 Jul 2021, Richard Biener wrote:
> On Thu, 15 Jul 2021, Richard Sandiford wrote:
>
> > Richard Biener <rguenther@suse.de> writes:
> > > The following extends the existing loop masking support using
> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > > mask using VEC_COND_EXPRs. So with --param vect-partial-vector-usage
> > > you can now enable masked vectorized epilogues (=1) or fully
> > > masked vector loops (=2).
> >
> > As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> > element after the first zero is also zero. That happens naturally
> > for power-of-2 vectors if the start index is a multiple of the VF.
> > (And at the moment, variable-length vectors are the only way of
> > supporting non-power-of-2 vectors.)
> >
> > This probably works fine for =2 and =1 as things stand, since the
> > vector IVs always start at zero. But if in future we have a single
> > IV counting scalar iterations, and use it even for peeled prologue
> > iterations, we could end up with a situation where the approximation
> > is no longer safe.
> >
> > E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> > If we peeled 2 iterations for alignment and then had a VF of 8,
> > the final vector would have a start index of (uint32_t)-6 and the
> > vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.
> >
> > So I think it would be safer to handle this as an alternative to
> > using while, rather than as a direct emulation, so that we can take
> > the extra restrictions into account. Alternatively, we could probably
> > do { 0, 1, 2, ... } < { end - start, end - start, ... }.
>
> That doesn't end up working since in the last iteration with a
> non-zero mask we'll compare with all underflowed values (start
> will be > end). So while we compute a correct mask we cannot use
> that for loop control anymore.
Of course I can just use a signed comparison here (until we get
V128QI and a QImode iterator).
Richard.
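
For readers following the thread, the three mask computations discussed above can be modelled in scalar C. This is only an illustrative sketch, not code from the patch: the function names, the fixed VF of 8 and the 32-bit IV type are assumptions taken from the example in the quoted text, and each "vector" is just an array of lane values (-1 for active, 0 for inactive).

```c
#include <assert.h>   /* only needed when checking the results, see the note below */
#include <stdint.h>

#define VF 8          /* assumed vectorization factor, as in the quoted example */

/* Direct emulation of WHILE_ULT: lane i is active iff start + i < end,
   computed in wrapping unsigned arithmetic.  */
static void mask_while_ult(uint32_t start, uint32_t end, int mask[VF])
{
    for (uint32_t i = 0; i < VF; i++)
        mask[i] = (uint32_t)(start + i) < end ? -1 : 0;
}

/* The suggested alternative { 0, 1, 2, ... } < { end - start, ... }
   with an unsigned comparison.  */
static void mask_diff_unsigned(uint32_t start, uint32_t end, int mask[VF])
{
    uint32_t remaining = end - start;   /* wraps to a huge value if start > end */
    for (uint32_t i = 0; i < VF; i++)
        mask[i] = i < remaining ? -1 : 0;
}

/* The same comparison done signed.  The conversion to int32_t is
   implementation-defined for out-of-range values; GCC reduces it
   modulo 2^32, which is what this sketch relies on.  */
static void mask_diff_signed(uint32_t start, uint32_t end, int mask[VF])
{
    int32_t remaining = (int32_t)(end - start);   /* negative if start > end */
    for (int32_t i = 0; i < VF; i++)
        mask[i] = i < remaining ? -1 : 0;
}
```

With start = (uint32_t)-6 and end = (uint32_t)-3, mask_while_ult gives { -1, -1, -1, 0, 0, 0, -1, -1 }, the unsafe mask from the example, while both difference-based variants give { -1, -1, -1, 0, 0, 0, 0, 0 }. Once start has moved past end (say start = 16, end = 13), the unsigned variant yields an all-ones mask, so an all-zero mask can no longer serve as the loop exit test; the signed variant yields all zeros, provided the difference fits in the signed type.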