[PATCH, rs6000] Add expansions for min/max vector reductions

Wed Sep 16 14:36:00 GMT 2015

On Wed, Sep 16, 2015 at 10:28 AM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> A recent patch proposal from Alan Hayward
> (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00690.html) uncovered
> that the PowerPC back end doesn't have expansions for
> reduc_{smax,smin,umax,umin}_<mode> and
> reduc_{smax,smin,umax,umin}_scal_<mode> for the integer modes.  This
> prevents vectorization of reductions involving comparisons that can be
> transformed into REDUC_{MAX,MIN}_EXPR expressions.  This patch adds
> these expansions.
>
> PowerPC does not have hardware reduction instructions for maximum and
> minimum.  However, we can emulate this with varying degrees of
> efficiency for different modes.  The size of the expansion is
> logarithmic in the number of vector elements E.  The expansions for
> reduc_{smax,smin,umax,umin}_<mode> consist of log E stages, each
> comprising a rotate operation and a maximum or minimum operation.  After
> stage N, the maximum (or minimum) value in the vector will appear in at
> least 2^N consecutive positions in the intermediate result.
>
> The ...scal_<mode> expansions just invoke the related non-scalar
> expansions, and then extract an arbitrary element from the result
> vector.
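
The rotate-and-max staging described above can be modelled in scalar C.
This is only a sketch of the data flow, not the define_expand from the
patch: the names reduc_smax_sketch and reduc_smax_scal_sketch, the lane
count E = 8 (as for V8HI), and the rotation direction are all
illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define E 8  /* assumed lane count (V8HI-like); must be a power of two */

/* Model of the log2(E)-stage reduction: each stage rotates the
   intermediate vector by 1, 2, 4, ... lanes and takes the lane-wise
   signed maximum of the rotated and unrotated values.  */
static void reduc_smax_sketch(const int16_t in[E], int16_t out[E])
{
    int16_t cur[E], rot[E];

    assert((E & (E - 1)) == 0);  /* staging relies on a power of two */

    for (int i = 0; i < E; i++)
        cur[i] = in[i];

    for (int shift = 1; shift < E; shift *= 2) {
        /* Rotate by `shift` lanes (direction chosen arbitrarily here).  */
        for (int i = 0; i < E; i++)
            rot[i] = cur[(i + shift) % E];
        /* Lane-wise maximum; each lane now covers 2*shift source lanes.  */
        for (int i = 0; i < E; i++)
            cur[i] = cur[i] > rot[i] ? cur[i] : rot[i];
    }

    for (int i = 0; i < E; i++)
        out[i] = cur[i];
}

/* Scalar variant: run the vector form, then extract any one element,
   since after the last stage every lane holds the maximum.  */
static int16_t reduc_smax_scal_sketch(const int16_t in[E])
{
    int16_t tmp[E];
    reduc_smax_sketch(in, tmp);
    return tmp[0];
}
```

With E = 8 the loop runs three stages (shifts of 1, 2, and 4 lanes).
The smin, umax, and umin forms would differ only in the comparison and
the element signedness.
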
>
> The expansions for V16QI, V8HI, and V4SI require TARGET_ALTIVEC.  The
> expansions for V2DI make use of vector instructions added for ISA 2.07,
> so they require TARGET_P8_VECTOR.
>
> I was able to use iterators for the sub-doubleword ...scal_<mode>
> expansions, but that's all.  I experimented with trying to use
> code_iterators to generate the {smax,smin,umax,umin} expansions, but
> couldn't find a way to make that work, as the substitution wasn't being
> done into the UNSPEC constants.  If there is a way to do this, please
> let me know and I'll try to reduce the code size.
>
> There are already a number of common reduction execution tests that
> exercise this logic.  I've also added PowerPC-specific code generation
> tests to verify the patterns produce what's expected.  These are based
> on the existing execution tests.
>
> Some future work will be required:
>
> (1) The vectorization cost model does not currently allow us to
> distinguish between reductions of additions and reductions of max/min.
> On PowerPC, these costs are very different, as the former is supported
> by hardware and the latter is not.  After this patch is applied, we may
> vectorize some code when it's not profitable to do so.  I think
> it's probably best to go ahead with this patch now, and deal with the
> cost model as a separate issue after Alan's patch is complete and
> upstream.
>
> (2) The use of rs6000_expand_vector_extract to obtain a scalar from a
> vector is not optimal for sub-doubleword modes using the latest
> hardware.  Currently this generates a vector store followed by a scalar
> load, which is Very Bad.  We should instead use a mfvsrd and sign- or
> zero-extend the rightmost element in the result GPR.  To accomplish
> this, we should update rs6000_expand_vector_extract to do the more
> general thing:  mfvsrd, shift the selected element into the rightmost
> position, and extend it.  At that time we should change the _scal_<mode>
> expansions to select the element number that avoids the shift (that
> number will differ for BE and LE).
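
The "shift the selected element into the rightmost position, and extend
it" step in item (2) can be sketched in plain C on a 64-bit value that
stands in for the GPR contents after the mfvsrd.  The helper names are
hypothetical, and counting idx from the least-significant element is an
assumption made for the illustration; it does not attempt the mapping
to BE or LE element numbers.

```c
#include <stdint.h>

/* dword: stand-in for the GPR value produced by the move from the
          vector register.
   idx:   element index counted from the rightmost (least-significant)
          element; this numbering is an assumption of the sketch.
   bits:  element width, expected to be 8, 16, or 32.  */
static uint64_t extract_zext_sketch(uint64_t dword, unsigned idx, unsigned bits)
{
    uint64_t mask = (1ULL << bits) - 1;
    /* Shift the selected element down to the rightmost position, then
       zero-extend by masking off everything above it.  */
    return (dword >> (idx * bits)) & mask;
}

static int64_t extract_sext_sketch(uint64_t dword, unsigned idx, unsigned bits)
{
    uint64_t field = extract_zext_sketch(dword, idx, bits);
    uint64_t sign = 1ULL << (bits - 1);
    /* Sign-extend: flip the sign bit, then subtract its weight.  */
    return (int64_t)(field ^ sign) - (int64_t)sign;
}
```

With idx == 0 the shift amount is zero, which corresponds to the case
where the _scal_<mode> expansion picks the element that avoids the
shift.
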
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-09-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
>         UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
>         UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
>         UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
>         (reduc_smax_v2di): New define_expand.
>         (reduc_smax_scal_v2di): Likewise.
>         (reduc_smin_v2di): Likewise.
>         (reduc_smin_scal_v2di): Likewise.
>         (reduc_umax_v2di): Likewise.
>         (reduc_umax_scal_v2di): Likewise.
>         (reduc_umin_v2di): Likewise.
>         (reduc_umin_scal_v2di): Likewise.
>         (reduc_smax_v4si): Likewise.
>         (reduc_smin_v4si): Likewise.
>         (reduc_umax_v4si): Likewise.
>         (reduc_umin_v4si): Likewise.
>         (reduc_smax_v8hi): Likewise.
>         (reduc_smin_v8hi): Likewise.
>         (reduc_umax_v8hi): Likewise.
>         (reduc_umin_v8hi): Likewise.
>         (reduc_smax_v16qi): Likewise.
>         (reduc_smin_v16qi): Likewise.
>         (reduc_umax_v16qi): Likewise.
>         (reduc_umin_v16qi): Likewise.
>         (reduc_smax_scal_<mode>): Likewise.
>         (reduc_smin_scal_<mode>): Likewise.
>         (reduc_umax_scal_<mode>): Likewise.
>         (reduc_umin_scal_<mode>): Likewise.
>
> [gcc/testsuite]
>
> 2015-09-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/vect-reduc-minmax-char.c: New.
>         * gcc.target/powerpc/vect-reduc-minmax-short.c: New.
>         * gcc.target/powerpc/vect-reduc-minmax-int.c: New.
>         * gcc.target/powerpc/vect-reduc-minmax-long.c: New.

This is okay.

I don't think that I have seen iterators for UNSPECs, but maybe
someone else is aware of the right idiom.

Thanks, David