This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH, rs6000] Add expansions for min/max vector reductions


Hi,

A recent patch proposal from Alan Hayward
(https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00690.html) uncovered
that the PowerPC back end doesn't have expansions for
reduc_{smax,smin,umax,umin}_<mode> and
reduc_{smax,smin,umax,umin}_scal_<mode> for the integer modes.  This
prevents vectorization of reductions involving comparisons that can be
transformed into REDUC_{MAX,MIN}_EXPR expressions.  This patch adds
these expansions.

PowerPC does not have hardware reduction instructions for maximum and
minimum.  However, we can emulate them with varying degrees of
efficiency for different modes.  The size of the expansion is
logarithmic in the number of vector elements E.  The expansions for
reduc_{smax,smin,umax,umin}_<mode> consist of log2 E stages, each
comprising a rotate operation and a maximum or minimum operation.  After
stage N, the maximum (or minimum) value in the vector will appear in at
least 2^N consecutive positions in the intermediate result, so after the
final stage it appears in every position.

The ...scal_<mode> expansions just invoke the related non-scalar
expansions, and then extract an arbitrary element from the result
vector.
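
For anyone who wants to convince themselves of the rotate-and-max
argument, here is a rough scalar model of the V8HI case in plain C.  It
is only a sketch: rotate() and reduc_smax_scal() are illustrative names
rather than anything in the patch, the rotate stands in for vsldoi with
both inputs the same register, and element order (BE versus LE) is
ignored.

```c
#include <assert.h>
#include <string.h>

#define E 8  /* elements per vector, as in V8HI */

/* Model of vsldoi with both inputs the same register: rotate the
   vector by SHIFT elements (the patch passes the amount in bytes).  */
static void rotate(short *dst, const short *src, int shift)
{
  for (int i = 0; i < E; i++)
    dst[i] = src[(i + shift) % E];
}

/* Model of a signed-max reduction to a scalar: log2 E rotate/max
   stages, then extract one element.  */
static short reduc_smax_scal(const short *in)
{
  short cur[E], rot[E];

  memcpy(cur, in, sizeof cur);
  for (int shift = 1; shift < E; shift *= 2)
    {
      rotate(rot, cur, shift);
      for (int i = 0; i < E; i++)
        cur[i] = cur[i] > rot[i] ? cur[i] : rot[i];
    }

  /* Every position now holds the maximum, so any element will do.  */
  for (int i = 1; i < E; i++)
    assert(cur[i] == cur[0]);
  return cur[0];
}
```

The shifts of 1, 2, and 4 elements correspond to the byte counts 2, 4,
and 8 passed to vsldoi in the V8HI expansions, and the assertion at the
end checks the "appears in every position" property that lets the
scalar variant extract an arbitrary element.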

The expansions for V16QI, V8HI, and V4SI require TARGET_ALTIVEC.  The
expansions for V2DI make use of vector instructions added for ISA 2.07,
so they require TARGET_P8_VECTOR.

I was able to use iterators for the sub-doubleword ...scal_<mode>
expansions, but that's all.  I experimented with trying to use
code_iterators to generate the {smax,smin,umax,umin} expansions, but
couldn't find a way to make that work, as the substitution wasn't being
done into the UNSPEC constants.  If there is a way to do this, please
let me know and I'll try to reduce the code size.

There are already a number of common reduction execution tests that
exercise this logic.  I've also added PowerPC-specific code generation
tests to verify the patterns produce what's expected.  These are based
on the existing execution tests.
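
The scan-assembler-times counts in the new tests follow from the
structure of the expansion, as the comments in the test files explain.
The arithmetic can be sketched as below; the helper names are mine and
are not part of the patch or the testsuite.

```c
#include <assert.h>

/* Number of rotate/max stages for a vector of ELEMENTS elements,
   i.e. log2 of the element count.  */
static int stages(int elements)
{
  int n = 0;

  assert(elements > 0 && (elements & (elements - 1)) == 0);
  while (elements > 1)
    {
      elements /= 2;
      n++;
    }
  return n;
}

/* Expected count of one max or min mnemonic (e.g. vmaxsb) in a test
   file: two to fold in the two source vectors, plus one per stage.  */
static int expected_minmax(int elements)
{
  return 2 + stages(elements);
}

/* Expected vsldoi count in a test file: one per stage in each of the
   four test functions.  */
static int expected_vsldoi(int elements)
{
  return 4 * stages(elements);
}
```

For V16QI this gives 6 and 16, for V8HI 5 and 12, for V4SI 4 and 8, and
for V2DI 3 and 4, which are the numbers used in the dg-final directives.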

Some future work will be required:

(1) The vectorization cost model does not currently allow us to
distinguish between reductions of additions and reductions of max/min.
On PowerPC, these costs are very different, as the former is supported
by hardware and the latter is not.  After this patch is applied, we may
vectorize some code when it's not profitable to do so.  I think
it's probably best to go ahead with this patch now, and deal with the
cost model as a separate issue after Alan's patch is complete and
upstream.

(2) The use of rs6000_expand_vector_extract to obtain a scalar from a
vector is not optimal for sub-doubleword modes using the latest
hardware.  Currently this generates a vector store followed by a scalar
load, which is Very Bad.  We should instead use an mfvsrd and sign- or
zero-extend the rightmost element in the result GPR.  To accomplish
this, we should update rs6000_expand_vector_extract to do the more
general thing:  mfvsrd, shift the selected element into the rightmost
position, and extend it.  At that time we should change the _scal_<mode>
expansions to select the element number that avoids the shift (that
number will differ for BE and LE).
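
To make item (2) a bit more concrete, here is a small C model of what
the GPR-side sequence would compute once the doubleword has been moved
over with mfvsrd.  It is a sketch under my own assumptions, not the
planned implementation: extract_zext() and extract_sext() are
hypothetical names, and the element index is counted from the
least-significant end of the doubleword rather than in BE or LE element
numbering.

```c
#include <assert.h>
#include <stdint.h>

/* DWORD models the GPR contents after mfvsrd.  Shift the selected
   BITS-wide element into the rightmost position and zero-extend it.
   IDX_FROM_RIGHT == 0 is the case that needs no shift.  */
static uint64_t extract_zext(uint64_t dword, unsigned idx_from_right,
                             unsigned bits)
{
  assert(bits == 8 || bits == 16 || bits == 32);
  assert(idx_from_right < 64 / bits);
  uint64_t mask = (UINT64_C(1) << bits) - 1;
  return (dword >> (idx_from_right * bits)) & mask;
}

/* Same selection, but sign-extend the element to 64 bits.  */
static int64_t extract_sext(uint64_t dword, unsigned idx_from_right,
                            unsigned bits)
{
  uint64_t v = extract_zext(dword, idx_from_right, bits);
  uint64_t sign = UINT64_C(1) << (bits - 1);

  /* Flip the sign bit, then subtract it back out as a signed value.  */
  return (int64_t) (v ^ sign) - (int64_t) sign;
}
```

With an index of 0 the shift disappears and only the extension remains,
which is the case the _scal_<mode> expansions would want to select.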

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2015-09-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
	UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
	UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
	UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
	(reduc_smax_v2di): New define_expand.
	(reduc_smax_scal_v2di): Likewise.
	(reduc_smin_v2di): Likewise.
	(reduc_smin_scal_v2di): Likewise.
	(reduc_umax_v2di): Likewise.
	(reduc_umax_scal_v2di): Likewise.
	(reduc_umin_v2di): Likewise.
	(reduc_umin_scal_v2di): Likewise.
	(reduc_smax_v4si): Likewise.
	(reduc_smin_v4si): Likewise.
	(reduc_umax_v4si): Likewise.
	(reduc_umin_v4si): Likewise.
	(reduc_smax_v8hi): Likewise.
	(reduc_smin_v8hi): Likewise.
	(reduc_umax_v8hi): Likewise.
	(reduc_umin_v8hi): Likewise.
	(reduc_smax_v16qi): Likewise.
	(reduc_smin_v16qi): Likewise.
	(reduc_umax_v16qi): Likewise.
	(reduc_umin_v16qi): Likewise.
	(reduc_smax_scal_<mode>): Likewise.
	(reduc_smin_scal_<mode>): Likewise.
	(reduc_umax_scal_<mode>): Likewise.
	(reduc_umin_scal_<mode>): Likewise.

[gcc/testsuite]

2015-09-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vect-reduc-minmax-char.c: New.
	* gcc.target/powerpc/vect-reduc-minmax-short.c: New.
	* gcc.target/powerpc/vect-reduc-minmax-int.c: New.
	* gcc.target/powerpc/vect-reduc-minmax-long.c: New.


Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 227817)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -87,6 +87,14 @@
    UNSPEC_GET_VRSAVE
    UNSPEC_LVX
    UNSPEC_REDUC_PLUS
+   UNSPEC_REDUC_SMAX
+   UNSPEC_REDUC_SMIN
+   UNSPEC_REDUC_UMAX
+   UNSPEC_REDUC_UMIN
+   UNSPEC_REDUC_SMAX_SCAL
+   UNSPEC_REDUC_SMIN_SCAL
+   UNSPEC_REDUC_UMAX_SCAL
+   UNSPEC_REDUC_UMIN_SCAL
    UNSPEC_VECSH
    UNSPEC_EXTEVEN_V4SI
    UNSPEC_EXTEVEN_V8HI
@@ -2690,6 +2698,430 @@
   DONE;
 })
 
+;; Strategy used here guarantees that after round N, the maximum value
+;; will appear in 2^N adjacent positions (where adjacent can include the
+;; lowest and highest positions being adjacent).  Thus after N = log E
+;; rounds, where E is the number of vector elements, the maximum value
+;; will appear in every position in the vector.
+;;
+;; Note that reduc_s{max,min}_v{2d,4s}f already exist in vector.md.
+;;
+(define_expand "reduc_smax_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMAX))]
+  "TARGET_P8_VECTOR"
+{
+  rtx shifted = gen_reg_rtx (V2DImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v2di (shifted, op1, op1, GEN_INT (8)));
+  emit_insn (gen_smaxv2di3 (operands[0], op1, shifted));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_v2di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMAX_SCAL))]
+  "TARGET_P8_VECTOR"
+{
+  rtx reduc = gen_reg_rtx (V2DImode);
+  emit_insn (gen_reduc_smax_v2di (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_smin_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMIN))]
+  "TARGET_P8_VECTOR"
+{
+  rtx shifted = gen_reg_rtx (V2DImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v2di (shifted, op1, op1, GEN_INT (8)));
+  emit_insn (gen_sminv2di3 (operands[0], op1, shifted));
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_v2di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMIN_SCAL))]
+  "TARGET_P8_VECTOR"
+{
+  rtx reduc = gen_reg_rtx (V2DImode);
+  emit_insn (gen_reduc_smin_v2di (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_umax_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMAX))]
+  "TARGET_P8_VECTOR"
+{
+  rtx shifted = gen_reg_rtx (V2DImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v2di (shifted, op1, op1, GEN_INT (8)));
+  emit_insn (gen_umaxv2di3 (operands[0], op1, shifted));
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_v2di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMAX_SCAL))]
+  "TARGET_P8_VECTOR"
+{
+  rtx reduc = gen_reg_rtx (V2DImode);
+  emit_insn (gen_reduc_umax_v2di (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_umin_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMIN))]
+  "TARGET_P8_VECTOR"
+{
+  rtx shifted = gen_reg_rtx (V2DImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v2di (shifted, op1, op1, GEN_INT (8)));
+  emit_insn (gen_uminv2di3 (operands[0], op1, shifted));
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_v2di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMIN_SCAL))]
+  "TARGET_P8_VECTOR"
+{
+  rtx reduc = gen_reg_rtx (V2DImode);
+  emit_insn (gen_reduc_umin_v2di (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_smax_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V4SImode);
+  rtx shifted2 = gen_reg_rtx (V4SImode);
+  rtx max1 = gen_reg_rtx (V4SImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v4si (shifted1, op1, op1, GEN_INT (4)));
+  emit_insn (gen_smaxv4si3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v4si (shifted2, max1, max1, GEN_INT (8)));
+  emit_insn (gen_smaxv4si3 (operands[0], max1, shifted2));
+  DONE;
+})
+
+(define_expand "reduc_smin_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V4SImode);
+  rtx shifted2 = gen_reg_rtx (V4SImode);
+  rtx min1 = gen_reg_rtx (V4SImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v4si (shifted1, op1, op1, GEN_INT (4)));
+  emit_insn (gen_sminv4si3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v4si (shifted2, min1, min1, GEN_INT (8)));
+  emit_insn (gen_sminv4si3 (operands[0], min1, shifted2));
+  DONE;
+})
+
+(define_expand "reduc_umax_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V4SImode);
+  rtx shifted2 = gen_reg_rtx (V4SImode);
+  rtx max1 = gen_reg_rtx (V4SImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v4si (shifted1, op1, op1, GEN_INT (4)));
+  emit_insn (gen_umaxv4si3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v4si (shifted2, max1, max1, GEN_INT (8)));
+  emit_insn (gen_umaxv4si3 (operands[0], max1, shifted2));
+  DONE;
+})
+
+(define_expand "reduc_umin_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V4SImode);
+  rtx shifted2 = gen_reg_rtx (V4SImode);
+  rtx min1 = gen_reg_rtx (V4SImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v4si (shifted1, op1, op1, GEN_INT (4)));
+  emit_insn (gen_uminv4si3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v4si (shifted2, min1, min1, GEN_INT (8)));
+  emit_insn (gen_uminv4si3 (operands[0], min1, shifted2));
+  DONE;
+})
+
+(define_expand "reduc_smax_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V8HImode);
+  rtx shifted2 = gen_reg_rtx (V8HImode);
+  rtx shifted3 = gen_reg_rtx (V8HImode);
+  rtx max1 = gen_reg_rtx (V8HImode);
+  rtx max2 = gen_reg_rtx (V8HImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted1, op1, op1, GEN_INT (2)));
+  emit_insn (gen_smaxv8hi3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted2, max1, max1, GEN_INT (4)));
+  emit_insn (gen_smaxv8hi3 (max2, max1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted3, max2, max2, GEN_INT (8)));
+  emit_insn (gen_smaxv8hi3 (operands[0], max2, shifted3));
+  DONE;
+})
+
+(define_expand "reduc_smin_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V8HImode);
+  rtx shifted2 = gen_reg_rtx (V8HImode);
+  rtx shifted3 = gen_reg_rtx (V8HImode);
+  rtx min1 = gen_reg_rtx (V8HImode);
+  rtx min2 = gen_reg_rtx (V8HImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted1, op1, op1, GEN_INT (2)));
+  emit_insn (gen_sminv8hi3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted2, min1, min1, GEN_INT (4)));
+  emit_insn (gen_sminv8hi3 (min2, min1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted3, min2, min2, GEN_INT (8)));
+  emit_insn (gen_sminv8hi3 (operands[0], min2, shifted3));
+  DONE;
+})
+
+(define_expand "reduc_umax_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V8HImode);
+  rtx shifted2 = gen_reg_rtx (V8HImode);
+  rtx shifted3 = gen_reg_rtx (V8HImode);
+  rtx max1 = gen_reg_rtx (V8HImode);
+  rtx max2 = gen_reg_rtx (V8HImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted1, op1, op1, GEN_INT (2)));
+  emit_insn (gen_umaxv8hi3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted2, max1, max1, GEN_INT (4)));
+  emit_insn (gen_umaxv8hi3 (max2, max1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted3, max2, max2, GEN_INT (8)));
+  emit_insn (gen_umaxv8hi3 (operands[0], max2, shifted3));
+  DONE;
+})
+
+(define_expand "reduc_umin_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V8HImode);
+  rtx shifted2 = gen_reg_rtx (V8HImode);
+  rtx shifted3 = gen_reg_rtx (V8HImode);
+  rtx min1 = gen_reg_rtx (V8HImode);
+  rtx min2 = gen_reg_rtx (V8HImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted1, op1, op1, GEN_INT (2)));
+  emit_insn (gen_uminv8hi3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted2, min1, min1, GEN_INT (4)));
+  emit_insn (gen_uminv8hi3 (min2, min1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v8hi (shifted3, min2, min2, GEN_INT (8)));
+  emit_insn (gen_uminv8hi3 (operands[0], min2, shifted3));
+  DONE;
+})
+
+(define_expand "reduc_smax_v16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+                      UNSPEC_REDUC_SMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V16QImode);
+  rtx shifted2 = gen_reg_rtx (V16QImode);
+  rtx shifted3 = gen_reg_rtx (V16QImode);
+  rtx shifted4 = gen_reg_rtx (V16QImode);
+  rtx max1 = gen_reg_rtx (V16QImode);
+  rtx max2 = gen_reg_rtx (V16QImode);
+  rtx max3 = gen_reg_rtx (V16QImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted1, op1, op1, const1_rtx));
+  emit_insn (gen_smaxv16qi3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted2, max1, max1, GEN_INT (2)));
+  emit_insn (gen_smaxv16qi3 (max2, max1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted3, max2, max2, GEN_INT (4)));
+  emit_insn (gen_smaxv16qi3 (max3, max2, shifted3));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted4, max3, max3, GEN_INT (8)));
+  emit_insn (gen_smaxv16qi3 (operands[0], max3, shifted4));
+  DONE;
+})
+
+(define_expand "reduc_smin_v16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+                      UNSPEC_REDUC_SMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V16QImode);
+  rtx shifted2 = gen_reg_rtx (V16QImode);
+  rtx shifted3 = gen_reg_rtx (V16QImode);
+  rtx shifted4 = gen_reg_rtx (V16QImode);
+  rtx min1 = gen_reg_rtx (V16QImode);
+  rtx min2 = gen_reg_rtx (V16QImode);
+  rtx min3 = gen_reg_rtx (V16QImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted1, op1, op1, const1_rtx));
+  emit_insn (gen_sminv16qi3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted2, min1, min1, GEN_INT (2)));
+  emit_insn (gen_sminv16qi3 (min2, min1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted3, min2, min2, GEN_INT (4)));
+  emit_insn (gen_sminv16qi3 (min3, min2, shifted3));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted4, min3, min3, GEN_INT (8)));
+  emit_insn (gen_sminv16qi3 (operands[0], min3, shifted4));
+  DONE;
+})
+
+(define_expand "reduc_umax_v16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+                      UNSPEC_REDUC_UMAX))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V16QImode);
+  rtx shifted2 = gen_reg_rtx (V16QImode);
+  rtx shifted3 = gen_reg_rtx (V16QImode);
+  rtx shifted4 = gen_reg_rtx (V16QImode);
+  rtx max1 = gen_reg_rtx (V16QImode);
+  rtx max2 = gen_reg_rtx (V16QImode);
+  rtx max3 = gen_reg_rtx (V16QImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted1, op1, op1, const1_rtx));
+  emit_insn (gen_umaxv16qi3 (max1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted2, max1, max1, GEN_INT (2)));
+  emit_insn (gen_umaxv16qi3 (max2, max1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted3, max2, max2, GEN_INT (4)));
+  emit_insn (gen_umaxv16qi3 (max3, max2, shifted3));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted4, max3, max3, GEN_INT (8)));
+  emit_insn (gen_umaxv16qi3 (operands[0], max3, shifted4));
+  DONE;
+})
+
+(define_expand "reduc_umin_v16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+                      UNSPEC_REDUC_UMIN))]
+  "TARGET_ALTIVEC"
+{
+  rtx shifted1 = gen_reg_rtx (V16QImode);
+  rtx shifted2 = gen_reg_rtx (V16QImode);
+  rtx shifted3 = gen_reg_rtx (V16QImode);
+  rtx shifted4 = gen_reg_rtx (V16QImode);
+  rtx min1 = gen_reg_rtx (V16QImode);
+  rtx min2 = gen_reg_rtx (V16QImode);
+  rtx min3 = gen_reg_rtx (V16QImode);
+  rtx op1 = operands[1];
+
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted1, op1, op1, const1_rtx));
+  emit_insn (gen_uminv16qi3 (min1, op1, shifted1));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted2, min1, min1, GEN_INT (2)));
+  emit_insn (gen_uminv16qi3 (min2, min1, shifted2));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted3, min2, min2, GEN_INT (4)));
+  emit_insn (gen_uminv16qi3 (min3, min2, shifted3));
+  emit_insn (gen_altivec_vsldoi_v16qi (shifted4, min3, min3, GEN_INT (8)));
+  emit_insn (gen_uminv16qi3 (operands[0], min3, shifted4));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=r")
+        (unspec:VI [(match_operand:<MODE> 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMAX_SCAL))]
+  "TARGET_ALTIVEC"
+{
+  rtx reduc = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_reduc_smax_<mode> (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=r")
+        (unspec:VI [(match_operand:<MODE> 1 "register_operand" "v")]
+                     UNSPEC_REDUC_SMIN_SCAL))]
+  "TARGET_ALTIVEC"
+{
+  rtx reduc = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_reduc_smin_<mode> (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=r")
+        (unspec:VI [(match_operand:<MODE> 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMAX_SCAL))]
+  "TARGET_ALTIVEC"
+{
+  rtx reduc = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_reduc_umax_<mode> (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=r")
+        (unspec:VI [(match_operand:<MODE> 1 "register_operand" "v")]
+                     UNSPEC_REDUC_UMIN_SCAL))]
+  "TARGET_ALTIVEC"
+{
+  rtx reduc = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_reduc_umin_<mode> (reduc, operands[1]));
+  rs6000_expand_vector_extract (operands[0], reduc, 0);
+  DONE;
+})
+
 (define_expand "neg<mode>2"
   [(use (match_operand:VI 0 "register_operand" ""))
    (use (match_operand:VI 1 "register_operand" ""))]
Index: gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-char.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-char.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-char.c	(working copy)
@@ -0,0 +1,85 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-times "vmaxsb" 6 } } */
+/* { dg-final { scan-assembler-times "vminsb" 6 } } */
+/* { dg-final { scan-assembler-times "vmaxub" 6 } } */
+/* { dg-final { scan-assembler-times "vminub" 6 } } */
+/* { dg-final { scan-assembler-times "vsldoi" 16 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 4 } } */
+/* { dg-final { scan-assembler-times "lbz" 4 } } */
+/* { dg-final { scan-assembler-times "extsb" 2 } } */
+
+/* Test maximum and minimum reduction operations for V16QImode.
+   This code will be unrolled and SLP vectorized.  The numbers
+   of instructions expected are obtained as follows.  There will be
+   two max or min instructions to compare the broadcast init vector
+   with the two source vectors in lc[], followed by four shift-max/min
+   pairs to obtain the maximum value in all positions.  An arbitrary
+   one of these is extracted using stxvd2x and lbz; in the signed
+   case, the lbz is followed by extsb. */
+
+#define N 32
+
+typedef unsigned char T;
+typedef signed char T2;
+
+T 
+testmax (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T
+testmin (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2 
+testmax2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2
+testmin2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
Index: gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-int.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-int.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-int.c	(working copy)
@@ -0,0 +1,85 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-times "vmaxsw" 4 } } */
+/* { dg-final { scan-assembler-times "vminsw" 4 } } */
+/* { dg-final { scan-assembler-times "vmaxuw" 4 } } */
+/* { dg-final { scan-assembler-times "vminuw" 4 } } */
+/* { dg-final { scan-assembler-times "vsldoi" 8 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 4 } } */
+/* { dg-final { scan-assembler-times "lwz" 2 } } */
+/* { dg-final { scan-assembler-times "lwa" 2 } } */
+
+/* Test maximum and minimum reduction operations for V4SImode.
+   This code will be unrolled and SLP vectorized.  The numbers
+   of instructions expected are obtained as follows.  There will be
+   two max or min instructions to compare the broadcast init vector
+   with the two source vectors in lc[], followed by two shift-max/min
+   pairs to obtain the maximum value in all positions.  An arbitrary
+   one of these is extracted using stxvd2x and either lwz (unsigned)
+   or lwa (signed). */
+
+#define N 8
+
+typedef unsigned int T;
+typedef signed int T2;
+
+T 
+testmax (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T
+testmin (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2 
+testmax2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2
+testmin2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
Index: gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-long.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-long.c	(working copy)
@@ -0,0 +1,82 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-times "vmaxsd" 3 } } */
+/* { dg-final { scan-assembler-times "vminsd" 3 } } */
+/* { dg-final { scan-assembler-times "vmaxud" 3 } } */
+/* { dg-final { scan-assembler-times "vminud" 3 } } */
+/* { dg-final { scan-assembler-times "vsldoi" 4 } } */
+/* { dg-final { scan-assembler-times "mfvsrd" 4 } } */
+
+/* Test maximum and minimum reduction operations for V2DImode.
+   This code will be unrolled and SLP vectorized.  The numbers
+   of instructions expected are obtained as follows.  There will be
+   two max or min instructions to compare the broadcast init vector
+   with the two source vectors in lc[], followed by a shift and
+   max/min to obtain the maximum value in both positions.  An
+   arbitrary one of these is extracted using mfvsrd.  */
+
+#define N 4
+
+typedef unsigned long long T;
+typedef signed long long T2;
+
+T 
+testmax (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T
+testmin (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2 
+testmax2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2
+testmin2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
Index: gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-short.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-short.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vect-reduc-minmax-short.c	(working copy)
@@ -0,0 +1,85 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-times "vmaxsh" 5 } } */
+/* { dg-final { scan-assembler-times "vminsh" 5 } } */
+/* { dg-final { scan-assembler-times "vmaxuh" 5 } } */
+/* { dg-final { scan-assembler-times "vminuh" 5 } } */
+/* { dg-final { scan-assembler-times "vsldoi" 12 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 4 } } */
+/* { dg-final { scan-assembler-times "lhz" 2 } } */
+/* { dg-final { scan-assembler-times "lha" 2 } } */
+
+/* Test maximum and minimum reduction operations for V8HImode.
+   This code will be unrolled and SLP vectorized.  The numbers
+   of instructions expected are obtained as follows.  There will be
+   two max or min instructions to compare the broadcast init vector
+   with the two source vectors in lc[], followed by three shift-max/min
+   pairs to obtain the maximum value in all positions.  An arbitrary
+   one of these is extracted using stxvd2x and either lhz (unsigned)
+   or lha (signed).  */
+
+#define N 16
+
+typedef unsigned short T;
+typedef signed short T2;
+
+T 
+testmax (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T
+testmin (const T *c, T init, T result)
+{
+  T lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2 
+testmax2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum < lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}
+
+T2
+testmin2 (const T2 *c, T2 init, T2 result)
+{
+  T2 lc[N], accum = init;
+  int i;
+
+  __builtin_memcpy (lc, c, sizeof(lc));
+
+  for (i = 0; i < N; i++) {
+    accum = accum > lc[i] ? lc[i] : accum;
+  }
+
+  return accum;
+}


