This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH]: Machine independent patch, was: Update SSE5 vector multiplication, shift, rotate

From: Michael Meissner <michael dot meissner at amd dot com>
To: Uros Bizjak <ubizjak at gmail dot com>
Cc: gcc-patches at gcc dot gnu dot org, dwarak dot rajagopal at amd dot com, christophe dot harle at amd dot com, hongjiu dot lu at intel dot com, Richard Guenther <rguenther at suse dot de>
Date: Fri, 18 Apr 2008 11:44:55 -0400
Subject: Re: [PATCH]: Machine independent patch, was: Update SSE5 vector multiplication, shift, rotate
References: <20080417185036.GA15776@mmeissner-gold.amd.com> <5787cf470804172258t7caedb73m3b499abf6e57cc67@mail.gmail.com>
On Fri, Apr 18, 2008 at 07:58:38AM +0200, Uros Bizjak wrote:
> On Thu, Apr 17, 2008 at 8:50 PM, Michael Meissner
> <michael.meissner@amd.com> wrote:
> 
> > The following patch updates some of the current SSE5 code patterns to add the
> >  following:
> >
> >  1) Update vector 64-bit integer multiply
> >  2) Update vector 32x32->64-bit integer widening multiply
> >  3) Add support for SSE5 vector/vector shift patterns
> >  4) Add support for vectorizing rotate patterns
> 
> Is it possible to split this patch into machine-independent and
> machine-dependent parts? The machine-independent part should be reviewed
> by a middle-end (vectorizer) maintainer, and I will look at the
> machine-dependent/testsuite part. It is recommended to mark each part
> of the patch with [PATCH, middle-end] or [PATCH, i386].
> 
> BTW: If I can choose, I would prefer the latter part in a unidiff format.

Fair enough.  Here is the machine-independent portion of the patch:

2008-04-18  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	
	* optabs.h (OTI_vashl): New optab index for vector shift/rotate by
	vector support.
	(OTI_vlshr): Ditto.
	(OTI_vashr): Ditto.
	(OTI_vrotl): Ditto.
	(OTI_vrotr): Ditto.
	(vashl_optab): New optab for vector shift/rotate by vector
	support.
	(vlshr_optab): Ditto.
	(vashr_optab): Ditto.
	(vrotl_optab): Ditto.
	(vrotr_optab): Ditto.

	* optabs.c (optab_for_tree_code): Add support for vector
	shift/rotate by vector.

	* genopinit.c (optabs): Add vashl, vlshr, vashr, vrotl, vrotr
	optabs.

	* expmed.c (expand_shift): If a machine description has vashl,
	vlshr, vashr, vrotl, or vrotr optabs, use them for vector shifts
	and rotates by a vector.

	* tree-vect-transform.c (vectorizable_operation): If a machine has
	vashl, vlshr, vashr optabs, use them for vector shift by a vector
	operations.  If vashl/vlshr/vashr aren't present, fall back to
	looking at the mode of the second operand of ashl, lshr, ashr to
	determine if the machine has a vector shift by scalar or vector
	shift by vector operation.  Add vector rotate support.

	* tree.def (VLSHIFT_EXPR): New tree code for vector shift/rotate
	by vector.
	(VRSHIFT_EXPR): Ditto.
	(VLROTATE_EXPR): Ditto.
	(VRROTATE_EXPR): Ditto.

	* expr.c (expand_expr_real_1): Support vectorized rotates.

	* doc/c-tree.texi (VLSHIFT_EXPR): New tree code for vector
	shift/rotate by vector.
	(VRSHIFT_EXPR): Ditto.
	(VLROTATE_EXPR): Ditto.
	(VRROTATE_EXPR): Ditto.
	(LROTATE_EXPR): Document missing tree code.
	(RROTATE_EXPR): Ditto.

	* doc/md.texi (vashl<mode>3): Document new standard name for shift
	and rotate of a vector by a vector.
	(vashr<mode>3): Ditto.
	(vlshr<mode>3): Ditto.
	(vrotl<mode>3): Ditto.
	(vrotr<mode>3): Ditto.
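
For anyone reading along who is not familiar with the distinction the new
optabs draw, here is a minimal sketch of the intended semantics in plain C.
It is only a scalar model, not GCC code: NELTS, the 32-bit lane width, and
the function names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELTS 4			/* hypothetical: four 32-bit lanes */

/* Vector shift by scalar: every lane is shifted by the same count.  */
void
shift_left_by_scalar (uint32_t dst[NELTS], const uint32_t src[NELTS],
		      unsigned count)
{
  assert (count < 32);		/* larger counts are undefined in C */
  for (size_t i = 0; i < NELTS; i++)
    dst[i] = src[i] << count;
}

/* Vector shift by vector: lane I is shifted by COUNTS[I].  This is the
   shape a vashl<mode>3 pattern is meant to describe.  */
void
shift_left_by_vector (uint32_t dst[NELTS], const uint32_t src[NELTS],
		      const uint32_t counts[NELTS])
{
  for (size_t i = 0; i < NELTS; i++)
    {
      assert (counts[i] < 32);
      dst[i] = src[i] << counts[i];
    }
}

/* Vector rotate left by vector: lane I is rotated by COUNTS[I] modulo
   the lane width, the shape a vrotl<mode>3 pattern is meant to
   describe.  The masking keeps both shifts in range when the count
   is 0.  */
void
rotate_left_by_vector (uint32_t dst[NELTS], const uint32_t src[NELTS],
		       const uint32_t counts[NELTS])
{
  for (size_t i = 0; i < NELTS; i++)
    {
      unsigned c = counts[i] & 31u;
      dst[i] = (src[i] << c) | (src[i] >> ((32u - c) & 31u));
    }
}
```

The first function corresponds to a loop such as a[i] = b[i] << n with n
loop-invariant, the other two to loops where the amount is itself an array
element, e.g. a[i] = b[i] << c[i].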

--- gcc/optabs.h.~0~	2008-04-17 12:28:06.643070000 -0400
+++ gcc/optabs.h	2008-04-15 16:40:00.462084000 -0400
@@ -167,6 +167,18 @@ enum optab_index
   OTI_rotl,
   /* Rotate right */
   OTI_rotr,
+
+  /* Arithmetic shift left of vector by vector */
+  OTI_vashl,
+  /* Logical shift right of vector by vector */
+  OTI_vlshr,
+  /* Arithmetic shift right of vector by vector */
+  OTI_vashr,
+  /* Rotate left of vector by vector */
+  OTI_vrotl,
+  /* Rotate right of vector by vector */
+  OTI_vrotr,
+
   /* Signed and floating-point minimum value */
   OTI_smin,
   /* Signed and floating-point maximum value */
@@ -412,6 +424,11 @@ extern struct optab optab_table[OTI_MAX]
 #define ashr_optab (&optab_table[OTI_ashr])
 #define rotl_optab (&optab_table[OTI_rotl])
 #define rotr_optab (&optab_table[OTI_rotr])
+#define vashl_optab (&optab_table[OTI_vashl])
+#define vlshr_optab (&optab_table[OTI_vlshr])
+#define vashr_optab (&optab_table[OTI_vashr])
+#define vrotl_optab (&optab_table[OTI_vrotl])
+#define vrotr_optab (&optab_table[OTI_vrotr])
 #define smin_optab (&optab_table[OTI_smin])
 #define smax_optab (&optab_table[OTI_smax])
 #define umin_optab (&optab_table[OTI_umin])
--- gcc/optabs.c.~0~	2008-04-17 12:28:06.594117000 -0400
+++ gcc/optabs.c	2008-04-15 16:40:00.489112000 -0400
@@ -387,6 +387,18 @@ optab_for_tree_code (enum tree_code code
     case RROTATE_EXPR:
       return rotr_optab;
 
+    case VLSHIFT_EXPR:
+      return vashl_optab;
+
+    case VRSHIFT_EXPR:
+      return TYPE_UNSIGNED (type) ? vlshr_optab : vashr_optab;
+
+    case VLROTATE_EXPR:
+      return vrotl_optab;
+
+    case VRROTATE_EXPR:
+      return vrotr_optab;
+
     case MAX_EXPR:
       return TYPE_UNSIGNED (type) ? umax_optab : smax_optab;
 
--- gcc/genopinit.c.~0~	2008-04-17 12:28:06.667044000 -0400
+++ gcc/genopinit.c	2008-04-15 16:40:00.510502000 -0400
@@ -130,6 +130,11 @@ static const char * const optabs[] =
   "optab_handler (lshr_optab, $A)->insn_code = CODE_FOR_$(lshr$a3$)",
   "optab_handler (rotl_optab, $A)->insn_code = CODE_FOR_$(rotl$a3$)",
   "optab_handler (rotr_optab, $A)->insn_code = CODE_FOR_$(rotr$a3$)",
+  "optab_handler (vashr_optab, $A)->insn_code = CODE_FOR_$(vashr$a3$)",
+  "optab_handler (vlshr_optab, $A)->insn_code = CODE_FOR_$(vlshr$a3$)",
+  "optab_handler (vashl_optab, $A)->insn_code = CODE_FOR_$(vashl$a3$)",
+  "optab_handler (vrotl_optab, $A)->insn_code = CODE_FOR_$(vrotl$a3$)",
+  "optab_handler (vrotr_optab, $A)->insn_code = CODE_FOR_$(vrotr$a3$)",
   "optab_handler (smin_optab, $A)->insn_code = CODE_FOR_$(smin$a3$)",
   "optab_handler (smax_optab, $A)->insn_code = CODE_FOR_$(smax$a3$)",
   "optab_handler (umin_optab, $A)->insn_code = CODE_FOR_$(umin$I$a3$)",
--- gcc/expmed.c.~0~	2008-04-17 12:28:06.416295000 -0400
+++ gcc/expmed.c	2008-04-15 16:40:00.531902000 -0400
@@ -2027,6 +2027,9 @@ expand_dec (rtx target, rtx dec)
     emit_move_insn (target, value);
 }
 
+#define optab_handler_valid_p(o, m) \
+  (optab_handler (o, m)->insn_code != CODE_FOR_nothing)
+
 /* Output a shift instruction for expression code CODE,
    with SHIFTED being the rtx for the value to shift,
    and AMOUNT the tree for the amount to shift by.
@@ -2041,14 +2044,69 @@ expand_shift (enum tree_code code, enum 
   rtx op1, temp = 0;
   int left = (code == LSHIFT_EXPR || code == LROTATE_EXPR);
   int rotate = (code == LROTATE_EXPR || code == RROTATE_EXPR);
+  optab lshift_optab = ashl_optab;
+  optab rshift_arith_optab = ashr_optab;
+  optab rshift_uns_optab = lshr_optab;
+  optab lrotate_optab = rotl_optab;
+  optab rrotate_optab = rotr_optab;
+  enum machine_mode op1_mode;
   int try;
 
+  op1 = expand_normal (amount);
+  op1_mode = GET_MODE (op1);
+
+  /* Determine whether the shift/rotate amount is a vector, or scalar.  If the
+     shift amount is a vector, see if the machine has a separate set of optabs
+     for vector by vector shifts.  Historically, GCC looked at the 2nd
+     operand's type in the shift optab to see what type of shift was
+     supported.  */
+  if (VECTOR_MODE_P (mode) && VECTOR_MODE_P (op1_mode))
+    {
+      enum tree_code new_code = code;
+      optab shift_optab;
+
+      switch (code)
+	{
+	default:
+	  break;
+
+	case LSHIFT_EXPR:
+	  if (optab_handler_valid_p (vashl_optab, op1_mode))
+	    new_code = VLSHIFT_EXPR;
+	  break;
+
+	case RSHIFT_EXPR:
+	  shift_optab = (unsignedp) ? vlshr_optab : vashr_optab;
+	  if (optab_handler_valid_p (shift_optab, op1_mode))
+	    new_code = VRSHIFT_EXPR;
+	  break;
+
+	case LROTATE_EXPR:
+	  if (optab_handler_valid_p (vrotl_optab, op1_mode))
+	    new_code = VLROTATE_EXPR;
+	  break;
+
+	case RROTATE_EXPR:
+	  if (optab_handler_valid_p (vrotr_optab, op1_mode))
+	    new_code = VRROTATE_EXPR;
+	  break;
+	}
+
+      if (code != new_code)
+	{
+	  code = new_code;
+	  lshift_optab = vashl_optab;
+	  rshift_arith_optab = vashr_optab;
+	  rshift_uns_optab = vlshr_optab;
+	  lrotate_optab = vrotl_optab;
+	  rrotate_optab = vrotr_optab;
+	}
+    }
+
   /* Previously detected shift-counts computed by NEGATE_EXPR
      and shifted in the other direction; but that does not work
      on all machines.  */
 
-  op1 = expand_normal (amount);
-
   if (SHIFT_COUNT_TRUNCATED)
     {
       if (GET_CODE (op1) == CONST_INT
@@ -2138,12 +2196,12 @@ expand_shift (enum tree_code code, enum 
 	    }
 
 	  temp = expand_binop (mode,
-			       left ? rotl_optab : rotr_optab,
+			       left ? lrotate_optab : rrotate_optab,
 			       shifted, op1, target, unsignedp, methods);
 	}
       else if (unsignedp)
 	temp = expand_binop (mode,
-			     left ? ashl_optab : lshr_optab,
+			     left ? lshift_optab : rshift_uns_optab,
 			     shifted, op1, target, unsignedp, methods);
 
       /* Do arithmetic shifts.
@@ -2162,7 +2220,7 @@ expand_shift (enum tree_code code, enum 
 	  /* Arithmetic shift */
 
 	  temp = expand_binop (mode,
-			       left ? ashl_optab : ashr_optab,
+			       left ? lshift_optab : rshift_arith_optab,
 			       shifted, op1, target, unsignedp, methods1);
 	}
 
--- gcc/tree-vect-transform.c.~0~	2008-04-17 12:28:06.451265000 -0400
+++ gcc/tree-vect-transform.c	2008-04-15 16:40:01.005950000 -0400
@@ -3830,7 +3830,7 @@ vectorizable_operation (tree stmt, block
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum tree_code code;
+  enum tree_code code, alt_code;
   enum machine_mode vec_mode;
   tree new_temp;
   int op_type;
@@ -3850,6 +3850,7 @@ vectorizable_operation (tree stmt, block
   tree vop0, vop1;
   unsigned int k;
   bool scalar_shift_arg = false;
+  bool shift_rotate_p = false;
 
   /* FORNOW: SLP with multiple types is not supported. The SLP analysis verifies
      this, so we can safely override NCOPIES with 1 here.  */
@@ -3923,6 +3924,59 @@ vectorizable_operation (tree stmt, block
 	}
     }
 
+  /* If this is a shift/rotate, determine whether the shift amount is a vector,
+     or scalar.  If the shift/rotate amount is a vector, see if the machine has
+     a separate set of optabs for vector by vector shifts.  Historically, GCC
+     looked at the 2nd operand's type in the shift optab to see what type of
+     shift was supported.  */
+  alt_code = code;
+  switch (code)
+    {
+    default:
+      break;
+
+    case LSHIFT_EXPR:
+      alt_code = VLSHIFT_EXPR;
+      shift_rotate_p = true;
+      break;
+
+    case RSHIFT_EXPR:
+      alt_code = VRSHIFT_EXPR;
+      shift_rotate_p = true;
+      break;
+
+    case LROTATE_EXPR:
+      alt_code = VLROTATE_EXPR;
+      shift_rotate_p = true;
+      break;
+
+    case RROTATE_EXPR:
+      alt_code = VRROTATE_EXPR;
+      shift_rotate_p = true;
+      break;
+    }
+
+  if (shift_rotate_p)
+    {
+      if (dt[1] == vect_loop_def
+	  || (!optab && (dt[1] == vect_constant_def
+			 || dt[1] == vect_invariant_def)))
+	{
+	  struct optab *voptab = optab_for_tree_code (alt_code, vectype);
+
+	  if (voptab
+	      && (optab_handler (voptab, TYPE_MODE (vectype))->insn_code
+		  != CODE_FOR_nothing))
+	    {
+	      if (vect_print_dump_info (REPORT_DETAILS))
+		fprintf (vect_dump, "vector shift/rotate by vector found, mode %s",
+			 GET_MODE_NAME (TYPE_MODE (vectype)));
+
+	      optab = voptab;
+	    }
+	}
+    }
+
   /* Supportable by target?  */
   if (!optab)
     {
@@ -3957,11 +4011,15 @@ vectorizable_operation (tree stmt, block
       return false;
     }
 
-  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
+  if (shift_rotate_p)
     {
       /* FORNOW: not yet supported.  */
       if (!VECTOR_MODE_P (vec_mode))
-	return false;
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS))
+	    fprintf (vect_dump, "vec_mode is not a vector type");
+	  return false;
+	}
 
       /* Invariant argument is needed for a vector shift
 	 by a scalar shift operand.  */
@@ -4072,8 +4130,7 @@ vectorizable_operation (tree stmt, block
       /* Handle uses.  */
       if (j == 0)
 	{
-	  if (op_type == binary_op
-	      && (code == LSHIFT_EXPR || code == RSHIFT_EXPR))
+	  if (op_type == binary_op && scalar_shift_arg)
 	    {
 	      /* Vector shl and shr insn patterns can be defined with scalar 
 		 operand 2 (shift operand). In this case, use constant or loop 
--- gcc/tree.def.~0~	2008-04-17 12:28:06.393319000 -0400
+++ gcc/tree.def	2008-04-15 16:40:01.521653000 -0400
@@ -683,6 +683,13 @@ DEFTREECODE (RSHIFT_EXPR, "rshift_expr",
 DEFTREECODE (LROTATE_EXPR, "lrotate_expr", tcc_binary, 2)
 DEFTREECODE (RROTATE_EXPR, "rrotate_expr", tcc_binary, 2)
 
+/* Vector/vector shifts and rotates, where both arguments are vector types.
+   This is only used during the expansion of shifts and rotates.  */
+DEFTREECODE (VLSHIFT_EXPR, "vlshift_expr", tcc_binary, 2)
+DEFTREECODE (VRSHIFT_EXPR, "vrshift_expr", tcc_binary, 2)
+DEFTREECODE (VLROTATE_EXPR, "vlrotate_expr", tcc_binary, 2)
+DEFTREECODE (VRROTATE_EXPR, "vrrotate_expr", tcc_binary, 2)
+
 /* Bitwise operations.  Operands have same mode as result.  */
 DEFTREECODE (BIT_IOR_EXPR, "bit_ior_expr", tcc_binary, 2)
 DEFTREECODE (BIT_XOR_EXPR, "bit_xor_expr", tcc_binary, 2)
--- gcc/expr.c.~0~	2008-04-17 12:28:06.373344000 -0400
+++ gcc/expr.c	2008-04-15 16:47:24.587040000 -0400
@@ -8868,12 +8868,6 @@ expand_expr_real_1 (tree exp, rtx target
 
     case LROTATE_EXPR:
     case RROTATE_EXPR:
-      /* The expansion code only handles expansion of mode precision
-	 rotates.  */
-      gcc_assert (GET_MODE_PRECISION (TYPE_MODE (type))
-		  == TYPE_PRECISION (type));
-
-      /* Falltrough.  */
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
       /* If this is a fixed-point operation, then we cannot use the code
--- gcc/doc/c-tree.texi.~0~	2008-04-17 12:28:07.309401000 -0400
+++ gcc/doc/c-tree.texi	2008-04-15 16:40:01.666905000 -0400
@@ -1926,6 +1926,12 @@ This macro returns the attributes on the
 @tindex THROW_EXPR
 @tindex LSHIFT_EXPR
 @tindex RSHIFT_EXPR
+@tindex VLSHIFT_EXPR
+@tindex VRSHIFT_EXPR
+@tindex LROTATE_EXPR
+@tindex RROTATE_EXPR
+@tindex VLROTATE_EXPR
+@tindex VRROTATE_EXPR
 @tindex BIT_IOR_EXPR
 @tindex BIT_XOR_EXPR
 @tindex BIT_AND_EXPR
@@ -2300,6 +2306,22 @@ Note that the result is undefined if the
 than or equal to the first operand's type size.
 
 
+@item VLSHIFT_EXPR
+@itemx VRSHIFT_EXPR
+These nodes represent left and right shifts, respectively.
+@code{VLSHIFT_EXPR} and @code{VRSHIFT_EXPR} are used when expanding
+shifts of vector types by the same size vector type to distinguish
+them from shifts of vector types by scalar amounts.
+
+@item LROTATE_EXPR
+@itemx RROTATE_EXPR
+These nodes represent left and right rotates, respectively.
+
+@item VLROTATE_EXPR
+@itemx VRROTATE_EXPR
+These nodes represent left and right rotates of vector types by the
+same size vector type, respectively.
+
 @item BIT_IOR_EXPR
 @itemx BIT_XOR_EXPR
 @itemx BIT_AND_EXPR
--- gcc/doc/md.texi.~0~	2008-04-17 14:44:30.526922000 -0400
+++ gcc/doc/md.texi	2008-04-17 14:43:55.044816000 -0400
@@ -3858,6 +3858,20 @@ counts can optionally be specified by @c
 Other shift and rotate instructions, analogous to the
 @code{ashl@var{m}3} instructions.
 
+@cindex @code{vashl@var{m}3} instruction pattern
+@cindex @code{vashr@var{m}3} instruction pattern
+@cindex @code{vlshr@var{m}3} instruction pattern
+@cindex @code{vrotl@var{m}3} instruction pattern
+@cindex @code{vrotr@var{m}3} instruction pattern
+@item @samp{vashl@var{m}3}, @samp{vashr@var{m}3}, @samp{vlshr@var{m}3}, @samp{vrotl@var{m}3}, @samp{vrotr@var{m}3}
+Vector shift and rotate instructions that take vectors as operand 2 to
+allow a machine that has both a vector shift/rotate by a scalar
+instruction and a separate vector shift/rotate by a vector instruction
+to support both instructions.  If these vector shift instructions are
+not present, the compiler will look at the mode of operand 2 of the
+normal shift instruction to determine which type of vector shift is
+supported.
+
 @cindex @code{neg@var{m}2} instruction pattern
 @cindex @code{ssneg@var{m}2} instruction pattern
 @cindex @code{usneg@var{m}2} instruction pattern
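
To make the fallback rule described in the md.texi text concrete, here is
a simplified model of the decision for a shift whose amount varies per
element.  It is a sketch under stated assumptions, not the code in the
patch: struct target_shift_info, enum shift_kind and classify_varying_shift
are hypothetical names, and the real logic in vectorizable_operation also
takes the definition type of the shift amount into account.

```c
#include <stdbool.h>

enum shift_kind
{
  SHIFT_NONE,			/* no usable pattern at all */
  SHIFT_BY_SCALAR,		/* only a vector-by-scalar pattern */
  SHIFT_BY_VECTOR		/* a vector-by-vector pattern is usable */
};

/* Hypothetical summary of what a target provides for one vector mode.  */
struct target_shift_info
{
  bool has_vector_optab;	/* e.g. a vashl<mode>3 pattern exists */
  bool has_plain_optab;		/* e.g. an ashl<mode>3 pattern exists */
  bool plain_operand2_is_vector;	/* operand 2 mode of the plain pattern */
};

/* Decide which kind of shift is available when the amount is a vector.
   Prefer the dedicated vector-by-vector pattern; otherwise fall back
   to inspecting operand 2 of the normal shift pattern.  */
enum shift_kind
classify_varying_shift (const struct target_shift_info *t)
{
  if (t->has_vector_optab)
    return SHIFT_BY_VECTOR;

  if (!t->has_plain_optab)
    return SHIFT_NONE;

  return t->plain_operand2_is_vector ? SHIFT_BY_VECTOR : SHIFT_BY_SCALAR;
}
```

In this model a target that defines both a scalar-count ashl<mode>3 and a
vashl<mode>3 gets SHIFT_BY_VECTOR, while a target with only the
scalar-count pattern gets SHIFT_BY_SCALAR and a per-element shift amount
cannot be vectorized.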

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com
Follow-Ups:
- Re: [PATCH]: Machine independent patch, was: Update SSE5 vector multiplication, shift, rotate
  - From: Richard Guenther
References:
- [PATCH]: Update SSE5 vector multiplication, shift, rotate
  - From: Michael Meissner
- Re: [PATCH]: Update SSE5 vector multiplication, shift, rotate
  - From: Uros Bizjak