This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH]: Machine independent patch, was: Update SSE5 vector multiplication, shift, rotate
- From: Michael Meissner <michael dot meissner at amd dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org, dwarak dot rajagopal at amd dot com, christophe dot harle at amd dot com, hongjiu dot lu at intel dot com, Richard Guenther <rguenther at suse dot de>
- Date: Fri, 18 Apr 2008 11:44:55 -0400
- Subject: Re: [PATCH]: Machine independent patch, was: Update SSE5 vector multiplication, shift, rotate
- References: <20080417185036.GA15776@mmeissner-gold.amd.com> <5787cf470804172258t7caedb73m3b499abf6e57cc67@mail.gmail.com>
On Fri, Apr 18, 2008 at 07:58:38AM +0200, Uros Bizjak wrote:
> On Thu, Apr 17, 2008 at 8:50 PM, Michael Meissner
> <michael.meissner@amd.com> wrote:
>
> > The following patch updates some of the current SSE5 code patterns to add the
> > following:
> >
> > 1) Update vector 64-bit integer multiply
> > 2) Update vector 32x32->64-bit integer widening multiply
> > 3) Add support for SSE5 vector/vector shift patterns
> > 4) Add support for vectorizing rotate patterns
>
> Is it possible to split this patch into machine-independent and
> machine-dependent part? Machine-independent part should be reviewed by
> a middle-end (vectorizer) maintainer, and I will look at
> machine-dependent/testsuite part. It is recommended to mark each part
> of the patch with [PATCH, middle-end] or [PATCH, i386].
>
> BTW: If I can choose, I would prefer the latter part in a unidiff format.
Fair enough. Here is the machine independent portion of the patch:
2008-04-18 Michael Meissner <michael.meissner@amd.com>
Dwarakanath Rajagopal <dwarak.rajagopal@amd.com>
* optabs.h (OTI_vashl): New optab index for vector shift/rotate by
vector support.
(OTI_vlshr): Ditto.
(OTI_vashr): Ditto.
(OTI_vrotl): Ditto.
(OTI_vrotr): Ditto.
(vashl_optab): New optab for vector shift/rotate by vector
support.
(vlshr_optab): Ditto.
(vashr_optab): Ditto.
(vrotl_optab): Ditto.
(vrotr_optab): Ditto.
* optabs.c (optab_for_tree_code): Add support for vector
shift/rotate by vector.
* genopinit.c (optabs): Add vashl, vlshr, vashr, vrotl, vrotr
optabs.
* expmed.c (expand_shift): If a machine description has a vashl,
vlshr, vashr, vrotl, or vrotr optab, use that for vector shift
and rotate by a vector instruction.
* tree-vect-transform.c (vectorizable_operation): If a machine has
vashl, vlshr, vashr optabs, use that for vector shift by a vector
operation. Fall back to looking at ashl, lshr, ashr's second
operand mode if vashl/vlshr/vashr aren't present to determine if
the machine has a vector shift by scalar or vector shift by
vector operation. Add vector rotate support.
* tree.def (VLSHIFT_EXPR): New tree code for vector shift/rotate
by vector.
(VRSHIFT_EXPR): Ditto.
(VLROTATE_EXPR): Ditto.
(VRROTATE_EXPR): Ditto.
* expr.c (expand_expr_real_1): Support vectorized rotates.
* doc/c-tree.texi (VLSHIFT_EXPR): New tree code for vector
shift/rotate by vector.
(VRSHIFT_EXPR): Ditto.
(VLROTATE_EXPR): Ditto.
(VRROTATE_EXPR): Ditto.
(LROTATE_EXPR): Document missing tree code.
(RROTATE_EXPR): Ditto.
* doc/md.texi (vashl<mode>3): Document new standard name for shift
and rotate of a vector by a vector.
(vashr<mode>3): Ditto.
(vlshr<mode>3): Ditto.
(vrotl<mode>3): Ditto.
(vrotr<mode>3): Ditto.
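For anyone skimming the ChangeLog, the distinction between a shift by a scalar and a shift by a vector can be sketched with plain scalar loops. This is an illustration only and not part of the patch: the function names are made up, and whether a given loop is actually vectorized depends on the target and options. The first loop uses one count for every element, the other two take a separate count per element, which is the case the new vashl/vlshr/vashr/vrotl/vrotr names are meant to cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Every element is shifted by the same count.  When vectorized, this
   corresponds to a vector shift by a scalar.  */
void shl_by_scalar (uint32_t *dst, const uint32_t *src,
                    unsigned count, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] << (count & 31);
}

/* Each element has its own count.  When vectorized, this needs a
   vector shift by a vector.  */
void shl_by_vector (uint32_t *dst, const uint32_t *src,
                    const uint32_t *counts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] << (counts[i] & 31);
}

/* Rotate left with a per-element count, written so that a count of
   zero never produces a shift by 32.  */
void rotl_by_vector (uint32_t *dst, const uint32_t *src,
                     const uint32_t *counts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned c = counts[i] & 31;
      dst[i] = (src[i] << c) | (src[i] >> ((32 - c) & 31));
    }
}
```

The counts are masked to the element width so the sketch has defined behavior in C for any input.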
--- gcc/optabs.h.~0~ 2008-04-17 12:28:06.643070000 -0400
+++ gcc/optabs.h 2008-04-15 16:40:00.462084000 -0400
@@ -167,6 +167,18 @@ enum optab_index
OTI_rotl,
/* Rotate right */
OTI_rotr,
+
+ /* Arithmetic shift left of vector by vector */
+ OTI_vashl,
+ /* Logical shift right of vector by vector */
+ OTI_vlshr,
+ /* Arithmetic shift right of vector by vector */
+ OTI_vashr,
+ /* Rotate left of vector by vector */
+ OTI_vrotl,
+ /* Rotate right of vector by vector */
+ OTI_vrotr,
+
/* Signed and floating-point minimum value */
OTI_smin,
/* Signed and floating-point maximum value */
@@ -412,6 +424,11 @@ extern struct optab optab_table[OTI_MAX]
#define ashr_optab (&optab_table[OTI_ashr])
#define rotl_optab (&optab_table[OTI_rotl])
#define rotr_optab (&optab_table[OTI_rotr])
+#define vashl_optab (&optab_table[OTI_vashl])
+#define vlshr_optab (&optab_table[OTI_vlshr])
+#define vashr_optab (&optab_table[OTI_vashr])
+#define vrotl_optab (&optab_table[OTI_vrotl])
+#define vrotr_optab (&optab_table[OTI_vrotr])
#define smin_optab (&optab_table[OTI_smin])
#define smax_optab (&optab_table[OTI_smax])
#define umin_optab (&optab_table[OTI_umin])
--- gcc/optabs.c.~0~ 2008-04-17 12:28:06.594117000 -0400
+++ gcc/optabs.c 2008-04-15 16:40:00.489112000 -0400
@@ -387,6 +387,18 @@ optab_for_tree_code (enum tree_code code
case RROTATE_EXPR:
return rotr_optab;
+ case VLSHIFT_EXPR:
+ return vashl_optab;
+
+ case VRSHIFT_EXPR:
+ return TYPE_UNSIGNED (type) ? vlshr_optab : vashr_optab;
+
+ case VLROTATE_EXPR:
+ return vrotl_optab;
+
+ case VRROTATE_EXPR:
+ return vrotr_optab;
+
case MAX_EXPR:
return TYPE_UNSIGNED (type) ? umax_optab : smax_optab;
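The intended mapping from the new tree codes to the new optabs can be modelled outside of GCC as follows. This is a standalone sketch, not GCC code: the enums and the function name are hypothetical stand-ins for the tree codes and optabs. The only case that depends on the type is the right shift, which selects the logical form for unsigned element types and the arithmetic form for signed ones.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vector-by-vector tree codes.  */
enum vcode { VLSHIFT, VRSHIFT, VLROTATE, VRROTATE };

/* Hypothetical stand-ins for the vector-by-vector optabs.  */
enum voptab { VASHL, VLSHR, VASHR, VROTL, VROTR };

/* Pick the optab for CODE.  Only the right shift looks at the
   signedness of the element type.  */
enum voptab voptab_for_code (enum vcode code, bool is_unsigned)
{
  switch (code)
    {
    case VLSHIFT:
      return VASHL;
    case VRSHIFT:
      return is_unsigned ? VLSHR : VASHR;
    case VLROTATE:
      return VROTL;
    case VRROTATE:
      return VROTR;
    }
  return VASHL;  /* Not reached for valid codes.  */
}
```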
--- gcc/genopinit.c.~0~ 2008-04-17 12:28:06.667044000 -0400
+++ gcc/genopinit.c 2008-04-15 16:40:00.510502000 -0400
@@ -130,6 +130,11 @@ static const char * const optabs[] =
"optab_handler (lshr_optab, $A)->insn_code = CODE_FOR_$(lshr$a3$)",
"optab_handler (rotl_optab, $A)->insn_code = CODE_FOR_$(rotl$a3$)",
"optab_handler (rotr_optab, $A)->insn_code = CODE_FOR_$(rotr$a3$)",
+ "optab_handler (vashr_optab, $A)->insn_code = CODE_FOR_$(vashr$a3$)",
+ "optab_handler (vlshr_optab, $A)->insn_code = CODE_FOR_$(vlshr$a3$)",
+ "optab_handler (vashl_optab, $A)->insn_code = CODE_FOR_$(vashl$a3$)",
+ "optab_handler (vrotl_optab, $A)->insn_code = CODE_FOR_$(vrotl$a3$)",
+ "optab_handler (vrotr_optab, $A)->insn_code = CODE_FOR_$(vrotr$a3$)",
"optab_handler (smin_optab, $A)->insn_code = CODE_FOR_$(smin$a3$)",
"optab_handler (smax_optab, $A)->insn_code = CODE_FOR_$(smax$a3$)",
"optab_handler (umin_optab, $A)->insn_code = CODE_FOR_$(umin$I$a3$)",
--- gcc/expmed.c.~0~ 2008-04-17 12:28:06.416295000 -0400
+++ gcc/expmed.c 2008-04-15 16:40:00.531902000 -0400
@@ -2027,6 +2027,9 @@ expand_dec (rtx target, rtx dec)
emit_move_insn (target, value);
}
+#define optab_handler_valid_p(o, m) \
+ (optab_handler (o, m)->insn_code != CODE_FOR_nothing)
+
/* Output a shift instruction for expression code CODE,
with SHIFTED being the rtx for the value to shift,
and AMOUNT the tree for the amount to shift by.
@@ -2041,14 +2044,69 @@ expand_shift (enum tree_code code, enum
rtx op1, temp = 0;
int left = (code == LSHIFT_EXPR || code == LROTATE_EXPR);
int rotate = (code == LROTATE_EXPR || code == RROTATE_EXPR);
+ optab lshift_optab = ashl_optab;
+ optab rshift_arith_optab = ashr_optab;
+ optab rshift_uns_optab = lshr_optab;
+ optab lrotate_optab = rotl_optab;
+ optab rrotate_optab = rotr_optab;
+ enum machine_mode op1_mode;
int try;
+ op1 = expand_normal (amount);
+ op1_mode = GET_MODE (op1);
+
+ /* Determine whether the shift/rotate amount is a vector, or scalar. If the
+ shift amount is a vector, see if the machine has a separate set of optabs
+ for vector by vector shifts. Historically, GCC looked at the 2nd
+ operand's type in the shift optab to see what type of shift was
+ supported. */
+ if (VECTOR_MODE_P (mode) && VECTOR_MODE_P (op1_mode))
+ {
+ enum tree_code new_code = code;
+ optab shift_optab;
+
+ switch (code)
+ {
+ default:
+ break;
+
+ case LSHIFT_EXPR:
+ if (optab_handler_valid_p (vashl_optab, op1_mode))
+ new_code = VLSHIFT_EXPR;
+ break;
+
+ case RSHIFT_EXPR:
+ shift_optab = (unsignedp) ? vlshr_optab : vashr_optab;
+ if (optab_handler_valid_p (shift_optab, op1_mode))
+ new_code = VRSHIFT_EXPR;
+ break;
+
+ case LROTATE_EXPR:
+ if (optab_handler_valid_p (vrotl_optab, op1_mode))
+ new_code = VLROTATE_EXPR;
+ break;
+
+ case RROTATE_EXPR:
+ if (optab_handler_valid_p (vrotr_optab, op1_mode))
+ new_code = VRROTATE_EXPR;
+ break;
+ }
+
+ if (code != new_code)
+ {
+ code = new_code;
+ lshift_optab = vashl_optab;
+ rshift_arith_optab = vashr_optab;
+ rshift_uns_optab = vlshr_optab;
+ lrotate_optab = vrotl_optab;
+ rrotate_optab = vrotr_optab;
+ }
+ }
+
/* Previously detected shift-counts computed by NEGATE_EXPR
and shifted in the other direction; but that does not work
on all machines. */
- op1 = expand_normal (amount);
-
if (SHIFT_COUNT_TRUNCATED)
{
if (GET_CODE (op1) == CONST_INT
@@ -2138,12 +2196,12 @@ expand_shift (enum tree_code code, enum
}
temp = expand_binop (mode,
- left ? rotl_optab : rotr_optab,
+ left ? lrotate_optab : rrotate_optab,
shifted, op1, target, unsignedp, methods);
}
else if (unsignedp)
temp = expand_binop (mode,
- left ? ashl_optab : lshr_optab,
+ left ? lshift_optab : rshift_uns_optab,
shifted, op1, target, unsignedp, methods);
/* Do arithmetic shifts.
@@ -2162,7 +2220,7 @@ expand_shift (enum tree_code code, enum
/* Arithmetic shift */
temp = expand_binop (mode,
- left ? ashl_optab : ashr_optab,
+ left ? lshift_optab : rshift_arith_optab,
shifted, op1, target, unsignedp, methods1);
}
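The selection that expand_shift performs in the hunks above can be summarized with a small model. This is a simplified sketch under stated assumptions, not the GCC implementation: the struct, enums and function name are invented for illustration, and the per-mode handler lookup is reduced to one flag per operation. The vector-by-vector optabs are used only when both the shifted value and the count are vectors and the target provides the matching pattern; in every other case the ordinary optabs are kept.

```c
#include <stdbool.h>

/* Hypothetical operation kinds, one per new optab.  */
enum shift_kind
{
  SHIFT_LEFT,
  SHIFT_RIGHT_LOGICAL,
  SHIFT_RIGHT_ARITH,
  ROTATE_LEFT,
  ROTATE_RIGHT,
  N_SHIFT_KINDS
};

/* Hypothetical target description: which vector-by-vector patterns
   the machine description provides.  */
struct target
{
  bool has_vector_by_vector[N_SHIFT_KINDS];
};

enum optab_set { ORDINARY_OPTABS, VECTOR_BY_VECTOR_OPTABS };

/* Decide which family of optabs to expand with.  */
enum optab_set choose_optab_set (const struct target *t,
                                 enum shift_kind kind,
                                 bool value_is_vector,
                                 bool count_is_vector)
{
  if (value_is_vector && count_is_vector && t->has_vector_by_vector[kind])
    return VECTOR_BY_VECTOR_OPTABS;
  return ORDINARY_OPTABS;
}
```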
--- gcc/tree-vect-transform.c.~0~ 2008-04-17 12:28:06.451265000 -0400
+++ gcc/tree-vect-transform.c 2008-04-15 16:40:01.005950000 -0400
@@ -3830,7 +3830,7 @@ vectorizable_operation (tree stmt, block
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
- enum tree_code code;
+ enum tree_code code, alt_code;
enum machine_mode vec_mode;
tree new_temp;
int op_type;
@@ -3850,6 +3850,7 @@ vectorizable_operation (tree stmt, block
tree vop0, vop1;
unsigned int k;
bool scalar_shift_arg = false;
+ bool shift_rotate_p = false;
/* FORNOW: SLP with multiple types is not supported. The SLP analysis verifies
this, so we can safely override NCOPIES with 1 here. */
@@ -3923,6 +3924,59 @@ vectorizable_operation (tree stmt, block
}
}
+ /* If this is a shift/rotate, determine whether the shift amount is a vector,
+ or scalar. If the shift/rotate amount is a vector, see if the machine has
+ a separate set of optabs for vector by vector shifts. Historically, GCC
+ looked at the 2nd operand's type in the shift optab to see what type of
+ shift was supported. */
+ alt_code = code;
+ switch (code)
+ {
+ default:
+ break;
+
+ case LSHIFT_EXPR:
+ alt_code = VLSHIFT_EXPR;
+ shift_rotate_p = true;
+ break;
+
+ case RSHIFT_EXPR:
+ alt_code = VRSHIFT_EXPR;
+ shift_rotate_p = true;
+ break;
+
+ case LROTATE_EXPR:
+ alt_code = VLROTATE_EXPR;
+ shift_rotate_p = true;
+ break;
+
+ case RROTATE_EXPR:
+ alt_code = VRROTATE_EXPR;
+ shift_rotate_p = true;
+ break;
+ }
+
+ if (shift_rotate_p)
+ {
+ if (dt[1] == vect_loop_def
+ || (!optab && (dt[1] == vect_constant_def
+ || dt[1] == vect_invariant_def)))
+ {
+ struct optab *voptab = optab_for_tree_code (alt_code, vectype);
+
+ if (voptab
+ && (optab_handler (voptab, TYPE_MODE (vectype))->insn_code
+ != CODE_FOR_nothing))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vector shift/rotate by vector found, mode %s",
+ GET_MODE_NAME (TYPE_MODE (vectype)));
+
+ optab = voptab;
+ }
+ }
+ }
+
/* Supportable by target? */
if (!optab)
{
@@ -3957,11 +4011,15 @@ vectorizable_operation (tree stmt, block
return false;
}
- if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
+ if (shift_rotate_p)
{
/* FORNOW: not yet supported. */
if (!VECTOR_MODE_P (vec_mode))
- return false;
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vec_mode is not a vector type");
+ return false;
+ }
/* Invariant argument is needed for a vector shift
by a scalar shift operand. */
@@ -4072,8 +4130,7 @@ vectorizable_operation (tree stmt, block
/* Handle uses. */
if (j == 0)
{
- if (op_type == binary_op
- && (code == LSHIFT_EXPR || code == RSHIFT_EXPR))
+ if (op_type == binary_op && scalar_shift_arg)
{
/* Vector shl and shr insn patterns can be defined with scalar
operand 2 (shift operand). In this case, use constant or loop
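The condition under which vectorizable_operation tries the vector-by-vector optab can be read as the following predicate. Again this is only a sketch: the enum and the function name are hypothetical, and the real code goes on to check that the optab has a handler for the vector mode. A count that is computed inside the loop always needs the vector form, while a constant or loop-invariant count only falls back to it when no ordinary shift optab was found.

```c
#include <stdbool.h>

/* Hypothetical classification of how the shift count is defined.  */
enum count_def { COUNT_CONSTANT, COUNT_INVARIANT, COUNT_LOOP_DEFINED };

/* Return true if the vector-by-vector optab should be looked up.  */
bool should_try_vector_optab (enum count_def def, bool have_ordinary_optab)
{
  if (def == COUNT_LOOP_DEFINED)
    return true;
  return !have_ordinary_optab
         && (def == COUNT_CONSTANT || def == COUNT_INVARIANT);
}
```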
--- gcc/tree.def.~0~ 2008-04-17 12:28:06.393319000 -0400
+++ gcc/tree.def 2008-04-15 16:40:01.521653000 -0400
@@ -683,6 +683,13 @@ DEFTREECODE (RSHIFT_EXPR, "rshift_expr",
DEFTREECODE (LROTATE_EXPR, "lrotate_expr", tcc_binary, 2)
DEFTREECODE (RROTATE_EXPR, "rrotate_expr", tcc_binary, 2)
+/* Vector/vector shifts and rotates, where both arguments are vector types.
+ This is only used during the expansion of shifts and rotates. */
+DEFTREECODE (VLSHIFT_EXPR, "vlshift_expr", tcc_binary, 2)
+DEFTREECODE (VRSHIFT_EXPR, "vrshift_expr", tcc_binary, 2)
+DEFTREECODE (VLROTATE_EXPR, "vlrotate_expr", tcc_binary, 2)
+DEFTREECODE (VRROTATE_EXPR, "vrrotate_expr", tcc_binary, 2)
+
/* Bitwise operations. Operands have same mode as result. */
DEFTREECODE (BIT_IOR_EXPR, "bit_ior_expr", tcc_binary, 2)
DEFTREECODE (BIT_XOR_EXPR, "bit_xor_expr", tcc_binary, 2)
--- gcc/expr.c.~0~ 2008-04-17 12:28:06.373344000 -0400
+++ gcc/expr.c 2008-04-15 16:47:24.587040000 -0400
@@ -8868,12 +8868,6 @@ expand_expr_real_1 (tree exp, rtx target
case LROTATE_EXPR:
case RROTATE_EXPR:
- /* The expansion code only handles expansion of mode precision
- rotates. */
- gcc_assert (GET_MODE_PRECISION (TYPE_MODE (type))
- == TYPE_PRECISION (type));
-
- /* Falltrough. */
case LSHIFT_EXPR:
case RSHIFT_EXPR:
/* If this is a fixed-point operation, then we cannot use the code
--- gcc/doc/c-tree.texi.~0~ 2008-04-17 12:28:07.309401000 -0400
+++ gcc/doc/c-tree.texi 2008-04-15 16:40:01.666905000 -0400
@@ -1926,6 +1926,12 @@ This macro returns the attributes on the
@tindex THROW_EXPR
@tindex LSHIFT_EXPR
@tindex RSHIFT_EXPR
+@tindex VLSHIFT_EXPR
+@tindex VRSHIFT_EXPR
+@tindex LROTATE_EXPR
+@tindex RROTATE_EXPR
+@tindex VLROTATE_EXPR
+@tindex VRROTATE_EXPR
@tindex BIT_IOR_EXPR
@tindex BIT_XOR_EXPR
@tindex BIT_AND_EXPR
@@ -2300,6 +2306,22 @@ Note that the result is undefined if the
than or equal to the first operand's type size.
+@item VLSHIFT_EXPR
+@itemx VRSHIFT_EXPR
+These nodes represent left and right shifts, respectively.
+@code{VLSHIFT_EXPR} and @code{VRSHIFT_EXPR} are used when expanding
+shifts of vector types by the same size vector type to distinguish
+them from shifts of vector types by scalar amounts.
+
+@item LROTATE_EXPR
+@itemx RROTATE_EXPR
+These nodes represent left and right rotates, respectively.
+
+@item VLROTATE_EXPR
+@itemx VRROTATE_EXPR
+These nodes represent left and right rotates of vector types by the
+same size vector type, respectively.
+
@item BIT_IOR_EXPR
@itemx BIT_XOR_EXPR
@itemx BIT_AND_EXPR
--- gcc/doc/md.texi.~0~ 2008-04-17 14:44:30.526922000 -0400
+++ gcc/doc/md.texi 2008-04-17 14:43:55.044816000 -0400
@@ -3858,6 +3858,20 @@ counts can optionally be specified by @c
Other shift and rotate instructions, analogous to the
@code{ashl@var{m}3} instructions.
+@cindex @code{vashl@var{m}3} instruction pattern
+@cindex @code{vashr@var{m}3} instruction pattern
+@cindex @code{vlshr@var{m}3} instruction pattern
+@cindex @code{vrotl@var{m}3} instruction pattern
+@cindex @code{vrotr@var{m}3} instruction pattern
+@item @samp{vashl@var{m}3}, @samp{vashr@var{m}3}, @samp{vlshr@var{m}3}, @samp{vrotl@var{m}3}, @samp{vrotr@var{m}3}
+Vector shift and rotate instructions that take vectors as operand 2 to
+allow a machine that has both a vector shift/rotate by a scalar
+instruction and a separate vector shift/rotate by a vector instruction
+to support both instructions. If these vector shift instructions are
+not present, the compiler will look at the mode of operand 2 of the
+normal shift instruction to determine which type of vector shift is
+supported.
+
@cindex @code{neg@var{m}2} instruction pattern
@cindex @code{ssneg@var{m}2} instruction pattern
@cindex @code{usneg@var{m}2} instruction pattern
--
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com