This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH, vectorizer]: Take2: Vectorize FP conversions

From: Uros Bizjak <ubizjak at gmail dot com>
To: Uros Bizjak <ubizjak at gmail dot com>
Cc: Dorit Nuzman <DORIT at il dot ibm dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Sat, 21 Apr 2007 13:52:33 +0200
Subject: Re: [PATCH, vectorizer]: Take2: Vectorize FP conversions
References: <5787cf470704200108w9c58fecv6ef136f32815e793@mail.gmail.com>

Uros Bizjak wrote:

Oops, the previous version of the patch was attached. The correct version is now attached.

This is the second revision of the FP conversions patch. In addition
to conversions, this patch renames VEC_PACK_MOD_EXPR (and the
corresponding optabs) to VEC_PACK_TRUNC_EXPR. Following this rename,
vec_pack_trunc_optab also handles FP modes, so we don't have to invent
another trunc optab just for floating modes.

For bonus points, this patch also adds/updates the documentation.

Additionally, we can add vec_pack_sfix_optab to handle, e.g., i386
cvtpd2dq (convert V2DFmode into V4SImode) and, in the other direction,
vec_pack_float_optab for cvtdq2pd.
I'll add this functionality as a follow-on patch.

Patch was bootstrapped on i686-pc-linux-gnu and regression tested for
all default languages. OK for mainline?

2007-04-20 Uros Bizjak <ubizjak@gmail.com>

    PR tree-optimization/24659
    * optabs.h (enum optab_index): Add OTI_vec_unpack_hi and
    OTI_vec_unpack_lo.  Rename OTI_vec_pack_mod to OTI_vec_pack_trunc.
    (vec_unpack_hi_optab): Define new macro.
    (vec_unpack_lo_optab): Ditto.
    (vec_pack_trunc_optab): Rename from vec_pack_mod_optab.
    * genopinit.c (optabs): Implement vec_unpack_hi_optab using
    vec_unpack_hi_* patterns.  Implement vec_unpack_lo_optab
    using vec_unpack_lo_* patterns.  Rename vec_pack_mod_optab
    to vec_pack_trunc_optab.
    * tree-vect-transform.c (vectorizable_type_demotion): Do not fail
    early for scalar floating point operands for NOP_EXPR.
    (vectorizable_type_promotion): Ditto.
    * optabs.c (optab_for_tree_code) [VEC_UNPACK_HI_EXPR]: Return
    vec_unpack_hi_optab for FLOAT_TYPE type.
    [VEC_UNPACK_LO_EXPR]: Return vec_unpack_lo_optab for
    FLOAT_TYPE type.
    [VEC_PACK_TRUNC_EXPR]: Return vec_pack_trunc_optab.
    (expand_binop): Rename vec_pack_mod_optab to vec_pack_trunc_optab.
    (init_optabs): Initialize vec_unpack_hi_optab,
    vec_unpack_lo_optab and vec_pack_trunc_optab.

    * tree.def (VEC_PACK_TRUNC_EXPR): Rename from VEC_PACK_MOD_EXPR.
    * tree-pretty-print.c (dump_generic_node) [VEC_PACK_TRUNC_EXPR]:
    Rename from VEC_PACK_MOD_EXPR.
    (op_prio) [VEC_PACK_TRUNC_EXPR]: Ditto.
    * expr.c (expand_expr_real_1): Ditto.
    * tree-inline.c (estimate_num_insns_1): Ditto.
    * tree-vect-generic.c (expand_vector_operations_1): Ditto.

    * config/i386/sse.md (vec_unpack_hi_v4sf): New expander.
    (vec_unpack_lo_v4sf): Ditto.
    (vec_pack_trunc_v2df): Ditto.
    (vec_pack_trunc_v8hi): Rename from vec_pack_mod_v8hi.
    (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.
    (vec_pack_trunc_v2di): Rename from vec_pack_mod_v2di.
    * config/rs6000/altivec.md (vec_pack_trunc_v8hi): Rename from
    vec_pack_mod_v8hi.
    (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.

    * doc/c-tree.texi (Expression trees) [VEC_PACK_TRUNC_EXPR]: Rename from
    VEC_PACK_MOD_EXPR.  This expression also represents packing of
    floating point operands.
    [VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR]: These expressions also
    represent unpacking of floating point operands.
    * doc/md.texi (Standard Names) [vec_pack_trunc]: Update documentation.
    [vec_unpack_hi]: Document.
    [vec_unpack_lo]: Ditto.

testsuite/ChangeLog:

2007-04-19 Uros Bizjak <ubizjak@gmail.com>

    PR tree-optimization/24659
    * gcc.dg/vect/vect-float-extend-1.c: New test.
    * gcc.dg/vect/vect-float-truncate-1.c: New test.

Uros.

Index: gcc/doc/c-tree.texi
===================================================================
--- gcc/doc/c-tree.texi	(revision 123966)
+++ gcc/doc/c-tree.texi	(working copy)
@@ -1983,7 +1983,7 @@
 @tindex VEC_WIDEN_MULT_LO_EXPR
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
-@tindex VEC_PACK_MOD_EXPR
+@tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_EXTRACT_EVEN_EXPR 
 @tindex VEC_EXTRACT_ODD_EXPR
@@ -2837,23 +2837,30 @@
 
 @item VEC_UNPACK_HI_EXPR
 @item VEC_UNPACK_LO_EXPR
-These nodes represent unpacking of the high and low parts of the input vector, 
+These nodes represent unpacking of the high and low parts of the input vector,
 respectively.  The single operand is a vector that contains @code{N} elements 
-of the same integral type.  The result is a vector that contains half as many 
-elements, of an integral type whose size is twice as wide.  In the case of 
-@code{VEC_UNPACK_HI_EXPR} the high @code{N/2} elements of the vector are 
-extracted and widened (promoted).  In the case of @code{VEC_UNPACK_LO_EXPR} the 
-low @code{N/2} elements of the vector are extracted and widened (promoted).
+of the same integral or floating point type.  The result is a vector
+that contains half as many elements, of an integral or floating point type
+whose size is twice as wide.  In the case of @code{VEC_UNPACK_HI_EXPR} the
+high @code{N/2} elements of the vector are extracted and widened (promoted).
+In the case of @code{VEC_UNPACK_LO_EXPR} the low @code{N/2} elements of the
+vector are extracted and widened (promoted).
 
-@item VEC_PACK_MOD_EXPR
+@item VEC_PACK_TRUNC_EXPR
+This node represents packing of truncated elements of the two input vectors
+into the output vector.  Input operands are vectors that contain the same
+number of elements of the same integral or floating point type.  The result
+is a vector that contains twice as many elements of an integral or floating
+point type whose size is half as wide. The elements of the two vectors are
+demoted and merged (concatenated) to form the output vector.
+
 @item VEC_PACK_SAT_EXPR
-These nodes represent packing of elements of the two input vectors into the
-output vector, using modulo or saturating arithmetic, respectively.
-Their operands are vectors that contain the same number of elements 
-of the same integral type.  The result is a vector that contains twice as many 
-elements, of an integral type whose size is half as wide.  In both cases
-the elements of the two vectors are demoted and merged (concatenated) to form
-the output vector.
+This node represents packing of elements of the two input vectors into the
+output vector using saturation.  Input operands are vectors that contain
+the same number of elements of the same integral type.  The result is a
+vector that contains twice as many elements of an integral type whose size
+is half as wide.  The elements of the two vectors are demoted and merged
+(concatenated) to form the output vector.
 
 @item VEC_EXTRACT_EVEN_EXPR
 @item VEC_EXTRACT_ODD_EXPR
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 123966)
+++ gcc/doc/md.texi	(working copy)
@@ -3591,15 +3591,21 @@
 Operand 0 is where the resulting shifted vector is stored.
 The output and input vectors should have the same modes.
 
-@cindex @code{vec_pack_mod_@var{m}} instruction pattern
+@cindex @code{vec_pack_trunc_@var{m}} instruction pattern
+@item @samp{vec_pack_trunc_@var{m}}
+Narrow (demote) and merge the elements of two vectors. Operands 1 and 2
+are vectors of the same mode having N integral or floating point elements
+of size S.  Operand 0 is the resulting vector in which 2*N elements of
+size S/2 are concatenated after narrowing them down using truncation.
+
 @cindex @code{vec_pack_ssat_@var{m}} instruction pattern
 @cindex @code{vec_pack_usat_@var{m}} instruction pattern
-@item @samp{vec_pack_mod_@var{m}}, @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
-Narrow (demote) and merge the elements of two vectors.
-Operands 1 and 2 are vectors of the same mode.
+@item @samp{vec_pack_ssat_@var{m}}, @samp{vec_pack_usat_@var{m}}
+Narrow (demote) and merge the elements of two vectors.  Operands 1 and 2
+are vectors of the same mode having N integral elements of size S.
 Operand 0 is the resulting vector in which the elements of the two input
-vectors are concatenated after narrowing them down using modulo arithmetic or
-signed/unsigned saturating arithmetic.
+vectors are concatenated after narrowing them down using signed/unsigned
+saturating arithmetic.
 
 @cindex @code{vec_unpacks_hi_@var{m}} instruction pattern
 @cindex @code{vec_unpacks_lo_@var{m}} instruction pattern
@@ -3607,19 +3613,29 @@
 @cindex @code{vec_unpacku_lo_@var{m}} instruction pattern
 @item @samp{vec_unpacks_hi_@var{m}}, @samp{vec_unpacks_lo_@var{m}}, @samp{vec_unpacku_hi_@var{m}}, @samp{vec_unpacku_lo_@var{m}}
 Extract and widen (promote) the high/low part of a vector of signed/unsigned
-elements. The input vector (operand 1) has N signed/unsigned elements of size S. 
-Using sign/zero extension widen (promote) the high/low elements of the vector,
-and place the resulting N/2 values of size 2*S in the output vector (operand 0).
+integral elements.  The input vector (operand 1) has N signed/unsigned
+elements of size S.  Widen (promote) the high/low elements of the vector
+using sign/zero extension and place the resulting N/2 values of size 2*S in
+the output vector (operand 0).
 
+@cindex @code{vec_unpack_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpack_lo_@var{m}} instruction pattern
+@item @samp{vec_unpack_hi_@var{m}}, @samp{vec_unpack_lo_@var{m}}
+Extract and widen (promote) the high/low part of a vector of floating point
+elements.  The input vector (operand 1) has N floating point elements of
+size S.  Widen (promote) the high/low elements of the vector using
+floating point extension and place the resulting N/2 values of size 2*S in
+the output vector (operand 0).
+
 @cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern
 @cindex @code{vec_widen_umult_lo__@var{m}} instruction pattern
 @cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern
 @cindex @code{vec_widen_smult_lo_@var{m}} instruction pattern
 @item @samp{vec_widen_umult_hi_@var{m}}, @samp{vec_widen_umult_lo_@var{m}}, @samp{vec_widen_smult_hi_@var{m}}, @samp{vec_widen_smult_lo_@var{m}}
-Signed/Unsigned widening multiplication. 
-The two inputs (operands 1 and 2) are vectors with N 
-signed/unsigned elements of size S. Multiply the high/low elements of the two 
-vectors, and put the N/2 products of size 2*S in the output vector (operand 0). 
+Signed/Unsigned widening multiplication.  The two inputs (operands 1 and 2)
+are vectors with N signed/unsigned elements of size S.  Multiply the high/low
+elements of the two vectors, and put the N/2 products of size 2*S in the
+output vector (operand 0).
 
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 123966)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -1943,8 +1943,8 @@
       pp_string (buffer, " > ");
       break;
 
-    case VEC_PACK_MOD_EXPR:
-      pp_string (buffer, " VEC_PACK_MOD_EXPR < ");
+    case VEC_PACK_TRUNC_EXPR:
+      pp_string (buffer, " VEC_PACK_TRUNC_EXPR < ");
       dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
       pp_string (buffer, ", ");
       dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
@@ -2348,7 +2348,7 @@
     case VEC_RSHIFT_EXPR:
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
-    case VEC_PACK_MOD_EXPR:
+    case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
       return 16;
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 123966)
+++ gcc/optabs.c	(working copy)
@@ -333,19 +333,23 @@
 	vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
 
     case VEC_UNPACK_HI_EXPR:
-      return TYPE_UNSIGNED (type) ? 
-	vec_unpacku_hi_optab : vec_unpacks_hi_optab;
+      return FLOAT_TYPE_P (type)
+	? vec_unpack_hi_optab
+	: TYPE_UNSIGNED (type)
+	? vec_unpacku_hi_optab : vec_unpacks_hi_optab;
 
     case VEC_UNPACK_LO_EXPR:
-      return TYPE_UNSIGNED (type) ? 
+      return FLOAT_TYPE_P (type)
+	? vec_unpack_lo_optab
+	: TYPE_UNSIGNED (type) ? 
 	vec_unpacku_lo_optab : vec_unpacks_lo_optab;
 
-    case VEC_PACK_MOD_EXPR:
-      return vec_pack_mod_optab;
-                                                                                
+    case VEC_PACK_TRUNC_EXPR:
+      return vec_pack_trunc_optab;
+
     case VEC_PACK_SAT_EXPR:
       return TYPE_UNSIGNED (type) ? vec_pack_usat_optab : vec_pack_ssat_optab;
-                                                                                
+
     default:
       break;
     }
@@ -1373,7 +1377,7 @@
 	  && mode1 != VOIDmode)
 	xop1 = copy_to_mode_reg (mode1, xop1);
 
-      if (binoptab == vec_pack_mod_optab 
+      if (binoptab == vec_pack_trunc_optab 
 	  || binoptab == vec_pack_usat_optab
           || binoptab == vec_pack_ssat_optab)
 	{
@@ -5560,7 +5564,9 @@
   vec_unpacks_lo_optab = init_optab (UNKNOWN);
   vec_unpacku_hi_optab = init_optab (UNKNOWN);
   vec_unpacku_lo_optab = init_optab (UNKNOWN);
-  vec_pack_mod_optab = init_optab (UNKNOWN);
+  vec_unpack_hi_optab = init_optab (UNKNOWN);
+  vec_unpack_lo_optab = init_optab (UNKNOWN);
+  vec_pack_trunc_optab = init_optab (UNKNOWN);
   vec_pack_usat_optab = init_optab (UNKNOWN);
   vec_pack_ssat_optab = init_optab (UNKNOWN);
 
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 123966)
+++ gcc/optabs.h	(working copy)
@@ -284,8 +284,12 @@
   OTI_vec_unpacks_lo,
   OTI_vec_unpacku_hi,
   OTI_vec_unpacku_lo,
+  /* Extract and widen (promote) the high/low part of a vector of floating
+     point elements.  */
+  OTI_vec_unpack_hi,
+  OTI_vec_unpack_lo,
   /* Narrow (demote) and merge the elements of two vectors.  */
-  OTI_vec_pack_mod,
+  OTI_vec_pack_trunc,
   OTI_vec_pack_usat,
   OTI_vec_pack_ssat,
 
@@ -404,7 +408,7 @@
 #define reduc_umin_optab (optab_table[OTI_reduc_umin])
 #define reduc_splus_optab (optab_table[OTI_reduc_splus])
 #define reduc_uplus_optab (optab_table[OTI_reduc_uplus])
-                                                                                
+
 #define ssum_widen_optab (optab_table[OTI_ssum_widen])
 #define usum_widen_optab (optab_table[OTI_usum_widen])
 #define sdot_prod_optab (optab_table[OTI_sdot_prod])
@@ -425,13 +429,15 @@
 #define vec_widen_smult_hi_optab (optab_table[OTI_vec_widen_smult_hi])
 #define vec_widen_smult_lo_optab (optab_table[OTI_vec_widen_smult_lo])
 #define vec_unpacks_hi_optab (optab_table[OTI_vec_unpacks_hi])
+#define vec_unpacks_lo_optab (optab_table[OTI_vec_unpacks_lo])
 #define vec_unpacku_hi_optab (optab_table[OTI_vec_unpacku_hi])
-#define vec_unpacks_lo_optab (optab_table[OTI_vec_unpacks_lo])
 #define vec_unpacku_lo_optab (optab_table[OTI_vec_unpacku_lo])
-#define vec_pack_mod_optab (optab_table[OTI_vec_pack_mod])
+#define vec_unpack_hi_optab (optab_table[OTI_vec_unpack_hi])
+#define vec_unpack_lo_optab (optab_table[OTI_vec_unpack_lo])
+#define vec_pack_trunc_optab (optab_table[OTI_vec_pack_trunc])
 #define vec_pack_ssat_optab (optab_table[OTI_vec_pack_ssat])
 #define vec_pack_usat_optab (optab_table[OTI_vec_pack_usat])
-                                                                                
+
 #define powi_optab (optab_table[OTI_powi])
 
 /* Conversion optabs have their own table and indexes.  */
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 123966)
+++ gcc/genopinit.c	(working copy)
@@ -229,8 +229,11 @@
   "vec_unpacks_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_lo_$a$)",
   "vec_unpacku_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_hi_$a$)",
   "vec_unpacku_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_lo_$a$)",
-  "vec_pack_mod_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_mod_$a$)",
-  "vec_pack_ssat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_ssat_$a$)",  "vec_pack_usat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_usat_$a$)"
+  "vec_unpack_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpack_hi_$a$)",
+  "vec_unpack_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpack_lo_$a$)",
+  "vec_pack_trunc_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_trunc_$a$)",
+  "vec_pack_ssat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_ssat_$a$)",
+  "vec_pack_usat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_usat_$a$)"
 };
 
 static void gen_insn (rtx);
Index: gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c	(revision 0)
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_double } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int
+main1 ()
+{
+  int i;
+  float fb[N] = {0.4,3.5,6.6,9.4,12.5,15.6,18.4,21.5,24.6,27.4,30.5,33.6,36.4,39.5,42.6,45.4,0.5,3.6,6.4,9.5,12.6,15.4,18.5,21.6,24.4,27.5,30.6,33.4,36.5,39.6,42.4,45.5};
+  double da[N];
+
+  /* float -> double */
+  for (i = 0; i < N; i++)
+    {
+      da[i] = (double) fb[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (da[i] != (double) fb[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c	(revision 0)
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_double } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int
+main1 ()
+{
+  int i;
+  double db[N] = {0.4,3.5,6.6,9.4,12.5,15.6,18.4,21.5,24.6,27.4,30.5,33.6,36.4,39.5,42.6,45.4,0.5,3.6,6.4,9.5,12.6,15.4,18.5,21.6,24.4,27.5,30.6,33.4,36.5,39.6,42.4,45.5};
+  float fa[N];
+
+  /* double -> float */
+  for (i = 0; i < N; i++)
+    {
+      fa[i] = (float) db[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (fa[i] != (float) db[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 123966)
+++ gcc/expr.c	(working copy)
@@ -8926,7 +8926,7 @@
 	return target;
       }
 
-    case VEC_PACK_MOD_EXPR:
+    case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
       {
 	mode = TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 0)));
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 123966)
+++ gcc/tree.def	(working copy)
@@ -1093,12 +1093,12 @@
 DEFTREECODE (VEC_UNPACK_LO_EXPR, "vec_unpack_lo_expr", tcc_unary, 1)
 
 /* Pack (demote/narrow and merge) the elements of the two input vectors
-   into the output vector, using modulo/saturating arithmetic.
+   into the output vector using truncation/saturation.
    The elements of the input vectors are twice the size of the elements of the
    output vector.  This is used to support type demotion.  */
-DEFTREECODE (VEC_PACK_MOD_EXPR, "vec_pack_mod_expr", tcc_binary, 2)
+DEFTREECODE (VEC_PACK_TRUNC_EXPR, "vec_pack_trunc_expr", tcc_binary, 2)
 DEFTREECODE (VEC_PACK_SAT_EXPR, "vec_pack_sat_expr", tcc_binary, 2)
-                                                                                
+
 /* Extract even/odd fields from vectors.  */
 DEFTREECODE (VEC_EXTRACT_EVEN_EXPR, "vec_extracteven_expr", tcc_binary, 2)
 DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extractodd_expr", tcc_binary, 2)
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 123966)
+++ gcc/tree-inline.c	(working copy)
@@ -2149,7 +2149,7 @@
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
-    case VEC_PACK_MOD_EXPR:
+    case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
 
     case WIDEN_MULT_EXPR:
Index: gcc/tree-vect-transform.c
===================================================================
--- gcc/tree-vect-transform.c	(revision 123966)
+++ gcc/tree-vect-transform.c	(working copy)
@@ -2591,10 +2591,13 @@
   ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
   gcc_assert (ncopies >= 1);
 
-  if (! INTEGRAL_TYPE_P (scalar_type)
-      || !INTEGRAL_TYPE_P (TREE_TYPE (op0)))
+  if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+	  && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
+	 || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
+	     && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
+	     && code == NOP_EXPR)))
     return false;
-                                                                                
+
   /* Check the operands of the operation.  */
   if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
     {
@@ -2604,11 +2607,11 @@
     }
                                                                                 
   /* Supportable by target?  */
-  code = VEC_PACK_MOD_EXPR;
-  optab = optab_for_tree_code (VEC_PACK_MOD_EXPR, vectype_in);
+  code = VEC_PACK_TRUNC_EXPR;
+  optab = optab_for_tree_code (code, vectype_in);
   if (!optab)
     return false;
-                                                                                
+
   vec_mode = TYPE_MODE (vectype_in);
   if (optab->handlers[(int) vec_mode].insn_code == CODE_FOR_nothing)
     return false;
@@ -2798,8 +2801,11 @@
   if (nunits_out != nunits_in / 2) /* FORNOW */
     return false;
 
-  if (! INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
-      || !INTEGRAL_TYPE_P (TREE_TYPE (op0))) 
+  if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+	  && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
+	 || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
+	     && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
+	     && code == NOP_EXPR)))
     return false;
 
   /* Check the operands of the operation.  */
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 123966)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -421,7 +421,7 @@
       || code == VEC_WIDEN_MULT_LO_EXPR
       || code == VEC_UNPACK_HI_EXPR
       || code == VEC_UNPACK_LO_EXPR
-      || code == VEC_PACK_MOD_EXPR
+      || code == VEC_PACK_TRUNC_EXPR
       || code == VEC_PACK_SAT_EXPR)
     type = TREE_TYPE (TREE_OPERAND (rhs, 0));
 
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 123966)
+++ gcc/config/i386/sse.md	(working copy)
@@ -2150,6 +2150,42 @@
    (set_attr "mode" "V2DF")
    (set_attr "amdfam10_decode" "direct")])
 
+(define_expand "vec_unpack_hi_v4sf"
+  [(set (match_operand:V2DF 0 "register_operand" "")
+	(float_extend:V2DF
+	  (vec_select:V2SF
+	    (match_operand:V4SF 1 "register_operand" "")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE2"
+{
+  emit_insn (gen_sse_movhlps (operands[1], operands[1], operands[1]));
+})
+
+(define_expand "vec_unpack_lo_v4sf"
+  [(set (match_operand:V2DF 0 "register_operand" "")
+	(float_extend:V2DF
+	  (vec_select:V2SF
+	    (match_operand:V4SF 1 "nonimmediate_operand" "")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE2")
+
+(define_expand "vec_pack_trunc_v2df"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V2DF 1 "nonimmediate_operand" "")
+   (match_operand:V2DF 2 "nonimmediate_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx r1, r2;
+
+  r1 = gen_reg_rtx (V4SFmode);
+  r2 = gen_reg_rtx (V4SFmode);
+
+  emit_insn (gen_sse2_cvtpd2ps (r1, operands[1]));
+  emit_insn (gen_sse2_cvtpd2ps (r2, operands[2]));
+  emit_insn (gen_sse_movlhps (operands[0], r1, r2));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel double-precision floating point element swizzling
@@ -3420,7 +3456,7 @@
 ;;       h3 = aeimquy2bfjnrvz3
 ;;       l3 = cgkosw04dhlptx15
 ;;   result = bdfhjlnprtvxz135
-(define_expand "vec_pack_mod_v8hi"
+(define_expand "vec_pack_trunc_v8hi"
   [(match_operand:V16QI 0 "register_operand" "")
    (match_operand:V8HI 1 "register_operand" "")
    (match_operand:V8HI 2 "register_operand" "")]
@@ -3455,7 +3491,7 @@
 ;;       h2 = aeimbfjn
 ;;       l2 = cgkodhlp
 ;;   result = bdfhjlnp
-(define_expand "vec_pack_mod_v4si"
+(define_expand "vec_pack_trunc_v4si"
   [(match_operand:V8HI 0 "register_operand" "")
    (match_operand:V4SI 1 "register_operand" "")
    (match_operand:V4SI 2 "register_operand" "")]
@@ -3484,7 +3520,7 @@
 ;;      h1 = aebf
 ;;      l1 = cgdh
 ;;  result = bdfh
-(define_expand "vec_pack_mod_v2di"
+(define_expand "vec_pack_trunc_v2di"
   [(match_operand:V4SI 0 "register_operand" "")
    (match_operand:V2DI 1 "register_operand" "")
    (match_operand:V2DI 2 "register_operand" "")]
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 123966)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -2603,7 +2603,7 @@
   DONE;
 }")
 
-(define_expand "vec_pack_mod_v8hi"
+(define_expand "vec_pack_trunc_v8hi"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
                        (match_operand:V8HI 2 "register_operand" "v")]
@@ -2615,7 +2615,7 @@
   DONE;
 }")
                                                                                 
-(define_expand "vec_pack_mod_v4si"
+(define_expand "vec_pack_trunc_v4si"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
                       (match_operand:V4SI 2 "register_operand" "v")]

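For readers without the machine description at hand, the sequence
emitted by the vec_pack_trunc_v2df expander in the sse.md hunk above
(two cvtpd2ps followed by movlhps) can be approximated with SSE2
intrinsics as below, together with the matching float -> double unpack.
This is a sketch rather than what the compiler generates: the function
names pack_trunc_v2df and unpack_v4sf are made up for the example, and
a plain C fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <stdio.h>
#if defined (__SSE2__)
#include <emmintrin.h>
#endif

/* Narrow two V2DF-like inputs and merge them into one V4SF-like output.  */
static void
pack_trunc_v2df (const double a[2], const double b[2], float out[4])
{
#if defined (__SSE2__)
  __m128 r1 = _mm_cvtpd_ps (_mm_loadu_pd (a));  /* cvtpd2ps */
  __m128 r2 = _mm_cvtpd_ps (_mm_loadu_pd (b));  /* cvtpd2ps */
  _mm_storeu_ps (out, _mm_movelh_ps (r1, r2));  /* movlhps */
#else
  for (int i = 0; i < 2; i++)
    {
      out[i] = (float) a[i];
      out[2 + i] = (float) b[i];
    }
#endif
}

/* Widen the low or high two floats of a V4SF-like input to doubles.
   Assumption: "high" means elements 2 and 3.  */
static void
unpack_v4sf (const float in[4], int high, double out[2])
{
#if defined (__SSE2__)
  __m128 v = _mm_loadu_ps (in);
  if (high)
    v = _mm_movehl_ps (v, v);  /* movhlps: upper two floats to the low half */
  _mm_storeu_pd (out, _mm_cvtps_pd (v));  /* cvtps2pd */
#else
  for (int i = 0; i < 2; i++)
    out[i] = (double) in[(high ? 2 : 0) + i];
#endif
}

int
main (void)
{
  const float f[4] = { 0.5f, 3.5f, 12.5f, 21.5f };
  double lo[2], hi[2];
  float packed[4];

  unpack_v4sf (f, 0, lo);
  unpack_v4sf (f, 1, hi);
  pack_trunc_v2df (lo, hi, packed);

  /* float -> double is exact, so narrowing again restores the input.  */
  for (int i = 0; i < 4; i++)
    assert (packed[i] == f[i]);

  printf ("%g %g %g %g\n", packed[0], packed[1], packed[2], packed[3]);
  return 0;
}
```
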
Follow-Ups:
- Re: [PATCH, vectorizer]: Take2: Vectorize FP conversions
  - From: Dorit Nuzman

References:
- [PATCH, vectorizer]: Take2: Vectorize FP conversions
  - From: Uros Bizjak
