[PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

Richard Biener <rguenther@suse.de>
Wed Jun 19 09:02:00 GMT 2019


On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>When VEC_[LR]SHIFT_EXPR was replaced with VEC_PERM_EXPR, vec_shl_optab
>was removed as unused, because we only used vec_shr_optab for the
>reductions.
>Without this patch the vect-simd-*.c tests can be vectorized just fine
>for SSE4 and above, but can't be with SSE2.  As the comment in
>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>want (when using V8SImode vectors):
>	_30 = MEM <vector(8) int> [(int *)&D.2043];
>	_31 = MEM <vector(8) int> [(int *)&D.2042];
>	_32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>	_33 = _31 + _32;
>	// _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
>	_34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>	_35 = _33 + _34;
>	// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>	//	   _31[1]+.._31[4], ... _31[4]+.._31[7] };
>	_36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>	_37 = _35 + _36;
>	// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>	//	   _31[0]+.._31[4], ... _31[0]+.._31[7] };
>	_38 = _30 + _37;
>	_39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>	MEM <vector(8) int> [(int *)&D.2043] = _39;
>	MEM <vector(8) int> [(int *)&D.2042] = _38;
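In plain C, one step of the sequence above looks like the following sketch. This is an illustration of the shift-and-add scheme only, not code from the patch; the helper name scan_step and the scalar modelling of the vectors are made up:

```c
#include <assert.h>
#include <string.h>

#define N 8	/* number of vector elements, V8SImode */

/* One vectorized inclusive-scan step for operator +.  IN models _31 (the
   new chunk), CARRY models _30 (the running accumulator, D.2043).  OUT
   receives _38 and CARRY is updated to _39, the broadcast of the last
   element.  */
static void
scan_step (int out[N], int carry[N], const int in[N])
{
  int cur[N], shifted[N];
  memcpy (cur, in, sizeof cur);
  /* log2(N) rounds; each round is one VEC_PERM_EXPR shifting the vector
     left by 2**i elements (zeros shifted in) plus one vector addition.  */
  for (int shift = 1; shift < N; shift <<= 1)
    {
      for (int j = 0; j < N; j++)
	shifted[j] = j < shift ? 0 : cur[j - shift];
      for (int j = 0; j < N; j++)
	cur[j] += shifted[j];
    }
  for (int j = 0; j < N; j++)
    out[j] = carry[j] + cur[j];		/* _38 = _30 + _37 */
  for (int j = 0; j < N; j++)
    carry[j] = out[N - 1];		/* _39 = { _38[7], ..., _38[7] } */
}
```

With carry all zeros and in = { 1, 1, ..., 1 } this produces out = { 1, 2, ..., 8 } and carry = { 8, ..., 8 }, matching the comments in the dump above.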
>For V4SImode vectors that would be VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>,
>VEC_PERM_EXPR <x2, init, { 4, 5, 0, 1 }> and
>VEC_PERM_EXPR <x3, init, { 3, 3, 3, 3 }> etc.
>Unfortunately, SSE2 can't do the VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>
>permutation (the other two it can do).  Well, to be precise, it can do it
>using the whole-vector left shift that has been removed as unused, provided
>that init is initializer_zerop (shifting in zeros from the left).
>init usually is all zeros, as that is the neutral element of additive
>reductions and of a couple of others too; in the unlikely case that some
>other reduction is used with scan (multiplication, minimum, maximum,
>bitwise and), we can use a VEC_COND_EXPR with a constant first argument,
>i.e. a blend or an and/or pair.
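For illustration only (this is not part of the patch, and the helper names are made up): on SSE2 the missing V4SImode permutation is just PSLLDQ, and the VEC_COND_EXPR fallback for a non-zero init can be a constant-mask and/or:

```c
#include <assert.h>
#include <emmintrin.h>

/* VEC_PERM_EXPR <x, { 0, 0, 0, 0 }, { 4, 0, 1, 2 }>: shift one 32-bit
   element in from the left; PSLLDQ by 4 bytes shifts in zero bytes.  */
static __m128i
shift_in_zero (__m128i x)
{
  return _mm_slli_si128 (x, 4);
}

/* When init is not all zeros, merge init's element 0 into lane 0 with a
   constant-mask and/or, i.e. the VEC_COND_EXPR with constant first
   argument mentioned above.  */
static __m128i
shift_in_init (__m128i x, __m128i init)
{
  __m128i shifted = _mm_slli_si128 (x, 4);
  __m128i mask = _mm_set_epi32 (0, 0, 0, -1);	/* select lane 0 from init */
  return _mm_or_si128 (_mm_andnot_si128 (mask, shifted),
		       _mm_and_si128 (mask, init));
}
```

shift_in_zero maps x = { x0, x1, x2, x3 } to { 0, x0, x1, x2 }, exactly the permutation SSE2 has no shuffle for.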
>
>So, this patch reintroduces vec_shl_optab (most backends actually have
>those patterns already) and handles its expansion and generic vector
>lowering similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where
>the first operand is initializer_zerop and the third operand starts with
>a few numbers smaller than the number of elements (it doesn't matter
>which ones, as all elements of the first operand are the same - zero),
>followed by nelts, nelts+1, nelts+2, ...
>Unlike vec_shr_optab, which has the zero vector as its second operand,
>this one has it as the first operand, because VEC_PERM_EXPR
>canonicalization wants the first element of the selector to be smaller
>than the number of elements.  And unlike vec_shr_optab, where we also
>have a fallback in have_whole_vector_shift using normal permutations,
>this one doesn't need one; that "fallback" is tried first, before
>vec_shl_optab.
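The selector shape described above can be checked with a few lines of C. This is only a sketch of the logic added to shift_amt_for_vec_perm_mask (the real code uses poly_int and returns the amount in bits), with a made-up name:

```c
#include <assert.h>

/* Given a NELT-element selector SEL whose first vec_perm operand is all
   zeros, return the left shift amount in elements if SEL is a whole-vector
   left shift, or 0 if it is not: the selector must be some indices smaller
   than NELT (any of them, as they all select zero), followed by NELT,
   NELT+1, NELT+2, ...  */
static unsigned int
vec_shl_amount (const unsigned int *sel, unsigned int nelt)
{
  unsigned int firstidx = 0;
  for (unsigned int i = 0; i < nelt; i++)
    {
      if (sel[i] == nelt)
	{
	  /* NELT may start the tail only once, and not at element 0.  */
	  if (i == 0 || firstidx)
	    return 0;
	  firstidx = i;
	}
      else if (firstidx ? sel[i] != nelt + i - firstidx : sel[i] >= nelt)
	return 0;
    }
  return firstidx;
}
```

E.g. { 0, 4, 5, 6 } with nelt == 4 is a shift by one element, { 0, 0, 4, 5 } by two, while the broadcast { 3, 3, 3, 3 } is rejected.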
>
>For the vec_shl_optab checks, the patch tests only for vectors with a
>constant number of elements; I'm not really sure whether our VECTOR_CST
>encoding can express the left shifts at all, nor whether SVE supports
>them (I see aarch64 has vec_shl_insert, but that is just a fixed shift
>by element bits and it shifts in a scalar rather than zeros).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2019-06-19  Jakub Jelinek  <jakub@redhat.com>
>
>	* doc/md.texi: Document vec_shl_<mode> pattern.
>	* optabs.def (vec_shl_optab): New optab.
>	* optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
>	argument, if == vec_shl_optab, check for left whole vector shift
>	pattern rather than right shift.
>	(expand_vec_perm_const): Add vec_shl_optab support.
>	* optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
>	in the comment.
>	* tree-vect-generic.c (lower_vec_perm): Support permutations which
>	can be handled by vec_shl_optab.
>	* tree-vect-stmts.c (scan_store_can_perm_p): New function.
>	(check_scan_store): Use it.
>	(vectorizable_scan_store): If target can't do normal permutations,
>	try to use whole vector left shifts and if needed a VEC_COND_EXPR
>	after it.
>	* config/i386/sse.md (vec_shl_<mode>): New expander.
>
>	* gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
>	tree-vect.h nor call check_vect.
>	* gcc.dg/vect/vect-simd-9.c: Likewise.
>	* gcc.dg/vect/vect-simd-10.c: New test.
>	* gcc.target/i386/sse2-vect-simd-8.c: New test.
>	* gcc.target/i386/sse2-vect-simd-9.c: New test.
>	* gcc.target/i386/sse2-vect-simd-10.c: New test.
>	* gcc.target/i386/avx2-vect-simd-8.c: New test.
>	* gcc.target/i386/avx2-vect-simd-9.c: New test.
>	* gcc.target/i386/avx2-vect-simd-10.c: New test.
>	* gcc.target/i386/avx512f-vect-simd-8.c: New test.
>	* gcc.target/i386/avx512f-vect-simd-9.c: New test.
>	* gcc.target/i386/avx512f-vect-simd-10.c: New test.
>
>--- gcc/doc/md.texi.jj	2019-06-13 00:35:43.518942525 +0200
>+++ gcc/doc/md.texi	2019-06-18 15:32:38.496629946 +0200
>@@ -5454,6 +5454,14 @@ in operand 2.  Store the result in vecto
> 0 and 1 have mode @var{m} and operand 2 has the mode appropriate for
> one element of @var{m}.
> 
>+@cindex @code{vec_shl_@var{m}} instruction pattern
>+@item @samp{vec_shl_@var{m}}
>+Whole vector left shift in bits, i.e.@: away from element 0.
>+Operand 1 is a vector to be shifted.
>+Operand 2 is an integer shift amount in bits.
>+Operand 0 is where the resulting shifted vector is stored.
>+The output and input vectors should have the same modes.
>+
> @cindex @code{vec_shr_@var{m}} instruction pattern
> @item @samp{vec_shr_@var{m}}
> Whole vector right shift in bits, i.e.@: towards element 0.
>--- gcc/optabs.def.jj	2019-02-11 11:38:08.263617017 +0100
>+++ gcc/optabs.def	2019-06-18 14:56:57.934971410 +0200
>@@ -348,6 +348,7 @@ OPTAB_D (vec_packu_float_optab, "vec_pac
> OPTAB_D (vec_perm_optab, "vec_perm$a")
> OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
> OPTAB_D (vec_set_optab, "vec_set$a")
>+OPTAB_D (vec_shl_optab, "vec_shl_$a")
> OPTAB_D (vec_shr_optab, "vec_shr_$a")
> OPTAB_D (vec_unpack_sfix_trunc_hi_optab, "vec_unpack_sfix_trunc_hi_$a")
> OPTAB_D (vec_unpack_sfix_trunc_lo_optab, "vec_unpack_sfix_trunc_lo_$a")
>--- gcc/optabs.c.jj	2019-02-13 13:11:47.927612362 +0100
>+++ gcc/optabs.c	2019-06-18 16:45:29.347895585 +0200
>@@ -5444,19 +5444,45 @@ vector_compare_rtx (machine_mode cmp_mod
> }
> 
> /* Check if vec_perm mask SEL is a constant equivalent to a shift of
>-   the first vec_perm operand, assuming the second operand is a constant
>-   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
>-   if the vec_perm is not a shift.  MODE is the mode of the value being
>-   shifted.  */
>+   the first vec_perm operand, assuming the second operand (for left shift
>+   first operand) is a constant vector of zeros.  Return the shift distance
>+   in bits if so, or NULL_RTX if the vec_perm is not a shift.  MODE is the
>+   mode of the value being shifted.  SHIFT_OPTAB is vec_shr_optab for right
>+   shift or vec_shl_optab for left shift.  */
> static rtx
>-shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>+shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel,
>+			     optab shift_optab)
> {
>   unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
>   poly_int64 first = sel[0];
>   if (maybe_ge (sel[0], GET_MODE_NUNITS (mode)))
>     return NULL_RTX;
> 
>-  if (!sel.series_p (0, 1, first, 1))
>+  if (shift_optab == vec_shl_optab)
>+    {
>+      unsigned int nelt;
>+      if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>+	return NULL_RTX;
>+      unsigned firstidx = 0;
>+      for (unsigned int i = 0; i < nelt; i++)
>+	{
>+	  if (known_eq (sel[i], nelt))
>+	    {
>+	      if (i == 0 || firstidx)
>+		return NULL_RTX;
>+	      firstidx = i;
>+	    }
>+	  else if (firstidx
>+		   ? maybe_ne (sel[i], nelt + i - firstidx)
>+		   : maybe_ge (sel[i], nelt))
>+	    return NULL_RTX;
>+	}
>+
>+      if (firstidx == 0)
>+	return NULL_RTX;
>+      first = firstidx;
>+    }
>+  else if (!sel.series_p (0, 1, first, 1))
>     {
>       unsigned int nelt;
>       if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>@@ -5544,25 +5570,37 @@ expand_vec_perm_const (machine_mode mode
>      target instruction.  */
>   vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
> 
>-  /* See if this can be handled with a vec_shr.  We only do this if the
>-     second vector is all zeroes.  */
>-  insn_code shift_code = optab_handler (vec_shr_optab, mode);
>-  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>-			     ? optab_handler (vec_shr_optab, qimode)
>-			     : CODE_FOR_nothing);
>-
>-  if (v1 == CONST0_RTX (GET_MODE (v1))
>-      && (shift_code != CODE_FOR_nothing
>-	  || shift_code_qi != CODE_FOR_nothing))
>+  /* See if this can be handled with a vec_shr or vec_shl.  We only do this
>+     if the second (for vec_shr) or first (for vec_shl) vector is all
>+     zeroes.  */
>+  insn_code shift_code = CODE_FOR_nothing;
>+  insn_code shift_code_qi = CODE_FOR_nothing;
>+  optab shift_optab = unknown_optab;
>+  rtx v2 = v0;
>+  if (v1 == CONST0_RTX (GET_MODE (v1)))
>+    shift_optab = vec_shr_optab;
>+  else if (v0 == CONST0_RTX (GET_MODE (v0)))
>+    {
>+      shift_optab = vec_shl_optab;
>+      v2 = v1;
>+    }
>+  if (shift_optab != unknown_optab)
>+    {
>+      shift_code = optab_handler (shift_optab, mode);
>+      shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>+		       ? optab_handler (shift_optab, qimode)
>+		       : CODE_FOR_nothing);
>+    }
>+  if (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)
>     {
>-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
>+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices, shift_optab);
>       if (shift_amt)
> 	{
> 	  struct expand_operand ops[3];
> 	  if (shift_code != CODE_FOR_nothing)
> 	    {
> 	      create_output_operand (&ops[0], target, mode);
>-	      create_input_operand (&ops[1], v0, mode);
>+	      create_input_operand (&ops[1], v2, mode);
> 	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> 	      if (maybe_expand_insn (shift_code, 3, ops))
> 		return ops[0].value;
>@@ -5571,7 +5609,7 @@ expand_vec_perm_const (machine_mode mode
> 	    {
> 	      rtx tmp = gen_reg_rtx (qimode);
> 	      create_output_operand (&ops[0], tmp, qimode);
>-	      create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
>+	      create_input_operand (&ops[1], gen_lowpart (qimode, v2), qimode);
> 	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> 	      if (maybe_expand_insn (shift_code_qi, 3, ops))
> 		return gen_lowpart (mode, ops[0].value);
>--- gcc/optabs-query.c.jj	2019-05-20 11:40:16.691121967 +0200
>+++ gcc/optabs-query.c	2019-06-18 15:26:53.028980804 +0200
>@@ -415,8 +415,9 @@ can_vec_perm_var_p (machine_mode mode)
>    permute (if the target supports that).
> 
> Note that additional permutations representing whole-vector shifts may
>-   also be handled via the vec_shr optab, but only where the second input
>-   vector is entirely constant zeroes; this case is not dealt with here.  */
>+   also be handled via the vec_shr or vec_shl optab, but only where the
>+   second input vector is entirely constant zeroes; this case is not dealt
>+   with here.  */
> 
> bool
> can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
>--- gcc/tree-vect-generic.c.jj	2019-01-07 09:47:32.988518893 +0100
>+++ gcc/tree-vect-generic.c	2019-06-18 16:35:29.033319526 +0200
>@@ -1367,6 +1367,32 @@ lower_vec_perm (gimple_stmt_iterator *gs
> 	      return;
> 	    }
> 	}
>+      /* And similarly vec_shl pattern.  */
>+      if (optab_handler (vec_shl_optab, TYPE_MODE (vect_type))
>+	  != CODE_FOR_nothing
>+	  && TREE_CODE (vec0) == VECTOR_CST
>+	  && initializer_zerop (vec0))
>+	{
>+	  unsigned int first = 0;
>+	  for (i = 0; i < elements; ++i)
>+	    if (known_eq (poly_uint64 (indices[i]), elements))
>+	      {
>+		if (i == 0 || first)
>+		  break;
>+		first = i;
>+	      }
>+	    else if (first
>+		     ? maybe_ne (poly_uint64 (indices[i]),
>+					      elements + i - first)
>+		     : maybe_ge (poly_uint64 (indices[i]), elements))
>+	      break;
>+	  if (i == elements)
>+	    {
>+	      gimple_assign_set_rhs3 (stmt, mask);
>+	      update_stmt (stmt);
>+	      return;
>+	    }
>+	}
>     }
>   else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
>     return;
>--- gcc/tree-vect-stmts.c.jj	2019-06-17 23:18:53.620850072 +0200
>+++ gcc/tree-vect-stmts.c	2019-06-18 17:43:27.484350807 +0200
>@@ -6356,6 +6356,71 @@ scan_operand_equal_p (tree ref1, tree re
> 
> /* Function check_scan_store.
> 
>+   Verify if we can perform the needed permutations or whole vector shifts.
>+   Return -1 on failure, otherwise exact log2 of vectype's nunits.  */
>+
>+static int
>+scan_store_can_perm_p (tree vectype, tree init, int *use_whole_vector_p = NULL)
>+{
>+  enum machine_mode vec_mode = TYPE_MODE (vectype);
>+  unsigned HOST_WIDE_INT nunits;
>+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+    return -1;
>+  int units_log2 = exact_log2 (nunits);
>+  if (units_log2 <= 0)
>+    return -1;
>+
>+  int i;
>+  for (i = 0; i <= units_log2; ++i)
>+    {
>+      unsigned HOST_WIDE_INT j, k;
>+      vec_perm_builder sel (nunits, nunits, 1);
>+      sel.quick_grow (nunits);
>+      if (i == 0)
>+	{
>+	  for (j = 0; j < nunits; ++j)
>+	    sel[j] = nunits - 1;
>+	}
>+      else
>+	{
>+	  for (j = 0; j < (HOST_WIDE_INT_1U << (i - 1)); ++j)
>+	    sel[j] = j;
>+	  for (k = 0; j < nunits; ++j, ++k)
>+	    sel[j] = nunits + k;
>+	}
>+      vec_perm_indices indices (sel, i == 0 ? 1 : 2, nunits);
>+      if (!can_vec_perm_const_p (vec_mode, indices))
>+	break;
>+    }
>+
>+  if (i == 0)
>+    return -1;
>+
>+  if (i <= units_log2)
>+    {
>+      if (optab_handler (vec_shl_optab, vec_mode) == CODE_FOR_nothing)
>+	return -1;
>+      int kind = 1;
>+      /* Whole vector shifts shift in zeros, so if init is all zero constant,
>+	 there is no need to do anything further.  */
>+      if ((TREE_CODE (init) != INTEGER_CST
>+	   && TREE_CODE (init) != REAL_CST)
>+	  || !initializer_zerop (init))
>+	{
>+	  tree masktype = build_same_sized_truth_vector_type (vectype);
>+	  if (!expand_vec_cond_expr_p (vectype, masktype, VECTOR_CST))
>+	    return -1;
>+	  kind = 2;
>+	}
>+      if (use_whole_vector_p)
>+	*use_whole_vector_p = kind;
>+    }
>+  return units_log2;
>+}
>+
>+
>+/* Function check_scan_store.
>+
> Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
> 
> static bool
>@@ -6596,34 +6661,9 @@ check_scan_store (stmt_vec_info stmt_inf
>   if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
>     goto fail;
> 
>-  unsigned HOST_WIDE_INT nunits;
>-  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+  int units_log2 = scan_store_can_perm_p (vectype, *init);
>+  if (units_log2 == -1)
>     goto fail;
>-  int units_log2 = exact_log2 (nunits);
>-  if (units_log2 <= 0)
>-    goto fail;
>-
>-  for (int i = 0; i <= units_log2; ++i)
>-    {
>-      unsigned HOST_WIDE_INT j, k;
>-      vec_perm_builder sel (nunits, nunits, 1);
>-      sel.quick_grow (nunits);
>-      if (i == units_log2)
>-	{
>-	  for (j = 0; j < nunits; ++j)
>-	    sel[j] = nunits - 1;
>-	}
>-      else
>-	{
>-	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>-	    sel[j] = nunits + j;
>-	  for (k = 0; j < nunits; ++j, ++k)
>-	    sel[j] = k;
>-	}
>-      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>-      if (!can_vec_perm_const_p (vec_mode, indices))
>-	goto fail;
>-    }
> 
>   return true;
> }
>@@ -6686,7 +6726,8 @@ vectorizable_scan_store (stmt_vec_info s
>   unsigned HOST_WIDE_INT nunits;
>   if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>     gcc_unreachable ();
>-  int units_log2 = exact_log2 (nunits);
>+  int use_whole_vector_p = 0;
>+  int units_log2 = scan_store_can_perm_p (vectype, *init, &use_whole_vector_p);
>   gcc_assert (units_log2 > 0);
>   auto_vec<tree, 16> perms;
>   perms.quick_grow (units_log2 + 1);
>@@ -6696,21 +6737,25 @@ vectorizable_scan_store (stmt_vec_info s
>       vec_perm_builder sel (nunits, nunits, 1);
>       sel.quick_grow (nunits);
>       if (i == units_log2)
>-	{
>-	  for (j = 0; j < nunits; ++j)
>-	    sel[j] = nunits - 1;
>-	}
>-      else
>-	{
>-	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>-	    sel[j] = nunits + j;
>-	  for (k = 0; j < nunits; ++j, ++k)
>-	    sel[j] = k;
>-	}
>+	for (j = 0; j < nunits; ++j)
>+	  sel[j] = nunits - 1;
>+	else
>+	  {
>+	    for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>+	      sel[j] = j;
>+	    for (k = 0; j < nunits; ++j, ++k)
>+	      sel[j] = nunits + k;
>+	  }
>       vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>-      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
>+      if (use_whole_vector_p && i < units_log2)
>+	perms[i] = vect_gen_perm_mask_any (vectype, indices);
>+      else
>+	perms[i] = vect_gen_perm_mask_checked (vectype, indices);
>     }
> 
>+  tree zero_vec = use_whole_vector_p ? build_zero_cst (vectype) : NULL_TREE;
>+  tree masktype = (use_whole_vector_p == 2
>+		   ? build_same_sized_truth_vector_type (vectype) : NULL_TREE);
>   stmt_vec_info prev_stmt_info = NULL;
>   tree vec_oprnd1 = NULL_TREE;
>   tree vec_oprnd2 = NULL_TREE;
>@@ -6742,8 +6787,9 @@ vectorizable_scan_store (stmt_vec_info s
>       for (int i = 0; i < units_log2; ++i)
> 	{
> 	  tree new_temp = make_ssa_name (vectype);
>-	  gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
>-					   vec_oprnd1, perms[i]);
>+	  gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR,
>+					   zero_vec ? zero_vec : vec_oprnd1, v,
>+					   perms[i]);
> 	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> 	  if (prev_stmt_info == NULL)
> 	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
>@@ -6751,6 +6797,25 @@ vectorizable_scan_store (stmt_vec_info s
> 	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> 	  prev_stmt_info = new_stmt_info;
> 
>+	  if (use_whole_vector_p == 2)
>+	    {
>+	      /* Whole vector shift shifted in zero bits, but if *init
>+		 is not initializer_zerop, we need to replace those elements
>+		 with elements from vec_oprnd1.  */
>+	      tree_vector_builder vb (masktype, nunits, 1);
>+	      for (unsigned HOST_WIDE_INT k = 0; k < nunits; ++k)
>+		vb.quick_push (k < (HOST_WIDE_INT_1U << i)
>+			       ? boolean_false_node : boolean_true_node);
>+
>+	      tree new_temp2 = make_ssa_name (vectype);
>+	      g = gimple_build_assign (new_temp2, VEC_COND_EXPR, vb.build (),
>+				       new_temp, vec_oprnd1);
>+	      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>+	      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
>+	      prev_stmt_info = new_stmt_info;
>+	      new_temp = new_temp2;
>+	    }
>+
> 	  tree new_temp2 = make_ssa_name (vectype);
> 	  g = gimple_build_assign (new_temp2, code, v, new_temp);
> 	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>--- gcc/config/i386/sse.md.jj	2019-06-17 23:18:26.821267440 +0200
>+++ gcc/config/i386/sse.md	2019-06-18 15:37:28.342043528 +0200
>@@ -11758,6 +11758,19 @@ (define_insn "<shift_insn><mode>3<mask_n
>    (set_attr "mode" "<sseinsnmode>")])
> 
> 
>+(define_expand "vec_shl_<mode>"
>+  [(set (match_dup 3)
>+	(ashift:V1TI
>+	 (match_operand:VI_128 1 "register_operand")
>+	 (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
>+   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
>+  "TARGET_SSE2"
>+{
>+  operands[1] = gen_lowpart (V1TImode, operands[1]);
>+  operands[3] = gen_reg_rtx (V1TImode);
>+  operands[4] = gen_lowpart (<MODE>mode, operands[3]);
>+})
>+
> (define_expand "vec_shr_<mode>"
>   [(set (match_dup 3)
> 	(lshiftrt:V1TI
>--- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj	2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c	2019-06-18 18:02:09.428798006 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
> 
>+#ifndef main
> #include "tree-vect.h"
>+#endif
> 
> int r, a[1024], b[1024];
> 
>@@ -63,7 +65,9 @@ int
> main ()
> {
>   int s = 0;
>+#ifndef main
>   check_vect ();
>+#endif
>   for (int i = 0; i < 1024; ++i)
>     {
>       a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-9.c.jj	2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-9.c	2019-06-18 18:02:34.649406773 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
> 
>+#ifndef main
> #include "tree-vect.h"
>+#endif
> 
> int r, a[1024], b[1024];
> 
>@@ -65,7 +67,9 @@ int
> main ()
> {
>   int s = 0;
>+#ifndef main
>   check_vect ();
>+#endif
>   for (int i = 0; i < 1024; ++i)
>     {
>       a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-10.c.jj	2019-06-18 18:37:30.742838613 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-10.c	2019-06-18 19:44:20.614082076 +0200
>@@ -0,0 +1,96 @@
>+/* { dg-require-effective-target size32plus } */
>+/* { dg-additional-options "-fopenmp-simd" } */
>+/* { dg-additional-options "-mavx" { target avx_runtime } } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
>+
>+#ifndef main
>+#include "tree-vect.h"
>+#endif
>+
>+float r = 1.0f, a[1024], b[1024];
>+
>+__attribute__((noipa)) void
>+foo (float *a, float *b)
>+{
>+  #pragma omp simd reduction (inscan, *:r)
>+  for (int i = 0; i < 1024; i++)
>+    {
>+      r *= a[i];
>+      #pragma omp scan inclusive(r)
>+      b[i] = r;
>+    }
>+}
>+
>+__attribute__((noipa)) float
>+bar (void)
>+{
>+  float s = -__builtin_inff ();
>+  #pragma omp simd reduction (inscan, max:s)
>+  for (int i = 0; i < 1024; i++)
>+    {
>+      s = s > a[i] ? s : a[i];
>+      #pragma omp scan inclusive(s)
>+      b[i] = s;
>+    }
>+  return s;
>+}
>+
>+int
>+main ()
>+{
>+  float s = 1.0f;
>+#ifndef main
>+  check_vect ();
>+#endif
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      if (i < 80)
>+	a[i] = (i & 1) ? 0.25f : 0.5f;
>+      else if (i < 200)
>+	a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+      else if (i < 280)
>+	a[i] = (i & 1) ? 0.25f : 0.5f;
>+      else if (i < 380)
>+	a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+      else
>+	switch (i % 6)
>+	  {
>+	  case 0: a[i] = 0.25f; break;
>+	  case 1: a[i] = 2.0f; break;
>+	  case 2: a[i] = -1.0f; break;
>+	  case 3: a[i] = -4.0f; break;
>+	  case 4: a[i] = 0.5f; break;
>+	  case 5: a[i] = 1.0f; break;
>+	  default: a[i] = 0.0f; break;
>+	  }
>+      b[i] = -19.0f;
>+      asm ("" : "+g" (i));
>+    }
>+  foo (a, b);
>+  if (r * 16384.0f != 0.125f)
>+    abort ();
>+  float m = -175.25f;
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      s *= a[i];
>+      if (b[i] != s)
>+	abort ();
>+      else
>+	{
>+	  a[i] = m - ((i % 3) == 1 ? 2.0f : (i % 3) == 2 ? 4.0f : 0.0f);
>+	  b[i] = -231.75f;
>+	  m += 0.75f;
>+	}
>+    }
>+  if (bar () != 592.0f)
>+    abort ();
>+  s = -__builtin_inff ();
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      if (s < a[i])
>+	s = a[i];
>+      if (b[i] != s)
>+	abort ();
>+    }
>+  return 0;
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c	2019-06-18 18:19:48.417341734 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c	2019-06-18 18:20:05.770072628 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c.jj	2019-06-18 19:46:09.015410603 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c	2019-06-18 19:50:31.621361409 +0200
>@@ -0,0 +1,15 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c	2019-06-18 18:19:40.310467451 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c	2019-06-18 18:19:56.479216712 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c.jj	2019-06-18 19:50:47.692113611 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c	2019-06-18 19:50:56.180982721 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c	2019-06-18 18:19:44.364404586 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c	2019-06-18 18:20:00.884148400 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c.jj	2019-06-18 19:51:12.309734025 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c	2019-06-18 19:51:18.285641883 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>
>	Jakub


