This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC] Fix PR90510, VEC_PERM -> BIT_INSERT folding


On Sun, 19 May 2019, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > This adds, incrementally ontop of moving VEC_PERM_EXPR folding
> > to match.pd (but not incremental patch - sorry...), folding
> > of single-element insert permutations to BIT_INSERT_EXPR.
> >
> > Things are quite awkward with the new poly-int vec-perm stuff
> > so effectively this doesn't work for SVE and is still very
> > ugly.  I wonder how to make it generic enough so SVE would
> > be happy and / or how to make the code prettier.
> >
> > I also can't find a helper to read a specific vector element
> > from a VECTOR_CST/CONSTRUCTOR (can I even do that "generally"
> > with a poly-int index?!), but there surely must be one.
> 
> Yeah, would be nice to have a helper that handles both VECTOR_CST
> and CONSTRUCTOR, even if just for constant indices.
> 
> CONSTRUCTORs are still very much fixed-length, so it wouldn't really
> make sense to fold a poly_int index at compile time.  The only case we
> could handle is when the index is known to be beyond the last nonzero
> element.
> 
> We could fold some poly_int VECTOR_CST_ELT indices to poly_ints, but
> it'd depend on the values involved.
> 
> Indexing specific poly_int elements of a single vector is a bit dubious
> in length-agnostic code though.  Not saying it'll never be useful, but
> it's certainly not a native SVE operation.  So I think even for SVE,
> constant VECTOR_CST/CONSTRUCTOR elements are the only interesting case
> for now.
> 
> > [...]
> > @@ -11774,61 +11777,7 @@ fold_ternary_loc (location_t loc, enum t
> >  	  poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> >  	  bool single_arg = (op0 == op1);
> >  	  vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
> > -
> > -	  /* Check for cases that fold to OP0 or OP1 in their original
> > -	     element order.  */
> > -	  if (sel.series_p (0, 1, 0, 1))
> > -	    return op0;
> > -	  if (sel.series_p (0, 1, nelts, 1))
> > -	    return op1;
> > -
> > -	  if (!single_arg)
> > -	    {
> > -	      if (sel.all_from_input_p (0))
> > -		op1 = op0;
> > -	      else if (sel.all_from_input_p (1))
> > -		{
> > -		  op0 = op1;
> > -		  sel.rotate_inputs (1);
> > -		}
> > -	    }
> 
> Since this isn't an incremental patch... :-)
> 
> One of the holes of the current code is that we still allow two
> permute indices for the same permutation, e.g.  { 4, 1, 2, 3 } and
> { 0, 5, 6, 7 }.  IMO we should canonicalize it so that the first index
> selects from the first vector.  So maybe the above should be:
> 
> 	  if (!single_arg)
> 	    {
> 	      if (known_ge (sel[0], nunits))
> 		{
> 		  std::swap (op0, op1);
> 		  sel.rotate_inputs (1);
> 		}
> 	      if (sel.all_from_input_p (0))
> 		op1 = op0;
> 	    }
> 
> > [...]
> > +	/* See if the permutation is performing a single element
> > +	   insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
> > +	   in that case.  */
> > +	unsigned HOST_WIDE_INT cnelts;
> > +        if ((TREE_CODE (cop0) == VECTOR_CST
> > +	     || TREE_CODE (cop0) == CONSTRUCTOR
> > +	     || TREE_CODE (cop1) == VECTOR_CST
> > +	     || TREE_CODE (cop1) == CONSTRUCTOR)
> > +	    && nelts.is_constant (&cnelts))
> > +          {
> > +	    unsigned first = 0, first_oo = 0, first_i;
> > +	    unsigned second = 0, second_oo = 0, second_i;
> > +	    HOST_WIDE_INT idx;
> > +	    for (unsigned HOST_WIDE_INT i = 0; i < cnelts; ++i)
> > +	      if (!sel[i].is_constant (&idx))
> > +	        {
> > +		  first = second = 0;
> > +		  break;
> > +		}
> > +	      else if ((unsigned HOST_WIDE_INT)idx < cnelts)
> > +	        {
> > +		  first_i = i;
> > +		  first++;
> > +		  first_oo += (unsigned HOST_WIDE_INT)idx != i;
> > +		}
> > +	      else
> > +	        {
> > +		  second_i = i;
> > +	          second++;
> > +		  second_oo += (unsigned HOST_WIDE_INT)idx != i + cnelts;
> > +		}
> 
> This won't handle the case in which the inserted element comes from
> the same vector.
> 
> If we add the extra canonicalization above, we'd only ever be inserting
> into the second vector at element 0.   The test for that would be:
> 
>    if (sel.series_p (1, 1, nelts + 1, 1))
>      // insert sel[0] into index 0 of the second vector
> 
> I think the SVE-friendly way of doing the check for the first vector
> would be:
> 
>    unsigned int encoded_nelts = sel.encoding ().encoded_nelts ();
>    unsigned int i = 0;
>    for (; i < encoded_nelts; ++i)
>      if (maybe_ne (sel[i], i))
>        break;
>    if (i < encoded_nelts && sel.series_p (i + 1, 1, i + 1, 1))
>      // insert sel[i] into index i of the first vector
> 
> Although that's two searches, one of them will halt very early.
> 
> The SVE-friendly way of getting the VECTOR_CST/CONSTRUCTOR element would be:
> 
>   poly_uint64 idx = sel[i];
>   unsigned HOST_WIDE_INT const_idx;
>   if (known_le (idx, nelts) && idx.is_constant (&const_idx))
>     // const_idx from first vector
>   else if (known_ge (idx, nelts) && (idx - nelts).is_constant (&const_idx))
>     // const_idx from second vector
>   else
>     // bail out
> 
> ...which is a bit ugly, I admit.

So in the end it's all still a bit smaller than before.  I've split
out a fold_read_from_vector routine.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-05-20  Richard Biener  <rguenther@suse.de>

	PR middle-end/90510
	* fold-const.c (fold_read_from_vector): New function.
	* fold-const.h (fold_read_from_vector): Declare.
	* match.pd (VEC_PERM_EXPR): Build BIT_INSERT_EXPRs for
	single-element insert permutations.  Canonicalize selector
	further and fix issue with last commit.

	* gcc.target/i386/pr90510.c: New testcase.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 271404)
+++ gcc/fold-const.c	(working copy)
@@ -13793,6 +13793,28 @@ fold_read_from_constant_string (tree exp
   return NULL;
 }
 
+/* Folds a read from vector element at IDX of vector ARG.  */
+
+tree
+fold_read_from_vector (tree arg, poly_int64 idx)
+{
+  HOST_WIDE_INT i;
+  if (known_lt (idx, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)))
+      && known_ge (idx, 0)
+      && idx.is_constant (&i))
+    {
+      if (TREE_CODE (arg) == VECTOR_CST)
+	return VECTOR_CST_ELT (arg, i);
+      else if (TREE_CODE (arg) == CONSTRUCTOR)
+	{
+	  if (i >= CONSTRUCTOR_NELTS (arg))
+	    return build_zero_cst (TREE_TYPE (TREE_TYPE (arg)));
+	  return CONSTRUCTOR_ELT (arg, i)->value;
+	}
+    }
+  return NULL_TREE;
+}
+
 /* Return the tree for neg (ARG0) when ARG0 is known to be either
    an integer constant, real, or fixed-point constant.
 
Index: gcc/fold-const.h
===================================================================
--- gcc/fold-const.h	(revision 271404)
+++ gcc/fold-const.h	(working copy)
@@ -100,6 +100,7 @@ extern tree fold_bit_and_mask (tree, tre
 			       tree, enum tree_code, tree, tree,
 			       tree, enum tree_code, tree, tree, tree *);
 extern tree fold_read_from_constant_string (tree);
+extern tree fold_read_from_vector (tree, poly_int64);
 #if GCC_VEC_PERN_INDICES_H
 extern tree fold_vec_perm (tree, tree, tree, const vec_perm_indices &);
 #endif
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 271404)
+++ gcc/match.pd	(working copy)
@@ -5406,6 +5406,11 @@ (define_operator_list COND_TERNARY
 	       op0 = op1;
 	       sel.rotate_inputs (1);
 	     }
+	   else if (known_ge (sel[0], nelts))
+	     {
+	       std::swap (op0, op1);
+	       sel.rotate_inputs (1);
+	     }
          }
        gassign *def;
        tree cop0 = op0, cop1 = op1;
@@ -5429,9 +5434,47 @@ (define_operator_list COND_TERNARY
      (with
       {
 	bool changed = (op0 == op1 && !single_arg);
+	tree ins = NULL_TREE;
+	unsigned at = 0;
+
+	/* See if the permutation is performing a single element
+	   insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
+	   in that case.  But only if the vector mode is supported,
+	   otherwise this is invalid GIMPLE.  */
+	unsigned HOST_WIDE_INT cnelts;
+        if (TYPE_MODE (type) != BLKmode
+	    && (TREE_CODE (cop0) == VECTOR_CST
+		|| TREE_CODE (cop0) == CONSTRUCTOR
+		|| TREE_CODE (cop1) == VECTOR_CST
+		|| TREE_CODE (cop1) == CONSTRUCTOR))
+          {
+	    if (sel.series_p (1, 1, nelts + 1, 1))
+	      {
+	        /* After canonicalizing the first elt to come from the
+		   first vector we only can insert the first elt from
+		   the first vector.  */
+	        at = 0;
+		ins = fold_read_from_vector (cop0, 0);
+	        op0 = op1;
+	      }
+	    else
+	      {
+	        unsigned int encoded_nelts = sel.encoding ().encoded_nelts ();
+		for (at = 0; at < encoded_nelts; ++at)
+		  if (maybe_ne (sel[at], at))
+		    break;
+		if (at < encoded_nelts && sel.series_p (at + 1, 1, at + 1, 1))
+		  {
+		    if (known_lt (at, nelts))
+		      ins = fold_read_from_vector (cop0, sel[at]);
+		    else
+		      ins = fold_read_from_vector (cop1, sel[at] - nelts);
+		  }
+	      }
+	  }
 
 	/* Generate a canonical form of the selector.  */
-	if (sel.encoding () != builder)
+	if (!ins && sel.encoding () != builder)
 	  {
 	    /* Some targets are deficient and fail to expand a single
 	       argument permutation while still allowing an equivalent
@@ -5450,10 +5493,12 @@ (define_operator_list COND_TERNARY
 		     so use the preferred form.  */
 		  op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel);
 	      }
-	    /* Differences in the encoder do not necessarily mean
-	       differences in the resulting vector.  */
-	    changed = !operand_equal_p (op2, oldop2, 0);
+	    if (!operand_equal_p (op2, oldop2, 0))
+	      changed = true;
 	  }
       }
-      (if (changed)
-       (vec_perm { op0; } { op1; } { op2; })))))))))
+      (if (ins)
+       (bit_insert { op0; } { ins; }
+         { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))); })
+       (if (changed)
+        (vec_perm { op0; } { op1; } { op2; }))))))))))
Index: gcc/testsuite/gcc.target/i386/pr90510.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr90510.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr90510.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
+
+typedef double __v2df __attribute__ ((__vector_size__ (16)));
+typedef long __v2di __attribute__ ((__vector_size__ (16)));
+
+__v2df
+_mm_add_sd_A (__v2df x, __v2df y)
+{
+  double tem = x[0] + y[0];
+  return __builtin_shuffle ( x, (__v2df) { tem, tem }, (__v2di) { 2, 1 } );
+}
+
+__v2df
+_mm_add_sd_B (__v2df x, __v2df y)
+{
+  __v2df z = { (x[0] + y[0]), x[1] };
+  return z;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 2 "optimized" } } */
+/* { dg-final { scan-assembler-not "unpck" } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]