This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] Fix up VEC_INTERLEAVE_*_EXPR folding and expansion for big endian (PR tree-optimization/51074)


On Thu, 1 Dec 2011, Jakub Jelinek wrote:

> On Thu, Dec 01, 2011 at 07:57:48AM -0800, Richard Henderson wrote:
> > On 12/01/2011 03:21 AM, Richard Guenther wrote:
> > > Yes, sorry - I'm recovering from a 3 week e-mail lag ;)  I agree
> > > using VEC_PERM_EXPR would be best - but that would also affect
> > > backend patterns.  Can we have a middle-ground that leaves those
> > > untouched?  We're still in stage 3, so fixing the bug with using
> > > VEC_PERM_EXPR sounds appealing to me ;)
> > 
> > If we agree that we want to fix this with vec_perm_expr, then we need a
> > relatively minor patch to the vectorizer, and cleanups in the targets.
> > 
> > In particular, powerpc, spu, and ia64 will need to recognize various
> > constant permutations so that they can continue using the specialized
> > instructions for interleave.  This shouldn't be particularly difficult; a
> > few testcases added to make sure we don't regress to full permutation
> > wouldn't be amiss.
> > 
> > The x86 port is the only one that really does aggressive constant
> > permutation pattern recognition atm.  That is, of course, because the ISA
> > support for permutation there is all over the map and we had no choice.
> > 
> > I've already zapped the target patterns that expanded interleave/even_odd
> > back into a permutation operation.
> > 
> > If we think this is ok for stage3, we can certainly give it a whack.  I'll
> > take care of the backends if Jakub takes care of the vectorizer?
> 
> Here is the vectorizer part (untested so far) + some small i386 tweaks.
> This patch as is regresses code quality for powerpc/ia64/sparc/mips
> (I don't think spu has vec_interleave* patterns in *.md).
> 
> If it works out, I guess we could also zap VEC_EXTRACT_{EVEN,ODD}_EXPR
> similarly.

Yeah, I think it's a good cleanup opportunity.

Thanks,
Richard.
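
To illustrate why VEC_EXTRACT_{EVEN,ODD}_EXPR could plausibly go the same
way, here is a small standalone sketch (not GCC code) that builds the
selectors a VEC_PERM_EXPR would need. It follows the `sel[i] = i * 2 + 1`
pattern visible in the fold_binary_loc hunk below and assumes the usual
VEC_PERM_EXPR convention that indices >= nelt pick from the second operand.
The names build_extract_selector and permute2 are made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: fill SEL with the selector for extracting the
   even (ODD == 0) or odd (ODD != 0) elements of the concatenation of
   two NELT-element vectors.  */
static void
build_extract_selector (unsigned char *sel, unsigned int nelt, int odd)
{
  unsigned int i;

  for (i = 0; i < nelt; i++)
    sel[i] = i * 2 + (odd ? 1 : 0);
}

/* Hypothetical scalar model of a two-operand permutation: selector
   values below NELT index A, the rest index B.  */
static void
permute2 (int *out, const int *a, const int *b,
	  const unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  for (i = 0; i < nelt; i++)
    {
      assert (sel[i] < 2 * nelt);
      out[i] = sel[i] < nelt ? a[sel[i]] : b[sel[i] - nelt];
    }
}
```

For nelt == 4 this gives {0, 2, 4, 6} for the even case and {1, 3, 5, 7}
for the odd case, with no dependence on endianness.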

> 2011-12-01  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* tree.def (VEC_INTERLEAVE_HIGH_EXPR, VEC_INTERLEAVE_LOW_EXPR): Remove.
> 	* gimple-pretty-print.c (dump_binary_rhs): Don't handle
> 	VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> 	* expr.c (expand_expr_real_2): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
> 	* cfgexpand.c (expand_debug_expr): Likewise.
> 	* tree-inline.c (estimate_operator_cost): Likewise.
> 	* tree-pretty-print.c (dump_generic_node): Likewise.
> 	* tree-vect-generic.c (expand_vector_operations_1): Likewise.
> 	* fold-const.c (fold_binary_loc): Likewise.
> 	* doc/generic.texi (VEC_INTERLEAVE_HIGH_EXPR,
> 	VEC_INTERLEAVE_LOW_EXPR): Remove documentation.
> 	* optabs.c (optab_for_tree_code): Don't handle
> 	VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> 	(expand_binop, init_optabs): Remove vec_interleave_high_optab
> 	and vec_interleave_low_optab.
> 	* genopinit.c (optabs): Likewise.
> 	* optabs.h (OTI_vec_interleave_high, OTI_vec_interleave_low): Remove.
> 	(vec_interleave_high_optab, vec_interleave_low_optab): Remove.
> 	* doc/md.texi (vec_interleave_high, vec_interleave_low): Remove
> 	documentation.
> 	* tree-vect-stmts.c (gen_perm_mask): Renamed to...
> 	(vect_gen_perm_mask): ... this.  No longer static.
> 	(perm_mask_for_reverse, vectorizable_load): Adjust callers.
> 	* tree-vectorizer.h (vect_gen_perm_mask): New prototype.
> 	* tree-vect-data-refs.c (vect_strided_store_supported): Don't try
> 	VEC_INTERLEAVE_*_EXPR, use can_vec_perm_p instead of
> 	can_vec_perm_for_code_p.
> 	(vect_permute_store_chain): Generate VEC_PERM_EXPR with interleaving
> 	masks instead of VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> 	* config/i386/i386.c (expand_vec_perm_interleave2): If
> 	expand_vec_perm_interleave3 would handle it, return false.
> 	(expand_vec_perm_broadcast_1): Don't use vec_interleave_*_optab.
> 
> --- gcc/tree.def.jj	2011-12-01 11:44:55.000000000 +0100
> +++ gcc/tree.def	2011-12-01 13:37:32.071771156 +0100
> @@ -1192,10 +1192,6 @@ DEFTREECODE (VEC_PACK_FIX_TRUNC_EXPR, "v
>  DEFTREECODE (VEC_EXTRACT_EVEN_EXPR, "vec_extract_even_expr", tcc_binary, 2)
>  DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extract_odd_expr", tcc_binary, 2)
>  
> -/* Merge input vectors interleaving their fields.  */
> -DEFTREECODE (VEC_INTERLEAVE_HIGH_EXPR, "vec_interleave_high_expr", tcc_binary, 2)
> -DEFTREECODE (VEC_INTERLEAVE_LOW_EXPR, "vec_interleave_low_expr", tcc_binary, 2)
> -
>  /* Widening vector shift left in bits.
>     Operand 0 is a vector to be shifted with N elements of size S.
>     Operand 1 is an integer shift amount in bits.
> --- gcc/gimple-pretty-print.c.jj	2011-12-01 11:44:54.000000000 +0100
> +++ gcc/gimple-pretty-print.c	2011-12-01 13:39:26.611099281 +0100
> @@ -347,8 +347,6 @@ dump_binary_rhs (pretty_printer *buffer,
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>        for (p = tree_code_name [(int) code]; *p; p++)
> --- gcc/expr.c.jj	2011-12-01 11:44:53.000000000 +0100
> +++ gcc/expr.c	2011-12-01 13:38:24.887461805 +0100
> @@ -8668,8 +8668,6 @@ expand_expr_real_2 (sepops ops, rtx targ
>  
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>        goto binop;
>  
>      case VEC_LSHIFT_EXPR:
> --- gcc/tree-cfg.c.jj	2011-12-01 11:44:58.000000000 +0100
> +++ gcc/tree-cfg.c	2011-12-01 13:59:00.162192709 +0100
> @@ -3703,8 +3703,6 @@ do_pointer_plus_expr_check:
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>        /* FIXME.  */
>        return false;
>  
> --- gcc/cfgexpand.c.jj	2011-12-01 12:37:57.000000000 +0100
> +++ gcc/cfgexpand.c	2011-12-01 13:38:04.380581793 +0100
> @@ -3448,8 +3448,6 @@ expand_debug_expr (tree exp)
>      case VEC_COND_EXPR:
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>      case VEC_LSHIFT_EXPR:
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_PACK_SAT_EXPR:
> --- gcc/tree-inline.c.jj	2011-12-01 11:44:57.000000000 +0100
> +++ gcc/tree-inline.c	2011-12-01 13:59:12.573120076 +0100
> @@ -3401,8 +3401,6 @@ estimate_operator_cost (enum tree_code c
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>  
> --- gcc/tree-pretty-print.c.jj	2011-12-01 11:44:53.000000000 +0100
> +++ gcc/tree-pretty-print.c	2011-12-01 13:59:40.219957523 +0100
> @@ -2404,22 +2404,6 @@ dump_generic_node (pretty_printer *buffe
>        pp_string (buffer, " > ");
>        break;
>  
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -      pp_string (buffer, " VEC_INTERLEAVE_HIGH_EXPR < ");
> -      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
> -      pp_string (buffer, ", ");
> -      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
> -      pp_string (buffer, " > ");
> -      break;
> -
> -    case VEC_INTERLEAVE_LOW_EXPR:
> -      pp_string (buffer, " VEC_INTERLEAVE_LOW_EXPR < ");
> -      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
> -      pp_string (buffer, ", ");
> -      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
> -      pp_string (buffer, " > ");
> -      break;
> -
>      default:
>        NIY;
>      }
> --- gcc/tree-vect-generic.c.jj	2011-12-01 11:44:58.000000000 +0100
> +++ gcc/tree-vect-generic.c	2011-12-01 14:05:29.685879992 +0100
> @@ -776,9 +776,7 @@ expand_vector_operations_1 (gimple_stmt_
>    /* These are only created by the vectorizer, after having queried
>       the target support.  It's more than just looking at the optab,
>       and there's no need to do it again.  */
> -  if (code == VEC_INTERLEAVE_HIGH_EXPR
> -      || code == VEC_INTERLEAVE_LOW_EXPR
> -      || code == VEC_EXTRACT_EVEN_EXPR
> +  if (code == VEC_EXTRACT_EVEN_EXPR
>        || code == VEC_EXTRACT_ODD_EXPR)
>      return;
>  
> --- gcc/fold-const.c.jj	2011-11-28 17:58:04.000000000 +0100
> +++ gcc/fold-const.c	2011-12-01 13:39:11.516188707 +0100
> @@ -13463,8 +13463,6 @@ fold_binary_loc (location_t loc,
>  
>      case VEC_EXTRACT_EVEN_EXPR:
>      case VEC_EXTRACT_ODD_EXPR:
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -    case VEC_INTERLEAVE_LOW_EXPR:
>        if ((TREE_CODE (arg0) == VECTOR_CST
>  	   || TREE_CODE (arg0) == CONSTRUCTOR)
>  	  && (TREE_CODE (arg1) == VECTOR_CST
> @@ -13482,14 +13480,6 @@ fold_binary_loc (location_t loc,
>  	      case VEC_EXTRACT_ODD_EXPR:
>  		sel[i] = i * 2 + 1;
>  		break;
> -	      case VEC_INTERLEAVE_HIGH_EXPR:
> -		sel[i] = (i + (BYTES_BIG_ENDIAN ? 0 : nelts)) / 2
> -			 + ((i & 1) ? nelts : 0);
> -		break;
> -	      case VEC_INTERLEAVE_LOW_EXPR:
> -		sel[i] = (i + (BYTES_BIG_ENDIAN ? nelts : 0)) / 2
> -			 + ((i & 1) ? nelts : 0);
> -		break;
>  	      default:
>  		gcc_unreachable ();
>  	      }
> --- gcc/doc/generic.texi.jj	2011-09-02 16:29:21.000000000 +0200
> +++ gcc/doc/generic.texi	2011-12-01 16:09:33.517145316 +0100
> @@ -1697,8 +1697,6 @@ its sole argument yields the representat
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
>  @tindex VEC_EXTRACT_EVEN_EXPR
>  @tindex VEC_EXTRACT_ODD_EXPR
> -@tindex VEC_INTERLEAVE_HIGH_EXPR
> -@tindex VEC_INTERLEAVE_LOW_EXPR
>  
>  @table @code
>  @item VEC_LSHIFT_EXPR
> @@ -1774,17 +1772,6 @@ These nodes represent extracting of the
>  vectors, respectively. Their operands and result are vectors that contain the
>  same number of elements of the same type.
>  
> -@item VEC_INTERLEAVE_HIGH_EXPR
> -@itemx VEC_INTERLEAVE_LOW_EXPR
> -These nodes represent merging and interleaving of the high/low elements of the
> -two input vectors, respectively. The operands and the result are vectors that
> -contain the same number of elements (@code{N}) of the same type.
> -In the case of @code{VEC_INTERLEAVE_HIGH_EXPR}, the high @code{N/2} elements of
> -the first input vector are interleaved with the high @code{N/2} elements of the
> -second input vector. In the case of @code{VEC_INTERLEAVE_LOW_EXPR}, the low
> -@code{N/2} elements of the first input vector are interleaved with the low
> -@code{N/2} elements of the second input vector.
> -
>  @end table
>  
>  
> --- gcc/optabs.c.jj	2011-12-01 11:45:06.000000000 +0100
> +++ gcc/optabs.c	2011-12-01 13:42:03.985176076 +0100
> @@ -553,12 +553,6 @@ optab_for_tree_code (enum tree_code code
>      case VEC_EXTRACT_ODD_EXPR:
>        return vec_extract_odd_optab;
>  
> -    case VEC_INTERLEAVE_HIGH_EXPR:
> -      return vec_interleave_high_optab;
> -
> -    case VEC_INTERLEAVE_LOW_EXPR:
> -      return vec_interleave_low_optab;
> -
>      default:
>        return NULL;
>      }
> @@ -1612,11 +1606,7 @@ expand_binop (enum machine_mode mode, op
>        enum tree_code tcode = ERROR_MARK;
>        rtx sel;
>  
> -      if (binoptab == vec_interleave_high_optab)
> -	tcode = VEC_INTERLEAVE_HIGH_EXPR;
> -      else if (binoptab == vec_interleave_low_optab)
> -	tcode = VEC_INTERLEAVE_LOW_EXPR;
> -      else if (binoptab == vec_extract_even_optab)
> +      if (binoptab == vec_extract_even_optab)
>  	tcode = VEC_EXTRACT_EVEN_EXPR;
>        else if (binoptab == vec_extract_odd_optab)
>  	tcode = VEC_EXTRACT_ODD_EXPR;
> @@ -6271,8 +6261,6 @@ init_optabs (void)
>    init_optab (vec_extract_optab, UNKNOWN);
>    init_optab (vec_extract_even_optab, UNKNOWN);
>    init_optab (vec_extract_odd_optab, UNKNOWN);
> -  init_optab (vec_interleave_high_optab, UNKNOWN);
> -  init_optab (vec_interleave_low_optab, UNKNOWN);
>    init_optab (vec_set_optab, UNKNOWN);
>    init_optab (vec_init_optab, UNKNOWN);
>    init_optab (vec_shl_optab, UNKNOWN);
> @@ -6880,8 +6868,7 @@ can_vec_perm_p (enum machine_mode mode,
>    return true;
>  }
>  
> -/* Return true if we can implement VEC_INTERLEAVE_{HIGH,LOW}_EXPR or
> -   VEC_EXTRACT_{EVEN,ODD}_EXPR with VEC_PERM_EXPR for this target.
> +/* Return true if target can do VEC_EXTRACT_{EVEN,ODD}_EXPR via VEC_PERM_EXPR.
>     If PSEL is non-null, return the selector for the permutation.  */
>  
>  bool
> @@ -6931,17 +6918,6 @@ can_vec_perm_for_code_p (enum tree_code
>  	    data[i] = i * 2 + alt;
>  	  break;
>  
> -	case VEC_INTERLEAVE_HIGH_EXPR:
> -	case VEC_INTERLEAVE_LOW_EXPR:
> -	  if ((BYTES_BIG_ENDIAN != 0) ^ (code == VEC_INTERLEAVE_HIGH_EXPR))
> -	    alt = nelt / 2;
> -	  for (i = 0; i < nelt / 2; ++i)
> -	    {
> -	      data[i * 2] = i + alt;
> -	      data[i * 2 + 1] = i + nelt + alt;
> -	    }
> -	  break;
> -
>  	default:
>  	  gcc_unreachable ();
>  	}
> --- gcc/genopinit.c.jj	2011-12-01 11:44:53.000000000 +0100
> +++ gcc/genopinit.c	2011-12-01 13:58:15.124456917 +0100
> @@ -1,6 +1,6 @@
>  /* Generate code to initialize optabs from machine description.
>     Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
> -   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010
> +   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010, 2011
>     Free Software Foundation, Inc.
>  
>  This file is part of GCC.
> @@ -269,8 +269,6 @@ static const char * const optabs[] =
>    "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
>    "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))",
>    "set_optab_handler (vec_extract_odd_optab, $A, CODE_FOR_$(vec_extract_odd$a$))",
> -  "set_optab_handler (vec_interleave_high_optab, $A, CODE_FOR_$(vec_interleave_high$a$))",
> -  "set_optab_handler (vec_interleave_low_optab, $A, CODE_FOR_$(vec_interleave_low$a$))",
>    "set_optab_handler (vec_init_optab, $A, CODE_FOR_$(vec_init$a$))",
>    "set_optab_handler (vec_shl_optab, $A, CODE_FOR_$(vec_shl_$a$))",
>    "set_optab_handler (vec_shr_optab, $A, CODE_FOR_$(vec_shr_$a$))",
> --- gcc/optabs.h.jj	2011-12-01 11:44:53.000000000 +0100
> +++ gcc/optabs.h	2011-12-01 13:42:31.086016331 +0100
> @@ -335,9 +335,6 @@ enum optab_index
>    /* Extract even/odd fields of vector operands.  */
>    OTI_vec_extract_even,
>    OTI_vec_extract_odd,
> -  /* Interleave fields of vector operands.  */
> -  OTI_vec_interleave_high,
> -  OTI_vec_interleave_low,
>    /* Initialize vector operand.  */
>    OTI_vec_init,
>    /* Whole vector shift. The shift amount is in bits.  */
> @@ -564,8 +561,6 @@ enum optab_index
>  #define vec_extract_optab (&optab_table[OTI_vec_extract])
>  #define vec_extract_even_optab (&optab_table[OTI_vec_extract_even])
>  #define vec_extract_odd_optab (&optab_table[OTI_vec_extract_odd])
> -#define vec_interleave_high_optab (&optab_table[OTI_vec_interleave_high])
> -#define vec_interleave_low_optab (&optab_table[OTI_vec_interleave_low])
>  #define vec_init_optab (&optab_table[OTI_vec_init])
>  #define vec_shl_optab (&optab_table[OTI_vec_shl])
>  #define vec_shr_optab (&optab_table[OTI_vec_shr])
> --- gcc/doc/md.texi.jj	2011-12-01 11:45:01.000000000 +0100
> +++ gcc/doc/md.texi	2011-12-01 16:09:59.915980186 +0100
> @@ -4159,20 +4159,6 @@ The odd elements of operand 2 are concat
>  1 in their original order. The result is stored in operand 0.
>  The output and input vectors should have the same modes.
>  
> -@cindex @code{vec_interleave_high@var{m}} instruction pattern
> -@item @samp{vec_interleave_high@var{m}}
> -Merge high elements of the two input vectors into the output vector. The output
> -and input vectors should have the same modes (@code{N} elements). The high
> -@code{N/2} elements of the first input vector are interleaved with the high
> -@code{N/2} elements of the second input vector.
> -
> -@cindex @code{vec_interleave_low@var{m}} instruction pattern
> -@item @samp{vec_interleave_low@var{m}}
> -Merge low elements of the two input vectors into the output vector. The output
> -and input vectors should have the same modes (@code{N} elements). The low
> -@code{N/2} elements of the first input vector are interleaved with the low
> -@code{N/2} elements of the second input vector.
> -
>  @cindex @code{vec_init@var{m}} instruction pattern
>  @item @samp{vec_init@var{m}}
>  Initialize the vector to given values.  Operand 0 is the vector to initialize
> --- gcc/tree-vect-stmts.c.jj	2011-12-01 11:44:57.000000000 +0100
> +++ gcc/tree-vect-stmts.c	2011-12-01 14:29:32.660382553 +0100
> @@ -3828,8 +3828,8 @@ vectorizable_store (gimple stmt, gimple_
>  
>       Then permutation statements are generated:
>  
> -        VS5: vx5 = VEC_INTERLEAVE_HIGH_EXPR < vx0, vx3 >
> -        VS6: vx6 = VEC_INTERLEAVE_LOW_EXPR < vx0, vx3 >
> +	VS5: vx5 = VEC_PERM_EXPR < vx0, vx3, {0, 8, 1, 9, 2, 10, 3, 11} >
> +	VS6: vx6 = VEC_PERM_EXPR < vx0, vx3, {4, 12, 5, 13, 6, 14, 7, 15} >
>  	...
>  
>       And they are put in STMT_VINFO_VEC_STMT of the corresponding scalar stmts
> @@ -4026,8 +4026,8 @@ vectorizable_store (gimple stmt, gimple_
>     the VECTOR_CST mask that implements the permutation of the
>     vector elements.  If that is impossible to do, returns NULL.  */
>  
> -static tree
> -gen_perm_mask (tree vectype, unsigned char *sel)
> +tree
> +vect_gen_perm_mask (tree vectype, unsigned char *sel)
>  {
>    tree mask_elt_type, mask_type, mask_vec;
>    int i, nunits;
> @@ -4067,7 +4067,7 @@ perm_mask_for_reverse (tree vectype)
>    for (i = 0; i < nunits; ++i)
>      sel[i] = nunits - 1 - i;
>  
> -  return gen_perm_mask (vectype, sel);
> +  return vect_gen_perm_mask (vectype, sel);
>  }
>  
>  /* Given a vector variable X and Y, that was generated for the scalar
> @@ -4314,7 +4314,7 @@ vectorizable_load (gimple stmt, gimple_s
>  	  for (i = 0; i < gather_off_nunits; ++i)
>  	    sel[i] = i | nunits;
>  
> -	  perm_mask = gen_perm_mask (gather_off_vectype, sel);
> +	  perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
>  	  gcc_assert (perm_mask != NULL_TREE);
>  	}
>        else if (nunits == gather_off_nunits * 2)
> @@ -4326,7 +4326,7 @@ vectorizable_load (gimple stmt, gimple_s
>  	    sel[i] = i < gather_off_nunits
>  		     ? i : i + nunits - gather_off_nunits;
>  
> -	  perm_mask = gen_perm_mask (vectype, sel);
> +	  perm_mask = vect_gen_perm_mask (vectype, sel);
>  	  gcc_assert (perm_mask != NULL_TREE);
>  	  ncopies *= 2;
>  	}
> --- gcc/tree-vectorizer.h.jj	2011-12-01 11:44:54.000000000 +0100
> +++ gcc/tree-vectorizer.h	2011-12-01 14:30:02.651203205 +0100
> @@ -848,6 +848,7 @@ extern void vect_get_store_cost (struct
>  extern bool vect_supportable_shift (enum tree_code, tree);
>  extern void vect_get_vec_defs (tree, tree, gimple, VEC (tree, heap) **,
>  			       VEC (tree, heap) **, slp_tree, int);
> +extern tree vect_gen_perm_mask (tree, unsigned char *);
>  
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
> --- gcc/tree-vect-data-refs.c.jj	2011-12-01 11:44:54.000000000 +0100
> +++ gcc/tree-vect-data-refs.c	2011-12-01 14:45:59.248565611 +0100
> @@ -3780,7 +3780,6 @@ vect_create_destination_var (tree scalar
>  bool
>  vect_strided_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
>  {
> -  optab ih_optab, il_optab;
>    enum machine_mode mode;
>  
>    mode = TYPE_MODE (vectype);
> @@ -3795,18 +3794,23 @@ vect_strided_store_supported (tree vecty
>      }
>  
>    /* Check that the operation is supported.  */
> -  ih_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR,
> -				  vectype, optab_default);
> -  il_optab = optab_for_tree_code (VEC_INTERLEAVE_LOW_EXPR,
> -				  vectype, optab_default);
> -  if (il_optab && ih_optab
> -      && optab_handler (ih_optab, mode) != CODE_FOR_nothing
> -      && optab_handler (il_optab, mode) != CODE_FOR_nothing)
> -    return true;
> -
> -  if (can_vec_perm_for_code_p (VEC_INTERLEAVE_HIGH_EXPR, mode, NULL)
> -      && can_vec_perm_for_code_p (VEC_INTERLEAVE_LOW_EXPR, mode, NULL))
> -    return true;
> +  if (VECTOR_MODE_P (mode))
> +    {
> +      unsigned int i, nelt = GET_MODE_NUNITS (mode);
> +      unsigned char *sel = XALLOCAVEC (unsigned char, nelt);
> +      for (i = 0; i < nelt / 2; i++)
> +	{
> +	  sel[i * 2] = i;
> +	  sel[i * 2 + 1] = i + nelt;
> +	}
> +      if (can_vec_perm_p (mode, false, sel))
> +	{
> +	  for (i = 0; i < nelt; i++)
> +	    sel[i] += nelt / 2;
> +	  if (can_vec_perm_p (mode, false, sel))
> +	    return true;
> +	}
> +    }
>  
>    if (vect_print_dump_info (REPORT_DETAILS))
>      fprintf (vect_dump, "interleave op not supported by target.");
> @@ -3897,14 +3901,25 @@ vect_permute_store_chain (VEC(tree,heap)
>    tree perm_dest, vect1, vect2, high, low;
>    gimple perm_stmt;
>    tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
> +  tree perm_mask_low, perm_mask_high;
>    int i;
> -  unsigned int j;
> -  enum tree_code high_code, low_code;
> +  unsigned int j, nelt = GET_MODE_NUNITS (TYPE_MODE (vectype));
> +  unsigned char *sel = XALLOCAVEC (unsigned char, nelt);
>  
>    gcc_assert (vect_strided_store_supported (vectype, length));
>  
>    *result_chain = VEC_copy (tree, heap, dr_chain);
>  
> +  for (i = 0; i < nelt / 2; i++)
> +    {
> +      sel[i * 2] = i;
> +      sel[i * 2 + 1] = i + nelt;
> +    }
> +  perm_mask_high = vect_gen_perm_mask (vectype, sel);
> +  for (i = 0; i < nelt; i++)
> +    sel[i] += nelt / 2;
> +  perm_mask_low = vect_gen_perm_mask (vectype, sel);
> +
>    for (i = 0; i < exact_log2 (length); i++)
>      {
>        for (j = 0; j < length/2; j++)
> @@ -3913,42 +3928,27 @@ vect_permute_store_chain (VEC(tree,heap)
>  	  vect2 = VEC_index (tree, dr_chain, j+length/2);
>  
>  	  /* Create interleaving stmt:
> -	     in the case of big endian:
> -                                high = interleave_high (vect1, vect2)
> -             and in the case of little endian:
> -                                high = interleave_low (vect1, vect2).  */
> +	     high = VEC_PERM_EXPR <vect1, vect2, {0, nelt, 1, nelt+1, ...}>  */
>  	  perm_dest = create_tmp_var (vectype, "vect_inter_high");
>  	  DECL_GIMPLE_REG_P (perm_dest) = 1;
>  	  add_referenced_var (perm_dest);
> -          if (BYTES_BIG_ENDIAN)
> -	    {
> -	      high_code = VEC_INTERLEAVE_HIGH_EXPR;
> -	      low_code = VEC_INTERLEAVE_LOW_EXPR;
> -	    }
> -	  else
> -	    {
> -	      low_code = VEC_INTERLEAVE_HIGH_EXPR;
> -	      high_code = VEC_INTERLEAVE_LOW_EXPR;
> -	    }
> -	  perm_stmt = gimple_build_assign_with_ops (high_code, perm_dest,
> -						    vect1, vect2);
> -	  high = make_ssa_name (perm_dest, perm_stmt);
> -	  gimple_assign_set_lhs (perm_stmt, high);
> +	  high = make_ssa_name (perm_dest, NULL);
> +	  perm_stmt
> +	    = gimple_build_assign_with_ops3 (VEC_PERM_EXPR, high,
> +					     vect1, vect2, perm_mask_high);
>  	  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>  	  VEC_replace (tree, *result_chain, 2*j, high);
>  
>  	  /* Create interleaving stmt:
> -             in the case of big endian:
> -                               low  = interleave_low (vect1, vect2)
> -             and in the case of little endian:
> -                               low  = interleave_high (vect1, vect2).  */
> +	     low = VEC_PERM_EXPR <vect1, vect2, {nelt/2, nelt*3/2, nelt/2+1,
> +						 nelt*3/2+1, ...}>  */
>  	  perm_dest = create_tmp_var (vectype, "vect_inter_low");
>  	  DECL_GIMPLE_REG_P (perm_dest) = 1;
>  	  add_referenced_var (perm_dest);
> -	  perm_stmt = gimple_build_assign_with_ops (low_code, perm_dest,
> -						    vect1, vect2);
> -	  low = make_ssa_name (perm_dest, perm_stmt);
> -	  gimple_assign_set_lhs (perm_stmt, low);
> +	  low = make_ssa_name (perm_dest, NULL);
> +	  perm_stmt
> +	    = gimple_build_assign_with_ops3 (VEC_PERM_EXPR, low,
> +					     vect1, vect2, perm_mask_low);
>  	  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
>  	  VEC_replace (tree, *result_chain, 2*j+1, low);
>  	}
> --- gcc/config/i386/i386.c.jj	2011-12-01 11:44:59.000000000 +0100
> +++ gcc/config/i386/i386.c	2011-12-01 16:07:48.684802498 +0100
> @@ -36013,6 +36013,8 @@ expand_vec_perm_palignr (struct expand_v
>    return ok;
>  }
>  
> +static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);
> +
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
>     a two vector permutation into a single vector permutation by using
>     an interleave operation to merge the vectors.  */
> @@ -36039,6 +36041,17 @@ expand_vec_perm_interleave2 (struct expa
>        /* For 32-byte modes allow even d->op0 == d->op1.
>  	 The lack of cross-lane shuffling in some instructions
>  	 might prevent a single insn shuffle.  */
> +      dfinal = *d;
> +      dfinal.testing_p = true;
> +      /* If expand_vec_perm_interleave3 can expand this into
> +	 a 3 insn sequence, give up and let it be expanded as
> +	 3 insn sequence.  While that is one insn longer,
> +	 it doesn't need a memory operand and in the common
> +	 case that both interleave low and high permutations
> +	 with the same operands are adjacent needs 4 insns
> +	 for both after CSE.  */
> +      if (expand_vec_perm_interleave3 (&dfinal))
> +	return false;
>      }
>    else
>      return false;
> @@ -36878,18 +36891,23 @@ expand_vec_perm_broadcast_1 (struct expa
>  	 stopping once we have promoted to V4SImode and then use pshufd.  */
>        do
>  	{
> -	  optab otab = vec_interleave_low_optab;
> +	  rtx dest;
> +	  rtx (*gen) (rtx, rtx, rtx)
> +	    = vmode == V16QImode ? gen_vec_interleave_lowv16qi
> +				 : gen_vec_interleave_lowv8hi;
>  
>  	  if (elt >= nelt2)
>  	    {
> -	      otab = vec_interleave_high_optab;
> +	      gen = vmode == V16QImode ? gen_vec_interleave_highv16qi
> +				       : gen_vec_interleave_highv8hi;
>  	      elt -= nelt2;
>  	    }
>  	  nelt2 /= 2;
>  
> -	  op0 = expand_binop (vmode, otab, op0, op0, NULL, 0, OPTAB_DIRECT);
> +	  dest = gen_reg_rtx (vmode);
> +	  emit_insn (gen (dest, op0, op0));
>  	  vmode = get_mode_wider_vector (vmode);
> -	  op0 = gen_lowpart (vmode, op0);
> +	  op0 = gen_lowpart (vmode, dest);
>  	}
>        while (vmode != V4SImode);
>  
> 
> 	Jakub
> 
> 
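
For readers following the mask construction in vect_permute_store_chain
above, the following standalone sketch (plain C, not GCC internals) mirrors
the two loops from the patch and models the permutation on ordinary arrays.
The function names are invented for illustration, and the scalar model
assumes that selector indices >= nelt refer to the second operand.

```c
#include <assert.h>

/* Hypothetical helper: build the "high" interleave selector
   {0, nelt, 1, nelt + 1, ...}, as in the patch's first loop.  */
static void
build_interleave_high (unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  assert (nelt % 2 == 0);
  for (i = 0; i < nelt / 2; i++)
    {
      sel[i * 2] = i;
      sel[i * 2 + 1] = i + nelt;
    }
}

/* Hypothetical helper: turn a "high" selector into the "low" one by
   adding nelt / 2 to every index, as in the patch's second loop.  */
static void
shift_to_interleave_low (unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  for (i = 0; i < nelt; i++)
    sel[i] += nelt / 2;
}

/* Hypothetical scalar model of a two-operand permutation: selector
   values below NELT index A, the rest index B.  */
static void
apply_perm (int *out, const int *a, const int *b,
	    const unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  for (i = 0; i < nelt; i++)
    {
      assert (sel[i] < 2 * nelt);
      out[i] = sel[i] < nelt ? a[sel[i]] : b[sel[i] - nelt];
    }
}
```

With nelt == 8 the two selectors come out as {0, 8, 1, 9, 2, 10, 3, 11} and
{4, 12, 5, 13, 6, 14, 7, 15}, matching the updated comment in
vectorizable_store. Neither depends on BYTES_BIG_ENDIAN, which is the point
of switching to explicit masks.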

-- 
Richard Guenther <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
