This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix up VEC_INTERLEAVE_*_EXPR folding and expansion for big endian (PR tree-optimization/51074)
On Thu, 1 Dec 2011, Jakub Jelinek wrote:
> On Thu, Dec 01, 2011 at 07:57:48AM -0800, Richard Henderson wrote:
> > On 12/01/2011 03:21 AM, Richard Guenther wrote:
> > > Yes, sorry - I'm recovering from a 3 week e-mail lag ;) I agree
> > > using VEC_PERM_EXPR would be best - but that would also affect
> > > backend patterns. Can we have a middle-ground that leaves those
> > > untouched? We're still in stage 3, so fixing the bug with using
> > > VEC_PERM_EXPR sounds appealing to me ;)
> >
> > If we agree that we want to fix this with vec_perm_expr, then we need a
> > relatively minor patch to the vectorizer, and cleanups in the targets.
> >
> > In particular, powerpc, spu, and ia64 will need to recognize various
> > constant permutations so that they can continue using the specialized
> > instructions for interleave. This shouldn't be particularly difficult; a
> > few testcases added to make sure we don't regress to full permutation
> > wouldn't be amiss.
> >
> > The x86 port is the only one that really does aggressive constant
> > permutation pattern recognition atm. That is, of course, because the ISA
> > support for permutation there is all over the map and we had no choice.
> >
> > I've already zapped the target patterns that expanded interleave/even_odd
> > back into a permutation operation.
> >
> > If we think this is ok for stage3, we can certainly give it a whack. I'll
> > take care of the backends if Jakub takes care of the vectorizer?
>
> Here is the vectorizer part (untested so far) + some small i386 tweaks.
> This patch as is regresses code quality for powerpc/ia64/sparc/mips
> (I don't think spu has vec_interleave* patterns in *.md).
>
> If it works out, I guess we could also zap VEC_EXTRACT_{EVEN,ODD}_EXPR
> similarly.
Yeah, I think it's a good cleanup opportunity.
Thanks,
Richard.
> 2011-12-01 Jakub Jelinek <jakub@redhat.com>
>
> * tree.def (VEC_INTERLEAVE_HIGH_EXPR, VEC_INTERLEAVE_LOW_EXPR): Remove.
> * gimple-pretty-print.c (dump_binary_rhs): Don't handle
> VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> * expr.c (expand_expr_real_2): Likewise.
> * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> * cfgexpand.c (expand_debug_expr): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * tree-vect-generic.c (expand_vector_operations_1): Likewise.
> * fold-const.c (fold_binary_loc): Likewise.
> * doc/generic.texi (VEC_INTERLEAVE_HIGH_EXPR,
> VEC_INTERLEAVE_LOW_EXPR): Remove documentation.
> * optabs.c (optab_for_tree_code): Don't handle
> VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> (expand_binop, init_optabs): Remove vec_interleave_high_optab
> and vec_interleave_low_optab.
> * genopinit.c (optabs): Likewise.
> * optabs.h (OTI_vec_interleave_high, OTI_vec_interleave_low): Remove.
> (vec_interleave_high_optab, vec_interleave_low_optab): Remove.
> * doc/md.texi (vec_interleave_high, vec_interleave_low): Remove
> documentation.
> * tree-vect-stmts.c (gen_perm_mask): Renamed to...
> (vect_gen_perm_mask): ... this. No longer static.
> (perm_mask_for_reverse, vectorizable_load): Adjust callers.
> * tree-vectorizer.h (vect_gen_perm_mask): New prototype.
> * tree-vect-data-refs.c (vect_strided_store_supported): Don't try
> VEC_INTERLEAVE_*_EXPR, use can_vec_perm_p instead of
> can_vec_perm_for_code_p.
> (vect_permute_store_chain): Generate VEC_PERM_EXPR with interleaving
> masks instead of VEC_INTERLEAVE_HIGH_EXPR and VEC_INTERLEAVE_LOW_EXPR.
> * config/i386/i386.c (expand_vec_perm_interleave2): If
> expand_vec_perm_interleave3 would handle it, return false.
> (expand_vec_perm_broadcast_1): Don't use vec_interleave_*_optab.
>
> --- gcc/tree.def.jj 2011-12-01 11:44:55.000000000 +0100
> +++ gcc/tree.def 2011-12-01 13:37:32.071771156 +0100
> @@ -1192,10 +1192,6 @@ DEFTREECODE (VEC_PACK_FIX_TRUNC_EXPR, "v
> DEFTREECODE (VEC_EXTRACT_EVEN_EXPR, "vec_extract_even_expr", tcc_binary, 2)
> DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extract_odd_expr", tcc_binary, 2)
>
> -/* Merge input vectors interleaving their fields. */
> -DEFTREECODE (VEC_INTERLEAVE_HIGH_EXPR, "vec_interleave_high_expr", tcc_binary, 2)
> -DEFTREECODE (VEC_INTERLEAVE_LOW_EXPR, "vec_interleave_low_expr", tcc_binary, 2)
> -
> /* Widening vector shift left in bits.
> Operand 0 is a vector to be shifted with N elements of size S.
> Operand 1 is an integer shift amount in bits.
> --- gcc/gimple-pretty-print.c.jj 2011-12-01 11:44:54.000000000 +0100
> +++ gcc/gimple-pretty-print.c 2011-12-01 13:39:26.611099281 +0100
> @@ -347,8 +347,6 @@ dump_binary_rhs (pretty_printer *buffer,
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> for (p = tree_code_name [(int) code]; *p; p++)
> --- gcc/expr.c.jj 2011-12-01 11:44:53.000000000 +0100
> +++ gcc/expr.c 2011-12-01 13:38:24.887461805 +0100
> @@ -8668,8 +8668,6 @@ expand_expr_real_2 (sepops ops, rtx targ
>
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> goto binop;
>
> case VEC_LSHIFT_EXPR:
> --- gcc/tree-cfg.c.jj 2011-12-01 11:44:58.000000000 +0100
> +++ gcc/tree-cfg.c 2011-12-01 13:59:00.162192709 +0100
> @@ -3703,8 +3703,6 @@ do_pointer_plus_expr_check:
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> /* FIXME. */
> return false;
>
> --- gcc/cfgexpand.c.jj 2011-12-01 12:37:57.000000000 +0100
> +++ gcc/cfgexpand.c 2011-12-01 13:38:04.380581793 +0100
> @@ -3448,8 +3448,6 @@ expand_debug_expr (tree exp)
> case VEC_COND_EXPR:
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> case VEC_LSHIFT_EXPR:
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_PACK_SAT_EXPR:
> --- gcc/tree-inline.c.jj 2011-12-01 11:44:57.000000000 +0100
> +++ gcc/tree-inline.c 2011-12-01 13:59:12.573120076 +0100
> @@ -3401,8 +3401,6 @@ estimate_operator_cost (enum tree_code c
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
>
> --- gcc/tree-pretty-print.c.jj 2011-12-01 11:44:53.000000000 +0100
> +++ gcc/tree-pretty-print.c 2011-12-01 13:59:40.219957523 +0100
> @@ -2404,22 +2404,6 @@ dump_generic_node (pretty_printer *buffe
> pp_string (buffer, " > ");
> break;
>
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - pp_string (buffer, " VEC_INTERLEAVE_HIGH_EXPR < ");
> - dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
> - pp_string (buffer, ", ");
> - dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
> - pp_string (buffer, " > ");
> - break;
> -
> - case VEC_INTERLEAVE_LOW_EXPR:
> - pp_string (buffer, " VEC_INTERLEAVE_LOW_EXPR < ");
> - dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
> - pp_string (buffer, ", ");
> - dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
> - pp_string (buffer, " > ");
> - break;
> -
> default:
> NIY;
> }
> --- gcc/tree-vect-generic.c.jj 2011-12-01 11:44:58.000000000 +0100
> +++ gcc/tree-vect-generic.c 2011-12-01 14:05:29.685879992 +0100
> @@ -776,9 +776,7 @@ expand_vector_operations_1 (gimple_stmt_
> /* These are only created by the vectorizer, after having queried
> the target support. It's more than just looking at the optab,
> and there's no need to do it again. */
> - if (code == VEC_INTERLEAVE_HIGH_EXPR
> - || code == VEC_INTERLEAVE_LOW_EXPR
> - || code == VEC_EXTRACT_EVEN_EXPR
> + if (code == VEC_EXTRACT_EVEN_EXPR
> || code == VEC_EXTRACT_ODD_EXPR)
> return;
>
> --- gcc/fold-const.c.jj 2011-11-28 17:58:04.000000000 +0100
> +++ gcc/fold-const.c 2011-12-01 13:39:11.516188707 +0100
> @@ -13463,8 +13463,6 @@ fold_binary_loc (location_t loc,
>
> case VEC_EXTRACT_EVEN_EXPR:
> case VEC_EXTRACT_ODD_EXPR:
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> if ((TREE_CODE (arg0) == VECTOR_CST
> || TREE_CODE (arg0) == CONSTRUCTOR)
> && (TREE_CODE (arg1) == VECTOR_CST
> @@ -13482,14 +13480,6 @@ fold_binary_loc (location_t loc,
> case VEC_EXTRACT_ODD_EXPR:
> sel[i] = i * 2 + 1;
> break;
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - sel[i] = (i + (BYTES_BIG_ENDIAN ? 0 : nelts)) / 2
> - + ((i & 1) ? nelts : 0);
> - break;
> - case VEC_INTERLEAVE_LOW_EXPR:
> - sel[i] = (i + (BYTES_BIG_ENDIAN ? nelts : 0)) / 2
> - + ((i & 1) ? nelts : 0);
> - break;
> default:
> gcc_unreachable ();
> }
> --- gcc/doc/generic.texi.jj 2011-09-02 16:29:21.000000000 +0200
> +++ gcc/doc/generic.texi 2011-12-01 16:09:33.517145316 +0100
> @@ -1697,8 +1697,6 @@ its sole argument yields the representat
> @tindex VEC_PACK_FIX_TRUNC_EXPR
> @tindex VEC_EXTRACT_EVEN_EXPR
> @tindex VEC_EXTRACT_ODD_EXPR
> -@tindex VEC_INTERLEAVE_HIGH_EXPR
> -@tindex VEC_INTERLEAVE_LOW_EXPR
>
> @table @code
> @item VEC_LSHIFT_EXPR
> @@ -1774,17 +1772,6 @@ These nodes represent extracting of the
> vectors, respectively. Their operands and result are vectors that contain the
> same number of elements of the same type.
>
> -@item VEC_INTERLEAVE_HIGH_EXPR
> -@itemx VEC_INTERLEAVE_LOW_EXPR
> -These nodes represent merging and interleaving of the high/low elements of the
> -two input vectors, respectively. The operands and the result are vectors that
> -contain the same number of elements (@code{N}) of the same type.
> -In the case of @code{VEC_INTERLEAVE_HIGH_EXPR}, the high @code{N/2} elements of
> -the first input vector are interleaved with the high @code{N/2} elements of the
> -second input vector. In the case of @code{VEC_INTERLEAVE_LOW_EXPR}, the low
> -@code{N/2} elements of the first input vector are interleaved with the low
> -@code{N/2} elements of the second input vector.
> -
> @end table
>
>
> --- gcc/optabs.c.jj 2011-12-01 11:45:06.000000000 +0100
> +++ gcc/optabs.c 2011-12-01 13:42:03.985176076 +0100
> @@ -553,12 +553,6 @@ optab_for_tree_code (enum tree_code code
> case VEC_EXTRACT_ODD_EXPR:
> return vec_extract_odd_optab;
>
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - return vec_interleave_high_optab;
> -
> - case VEC_INTERLEAVE_LOW_EXPR:
> - return vec_interleave_low_optab;
> -
> default:
> return NULL;
> }
> @@ -1612,11 +1606,7 @@ expand_binop (enum machine_mode mode, op
> enum tree_code tcode = ERROR_MARK;
> rtx sel;
>
> - if (binoptab == vec_interleave_high_optab)
> - tcode = VEC_INTERLEAVE_HIGH_EXPR;
> - else if (binoptab == vec_interleave_low_optab)
> - tcode = VEC_INTERLEAVE_LOW_EXPR;
> - else if (binoptab == vec_extract_even_optab)
> + if (binoptab == vec_extract_even_optab)
> tcode = VEC_EXTRACT_EVEN_EXPR;
> else if (binoptab == vec_extract_odd_optab)
> tcode = VEC_EXTRACT_ODD_EXPR;
> @@ -6271,8 +6261,6 @@ init_optabs (void)
> init_optab (vec_extract_optab, UNKNOWN);
> init_optab (vec_extract_even_optab, UNKNOWN);
> init_optab (vec_extract_odd_optab, UNKNOWN);
> - init_optab (vec_interleave_high_optab, UNKNOWN);
> - init_optab (vec_interleave_low_optab, UNKNOWN);
> init_optab (vec_set_optab, UNKNOWN);
> init_optab (vec_init_optab, UNKNOWN);
> init_optab (vec_shl_optab, UNKNOWN);
> @@ -6880,8 +6868,7 @@ can_vec_perm_p (enum machine_mode mode,
> return true;
> }
>
> -/* Return true if we can implement VEC_INTERLEAVE_{HIGH,LOW}_EXPR or
> - VEC_EXTRACT_{EVEN,ODD}_EXPR with VEC_PERM_EXPR for this target.
> +/* Return true if we can implement with VEC_PERM_EXPR for this target.
> If PSEL is non-null, return the selector for the permutation. */
>
> bool
> @@ -6931,17 +6918,6 @@ can_vec_perm_for_code_p (enum tree_code
> data[i] = i * 2 + alt;
> break;
>
> - case VEC_INTERLEAVE_HIGH_EXPR:
> - case VEC_INTERLEAVE_LOW_EXPR:
> - if ((BYTES_BIG_ENDIAN != 0) ^ (code == VEC_INTERLEAVE_HIGH_EXPR))
> - alt = nelt / 2;
> - for (i = 0; i < nelt / 2; ++i)
> - {
> - data[i * 2] = i + alt;
> - data[i * 2 + 1] = i + nelt + alt;
> - }
> - break;
> -
> default:
> gcc_unreachable ();
> }
> --- gcc/genopinit.c.jj 2011-12-01 11:44:53.000000000 +0100
> +++ gcc/genopinit.c 2011-12-01 13:58:15.124456917 +0100
> @@ -1,6 +1,6 @@
> /* Generate code to initialize optabs from machine description.
> Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
> - 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010
> + 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010, 2011
> Free Software Foundation, Inc.
>
> This file is part of GCC.
> @@ -269,8 +269,6 @@ static const char * const optabs[] =
> "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
> "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))",
> "set_optab_handler (vec_extract_odd_optab, $A, CODE_FOR_$(vec_extract_odd$a$))",
> - "set_optab_handler (vec_interleave_high_optab, $A, CODE_FOR_$(vec_interleave_high$a$))",
> - "set_optab_handler (vec_interleave_low_optab, $A, CODE_FOR_$(vec_interleave_low$a$))",
> "set_optab_handler (vec_init_optab, $A, CODE_FOR_$(vec_init$a$))",
> "set_optab_handler (vec_shl_optab, $A, CODE_FOR_$(vec_shl_$a$))",
> "set_optab_handler (vec_shr_optab, $A, CODE_FOR_$(vec_shr_$a$))",
> --- gcc/optabs.h.jj 2011-12-01 11:44:53.000000000 +0100
> +++ gcc/optabs.h 2011-12-01 13:42:31.086016331 +0100
> @@ -335,9 +335,6 @@ enum optab_index
> /* Extract even/odd fields of vector operands. */
> OTI_vec_extract_even,
> OTI_vec_extract_odd,
> - /* Interleave fields of vector operands. */
> - OTI_vec_interleave_high,
> - OTI_vec_interleave_low,
> /* Initialize vector operand. */
> OTI_vec_init,
> /* Whole vector shift. The shift amount is in bits. */
> @@ -564,8 +561,6 @@ enum optab_index
> #define vec_extract_optab (&optab_table[OTI_vec_extract])
> #define vec_extract_even_optab (&optab_table[OTI_vec_extract_even])
> #define vec_extract_odd_optab (&optab_table[OTI_vec_extract_odd])
> -#define vec_interleave_high_optab (&optab_table[OTI_vec_interleave_high])
> -#define vec_interleave_low_optab (&optab_table[OTI_vec_interleave_low])
> #define vec_init_optab (&optab_table[OTI_vec_init])
> #define vec_shl_optab (&optab_table[OTI_vec_shl])
> #define vec_shr_optab (&optab_table[OTI_vec_shr])
> --- gcc/doc/md.texi.jj 2011-12-01 11:45:01.000000000 +0100
> +++ gcc/doc/md.texi 2011-12-01 16:09:59.915980186 +0100
> @@ -4159,20 +4159,6 @@ The odd elements of operand 2 are concat
> 1 in their original order. The result is stored in operand 0.
> The output and input vectors should have the same modes.
>
> -@cindex @code{vec_interleave_high@var{m}} instruction pattern
> -@item @samp{vec_interleave_high@var{m}}
> -Merge high elements of the two input vectors into the output vector. The output
> -and input vectors should have the same modes (@code{N} elements). The high
> -@code{N/2} elements of the first input vector are interleaved with the high
> -@code{N/2} elements of the second input vector.
> -
> -@cindex @code{vec_interleave_low@var{m}} instruction pattern
> -@item @samp{vec_interleave_low@var{m}}
> -Merge low elements of the two input vectors into the output vector. The output
> -and input vectors should have the same modes (@code{N} elements). The low
> -@code{N/2} elements of the first input vector are interleaved with the low
> -@code{N/2} elements of the second input vector.
> -
> @cindex @code{vec_init@var{m}} instruction pattern
> @item @samp{vec_init@var{m}}
> Initialize the vector to given values. Operand 0 is the vector to initialize
> --- gcc/tree-vect-stmts.c.jj 2011-12-01 11:44:57.000000000 +0100
> +++ gcc/tree-vect-stmts.c 2011-12-01 14:29:32.660382553 +0100
> @@ -3828,8 +3828,8 @@ vectorizable_store (gimple stmt, gimple_
>
> Then permutation statements are generated:
>
> - VS5: vx5 = VEC_INTERLEAVE_HIGH_EXPR < vx0, vx3 >
> - VS6: vx6 = VEC_INTERLEAVE_LOW_EXPR < vx0, vx3 >
> + VS5: vx5 = VEC_PERM_EXPR < vx0, vx3, {0, 8, 1, 9, 2, 10, 3, 11} >
> + VS6: vx6 = VEC_PERM_EXPR < vx0, vx3, {4, 12, 5, 13, 6, 14, 7, 15} >
> ...
>
> And they are put in STMT_VINFO_VEC_STMT of the corresponding scalar stmts
> @@ -4026,8 +4026,8 @@ vectorizable_store (gimple stmt, gimple_
> the VECTOR_CST mask that implements the permutation of the
> vector elements. If that is impossible to do, returns NULL. */
>
> -static tree
> -gen_perm_mask (tree vectype, unsigned char *sel)
> +tree
> +vect_gen_perm_mask (tree vectype, unsigned char *sel)
> {
> tree mask_elt_type, mask_type, mask_vec;
> int i, nunits;
> @@ -4067,7 +4067,7 @@ perm_mask_for_reverse (tree vectype)
> for (i = 0; i < nunits; ++i)
> sel[i] = nunits - 1 - i;
>
> - return gen_perm_mask (vectype, sel);
> + return vect_gen_perm_mask (vectype, sel);
> }
>
> /* Given a vector variable X and Y, that was generated for the scalar
> @@ -4314,7 +4314,7 @@ vectorizable_load (gimple stmt, gimple_s
> for (i = 0; i < gather_off_nunits; ++i)
> sel[i] = i | nunits;
>
> - perm_mask = gen_perm_mask (gather_off_vectype, sel);
> + perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
> gcc_assert (perm_mask != NULL_TREE);
> }
> else if (nunits == gather_off_nunits * 2)
> @@ -4326,7 +4326,7 @@ vectorizable_load (gimple stmt, gimple_s
> sel[i] = i < gather_off_nunits
> ? i : i + nunits - gather_off_nunits;
>
> - perm_mask = gen_perm_mask (vectype, sel);
> + perm_mask = vect_gen_perm_mask (vectype, sel);
> gcc_assert (perm_mask != NULL_TREE);
> ncopies *= 2;
> }
> --- gcc/tree-vectorizer.h.jj 2011-12-01 11:44:54.000000000 +0100
> +++ gcc/tree-vectorizer.h 2011-12-01 14:30:02.651203205 +0100
> @@ -848,6 +848,7 @@ extern void vect_get_store_cost (struct
> extern bool vect_supportable_shift (enum tree_code, tree);
> extern void vect_get_vec_defs (tree, tree, gimple, VEC (tree, heap) **,
> VEC (tree, heap) **, slp_tree, int);
> +extern tree vect_gen_perm_mask (tree, unsigned char *);
>
> /* In tree-vect-data-refs.c. */
> extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
> --- gcc/tree-vect-data-refs.c.jj 2011-12-01 11:44:54.000000000 +0100
> +++ gcc/tree-vect-data-refs.c 2011-12-01 14:45:59.248565611 +0100
> @@ -3780,7 +3780,6 @@ vect_create_destination_var (tree scalar
> bool
> vect_strided_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
> {
> - optab ih_optab, il_optab;
> enum machine_mode mode;
>
> mode = TYPE_MODE (vectype);
> @@ -3795,18 +3794,23 @@ vect_strided_store_supported (tree vecty
> }
>
> /* Check that the operation is supported. */
> - ih_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR,
> - vectype, optab_default);
> - il_optab = optab_for_tree_code (VEC_INTERLEAVE_LOW_EXPR,
> - vectype, optab_default);
> - if (il_optab && ih_optab
> - && optab_handler (ih_optab, mode) != CODE_FOR_nothing
> - && optab_handler (il_optab, mode) != CODE_FOR_nothing)
> - return true;
> -
> - if (can_vec_perm_for_code_p (VEC_INTERLEAVE_HIGH_EXPR, mode, NULL)
> - && can_vec_perm_for_code_p (VEC_INTERLEAVE_LOW_EXPR, mode, NULL))
> - return true;
> + if (VECTOR_MODE_P (mode))
> + {
> + unsigned int i, nelt = GET_MODE_NUNITS (mode);
> + unsigned char *sel = XALLOCAVEC (unsigned char, nelt);
> + for (i = 0; i < nelt / 2; i++)
> + {
> + sel[i * 2] = i;
> + sel[i * 2 + 1] = i + nelt;
> + }
> + if (can_vec_perm_p (mode, false, sel))
> + {
> + for (i = 0; i < nelt; i++)
> + sel[i] += nelt / 2;
> + if (can_vec_perm_p (mode, false, sel))
> + return true;
> + }
> + }
>
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "interleave op not supported by target.");
> @@ -3897,14 +3901,25 @@ vect_permute_store_chain (VEC(tree,heap)
> tree perm_dest, vect1, vect2, high, low;
> gimple perm_stmt;
> tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
> + tree perm_mask_low, perm_mask_high;
> int i;
> - unsigned int j;
> - enum tree_code high_code, low_code;
> + unsigned int j, nelt = GET_MODE_NUNITS (TYPE_MODE (vectype));
> + unsigned char *sel = XALLOCAVEC (unsigned char, nelt);
>
> gcc_assert (vect_strided_store_supported (vectype, length));
>
> *result_chain = VEC_copy (tree, heap, dr_chain);
>
> + for (i = 0; i < nelt / 2; i++)
> + {
> + sel[i * 2] = i;
> + sel[i * 2 + 1] = i + nelt;
> + }
> + perm_mask_high = vect_gen_perm_mask (vectype, sel);
> + for (i = 0; i < nelt; i++)
> + sel[i] += nelt / 2;
> + perm_mask_low = vect_gen_perm_mask (vectype, sel);
> +
> for (i = 0; i < exact_log2 (length); i++)
> {
> for (j = 0; j < length/2; j++)
> @@ -3913,42 +3928,27 @@ vect_permute_store_chain (VEC(tree,heap)
> vect2 = VEC_index (tree, dr_chain, j+length/2);
>
> /* Create interleaving stmt:
> - in the case of big endian:
> - high = interleave_high (vect1, vect2)
> - and in the case of little endian:
> - high = interleave_low (vect1, vect2). */
> + high = VEC_PERM_EXPR <vect1, vect2, {0, nelt, 1, nelt+1, ...}> */
> perm_dest = create_tmp_var (vectype, "vect_inter_high");
> DECL_GIMPLE_REG_P (perm_dest) = 1;
> add_referenced_var (perm_dest);
> - if (BYTES_BIG_ENDIAN)
> - {
> - high_code = VEC_INTERLEAVE_HIGH_EXPR;
> - low_code = VEC_INTERLEAVE_LOW_EXPR;
> - }
> - else
> - {
> - low_code = VEC_INTERLEAVE_HIGH_EXPR;
> - high_code = VEC_INTERLEAVE_LOW_EXPR;
> - }
> - perm_stmt = gimple_build_assign_with_ops (high_code, perm_dest,
> - vect1, vect2);
> - high = make_ssa_name (perm_dest, perm_stmt);
> - gimple_assign_set_lhs (perm_stmt, high);
> + high = make_ssa_name (perm_dest, NULL);
> + perm_stmt
> + = gimple_build_assign_with_ops3 (VEC_PERM_EXPR, high,
> + vect1, vect2, perm_mask_high);
> vect_finish_stmt_generation (stmt, perm_stmt, gsi);
> VEC_replace (tree, *result_chain, 2*j, high);
>
> /* Create interleaving stmt:
> - in the case of big endian:
> - low = interleave_low (vect1, vect2)
> - and in the case of little endian:
> - low = interleave_high (vect1, vect2). */
> + low = VEC_PERM_EXPR <vect1, vect2, {nelt/2, nelt*3/2, nelt/2+1,
> + nelt*3/2+1, ...}> */
> perm_dest = create_tmp_var (vectype, "vect_inter_low");
> DECL_GIMPLE_REG_P (perm_dest) = 1;
> add_referenced_var (perm_dest);
> - perm_stmt = gimple_build_assign_with_ops (low_code, perm_dest,
> - vect1, vect2);
> - low = make_ssa_name (perm_dest, perm_stmt);
> - gimple_assign_set_lhs (perm_stmt, low);
> + low = make_ssa_name (perm_dest, NULL);
> + perm_stmt
> + = gimple_build_assign_with_ops3 (VEC_PERM_EXPR, low,
> + vect1, vect2, perm_mask_low);
> vect_finish_stmt_generation (stmt, perm_stmt, gsi);
> VEC_replace (tree, *result_chain, 2*j+1, low);
> }
> --- gcc/config/i386/i386.c.jj 2011-12-01 11:44:59.000000000 +0100
> +++ gcc/config/i386/i386.c 2011-12-01 16:07:48.684802498 +0100
> @@ -36013,6 +36013,8 @@ expand_vec_perm_palignr (struct expand_v
> return ok;
> }
>
> +static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);
> +
> /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify
> a two vector permutation into a single vector permutation by using
> an interleave operation to merge the vectors. */
> @@ -36039,6 +36041,17 @@ expand_vec_perm_interleave2 (struct expa
> /* For 32-byte modes allow even d->op0 == d->op1.
> The lack of cross-lane shuffling in some instructions
> might prevent a single insn shuffle. */
> + dfinal = *d;
> + dfinal.testing_p = true;
> + /* If expand_vec_perm_interleave3 can expand this into
> + a 3 insn sequence, give up and let it be expanded as
> + 3 insn sequence. While that is one insn longer,
> + it doesn't need a memory operand and in the common
> + case that both interleave low and high permutations
> + with the same operands are adjacent needs 4 insns
> + for both after CSE. */
> + if (expand_vec_perm_interleave3 (&dfinal))
> + return false;
> }
> else
> return false;
> @@ -36878,18 +36891,23 @@ expand_vec_perm_broadcast_1 (struct expa
> stopping once we have promoted to V4SImode and then use pshufd. */
> do
> {
> - optab otab = vec_interleave_low_optab;
> + rtx dest;
> + rtx (*gen) (rtx, rtx, rtx)
> + = vmode == V16QImode ? gen_vec_interleave_lowv16qi
> + : gen_vec_interleave_lowv8hi;
>
> if (elt >= nelt2)
> {
> - otab = vec_interleave_high_optab;
> + gen = vmode == V16QImode ? gen_vec_interleave_highv16qi
> + : gen_vec_interleave_highv8hi;
> elt -= nelt2;
> }
> nelt2 /= 2;
>
> - op0 = expand_binop (vmode, otab, op0, op0, NULL, 0, OPTAB_DIRECT);
> + dest = gen_reg_rtx (vmode);
> + emit_insn (gen (dest, op0, op0));
> vmode = get_mode_wider_vector (vmode);
> - op0 = gen_lowpart (vmode, op0);
> + op0 = gen_lowpart (vmode, dest);
> }
> while (vmode != V4SImode);
>
>
> Jakub
>
>
--
Richard Guenther <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer