This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [05/13] Remove vec_perm_const optab


On Sun, Dec 10, 2017 at 12:16 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
> long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
> selectors that wouldn't fit in the vectors being permuted.  E.g. a
> permute on two V256QIs can't be done using a V256QI selector.
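To make the size problem concrete: a VEC_PERM_EXPR index selects from the concatenation of both inputs, so permuting two N-element vectors needs indices 0 to 2N-1. For V256QI that range is 0 to 511, which cannot be stored in 8-bit QImode selector elements. The C++ sketch below only does that arithmetic. The helper names are hypothetical and are not part of GCC.

```cpp
#include <cassert>
#include <cstdint>

// Largest index a two-input permute of NUNITS-element vectors can use:
// 0..NUNITS-1 select from the first input, NUNITS..2*NUNITS-1 from the second.
constexpr std::uint64_t max_perm_index(std::uint64_t nunits) {
  return 2 * nunits - 1;
}

// True if every such index fits in an unsigned selector element of
// ELEMENT_BITS bits (e.g. 8 for the QI elements of a V256QI selector).
constexpr bool index_fits_element(std::uint64_t nunits, unsigned element_bits) {
  return element_bits >= 64
         || max_perm_index(nunits) < (std::uint64_t{1} << element_bits);
}
```

For example, index_fits_element (256, 8) is false, while index_fits_element (128, 8) is true because 2 * 128 - 1 = 255.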
>
> At the moment constant permutes use two interfaces:
> targetm.vectorizer.vec_perm_const_ok for testing whether a permute is
> valid and the vec_perm_const optab for actually emitting the permute.
> The former gets passed a vec<> selector and the latter an rtx selector.
> Most ports share a lot of code between the hook and the optab, with a
> wrapper function for each interface.
>
> We could try to keep that interface and require ports to define wider
> vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
> V256SI in the example above).  But building a CONST_VECTOR rtx seems a bit
> pointless here, since the expand code only creates the CONST_VECTOR in
> order to call the optab, and the first thing the target does is take
> the CONST_VECTOR apart again.
>
> The easiest approach therefore seemed to be to remove the optab and
> reuse the target hook to emit the code.  One potential drawback is that
> it's no longer possible to use match_operand predicates to force
> operands into the required form, but in practice all targets want
> register operands anyway.
>
> The patch also changes vec_perm_indices into a class that provides
> some simple routines for handling permutations.  A later patch will
> flesh this out and get rid of auto_vec_perm_indices, but I didn't
> want to do all that in this patch and make it more complicated than
> it already is.
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * Makefile.in (OBJS): Add vec-perm-indices.o.
>         * vec-perm-indices.h: New file.
>         * vec-perm-indices.c: Likewise.
>         * target.h (vec_perm_indices): Replace with a forward class
>         declaration.
>         (auto_vec_perm_indices): Move to vec-perm-indices.h.
>         * optabs.h: Include vec-perm-indices.h.
>         (expand_vec_perm): Delete.
>         (selector_fits_mode_p, expand_vec_perm_var): Declare.
>         (expand_vec_perm_const): Declare.
>         * target.def (vec_perm_const_ok): Replace with...
>         (vec_perm_const): ...this new hook.
>         * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
>         (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
>         * doc/tm.texi: Regenerate.
>         * optabs.def (vec_perm_const): Delete.
>         * doc/md.texi (vec_perm_const): Likewise.
>         (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
>         * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
>         expand_vec_perm for constant permutation vectors.  Assert that
>         the mode of variable permutation vectors is the integer equivalent
>         of the mode that is being permuted.
>         * optabs-query.h (selector_fits_mode_p): Declare.
>         * optabs-query.c: Include vec-perm-indices.h.
>         (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
>         is defined, instead of checking whether the vec_perm_const_optab
>         exists.  Use targetm.vectorize.vec_perm_const instead of
>         targetm.vectorize.vec_perm_const_ok.  Check whether the indices
>         fit in the vector mode before using a variable permute.
>         * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
>         vec_perm_indices instead of an rtx.
>         (expand_vec_perm): Replace with...
>         (expand_vec_perm_const): ...this new function.  Take the selector
>         as a vec_perm_indices rather than an rtx.  Also take the mode of
>         the selector.  Update call to shift_amt_for_vec_perm_mask.
>         Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
>         Use vec_perm_indices::new_expanded_vector to expand the original
>         selector into bytes.  Check whether the indices fit in the vector
>         mode before using a variable permute.
>         (expand_vec_perm_var): Make global.
>         (expand_mult_highpart): Use expand_vec_perm_const.
>         * fold-const.c: Include vec-perm-indices.h.
>         * tree-ssa-forwprop.c: Likewise.
>         * tree-vect-data-refs.c: Likewise.
>         * tree-vect-generic.c: Likewise.
>         * tree-vect-loop.c: Likewise.
>         * tree-vect-slp.c: Likewise.
>         * tree-vect-stmts.c: Likewise.
>         * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
>         Delete.
>         * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
>         * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
>         (aarch64_vectorize_vec_perm_const_ok): Fuse into...
>         (aarch64_vectorize_vec_perm_const): ...this new function.
>         (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
>         * config/arm/vec-common.md (vec_perm_const<mode>): Delete.
>         * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
>         into...
>         (arm_vectorize_vec_perm_const): ...this new function.  Explicitly
>         check for NEON modes.
>         * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
>         * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
>         (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
>         into...
>         (ix86_vectorize_vec_perm_const): ...this new function.  Incorporate
>         the old VEC_PERM_CONST conditions.
>         * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
>         * config/ia64/vect.md (vec_perm_const<mode>): Delete.
>         * config/ia64/ia64.c (ia64_expand_vec_perm_const)
>         (ia64_vectorize_vec_perm_const_ok): Merge into...
>         (ia64_vectorize_vec_perm_const): ...this new function.
>         * config/mips/loongson.md (vec_perm_const<mode>): Delete.
>         * config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
>         * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
>         * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
>         * config/mips/mips.c (mips_expand_vec_perm_const)
>         (mips_vectorize_vec_perm_const_ok): Merge into...
>         (mips_vectorize_vec_perm_const): ...this new function.
>         * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
>         * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
>         * config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
>         * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
>         * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
>         (rs6000_expand_vec_perm_const): Delete.
>         * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
>         Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>         Operate on constant selectors rather than rtxes.
>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>         altivec_expand_vec_perm_const_le.
>         (rs6000_expand_vec_perm_const): Delete.
>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>         (rs6000_vectorize_vec_perm_const): New function.
>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>         an element count and rtx array.
>         (rs6000_expand_extract_even): Update call accordingly.
>         (rs6000_expand_interleave): Likewise.
>         * config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
>         * config/rs6000/paired.md (vec_perm_constv2sf): Delete.
>         * config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
>         * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
>         (rs6000_expand_vec_perm_const): Delete.
>         * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>         Operate on constant selectors rather than rtxes.
>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>         altivec_expand_vec_perm_const_le.
>         (rs6000_expand_vec_perm_const): Delete.
>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>         (rs6000_vectorize_vec_perm_const): New function.  Remove stray
>         reference to the SPE evmerge instructions.
>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>         an element count and rtx array.
>         (rs6000_expand_extract_even): Update call accordingly.
>         (rs6000_expand_interleave): Likewise.
>         * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
>         * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
>         new function.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/Makefile.in     2017-12-09 22:47:27.854318082 +0000
> @@ -1584,6 +1584,7 @@ OBJS = \
>         var-tracking.o \
>         varasm.o \
>         varpool.o \
> +       vec-perm-indices.o \
>         vmsdbgout.o \
>         vr-values.o \
>         vtable-verify.o \
> Index: gcc/vec-perm-indices.h
> ===================================================================
> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
> +++ gcc/vec-perm-indices.h      2017-12-09 22:47:27.885318101 +0000
> @@ -0,0 +1,49 @@
> +/* A representation of vector permutation indices.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_VEC_PERM_INDICES_H
> +#define GCC_VEC_PERM_INDICES_H 1
> +
> +/* This class represents a constant permutation vector, such as that used
> +   as the final operand to a VEC_PERM_EXPR.  */
> +class vec_perm_indices : public auto_vec<unsigned short, 32>
> +{
> +  typedef unsigned short element_type;
> +  typedef auto_vec<element_type, 32> parent_type;
> +
> +public:
> +  vec_perm_indices () {}
> +  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
> +
> +  void new_expanded_vector (const vec_perm_indices &, unsigned int);
> +
> +  bool all_in_range_p (element_type, element_type) const;
> +
> +private:
> +  vec_perm_indices (const vec_perm_indices &);
> +};
> +
> +/* Temporary.  */
> +typedef vec_perm_indices vec_perm_builder;
> +typedef vec_perm_indices auto_vec_perm_indices;
> +
> +bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
> +rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
> +
> +#endif
> Index: gcc/vec-perm-indices.c
> ===================================================================
> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
> +++ gcc/vec-perm-indices.c      2017-12-09 22:47:27.885318101 +0000
> @@ -0,0 +1,93 @@
> +/* A representation of vector permutation indices.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "vec-perm-indices.h"
> +#include "tree.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +
> +/* Switch to a new permutation vector that selects the same input elements
> +   as ORIG, but with each element split into FACTOR pieces.  For example,
> +   if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
> +   { 2, 3, 4, 5, 0, 1, 6, 7 }.  */
> +
> +void
> +vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
> +                                      unsigned int factor)
> +{
> +  truncate (0);
> +  reserve (orig.length () * factor);
> +  for (unsigned int i = 0; i < orig.length (); ++i)
> +    {
> +      element_type base = orig[i] * factor;

No check whether this overflows unsigned short?  (not that this is likely)

> +      for (unsigned int j = 0; j < factor; ++j)
> +       quick_push (base + j);
> +    }
> +}
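The expansion described in the comment above, together with the overflow question raised in the review, can be sketched stand-alone in C++. This is a hypothetical free function over std::vector, not the patch's member function. Unlike the patch, it refuses an expansion whose largest index would not fit in unsigned short.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using element_type = unsigned short;

// Hypothetical stand-alone analogue of vec_perm_indices::new_expanded_vector:
// split each index of ORIG into FACTOR consecutive indices.  Returns false
// and leaves OUT empty if any expanded index would overflow element_type.
bool expand_perm(const std::vector<element_type> &orig, unsigned factor,
                 std::vector<element_type> &out) {
  out.clear();
  out.reserve(orig.size() * factor);
  const std::uint64_t limit = std::numeric_limits<element_type>::max();
  for (element_type idx : orig) {
    std::uint64_t base = std::uint64_t{idx} * factor;
    // The largest index produced for this element is base + factor - 1.
    if (factor != 0 && base + (factor - 1) > limit) {
      out.clear();
      return false;
    }
    for (unsigned j = 0; j < factor; ++j)
      out.push_back(static_cast<element_type>(base + j));
  }
  return true;
}
```

With ORIG = { 1, 2, 0, 3 } and FACTOR = 2 this reproduces the { 2, 3, 4, 5, 0, 1, 6, 7 } example from the comment, while { 40000 } with FACTOR = 2 is rejected.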
> +
> +/* Return true if all elements of the permutation vector are in the range
> +   [START, START + SIZE).  */
> +
> +bool
> +vec_perm_indices::all_in_range_p (element_type start, element_type size) const
> +{
> +  for (unsigned int i = 0; i < length (); ++i)
> +    if ((*this)[i] < start || ((*this)[i] - start) >= size)
> +      return false;
> +  return true;
> +}
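A minimal stand-alone sketch of the same half-open range test, using a hypothetical free function over std::vector. As in the patch, it never forms START + SIZE; it checks the lower bound first and then compares the offset from START against SIZE.

```cpp
#include <cassert>
#include <vector>

using element_type = unsigned short;

// True if every index in SEL lies in [START, START + SIZE).
bool all_in_range(const std::vector<element_type> &sel,
                  element_type start, element_type size) {
  for (element_type idx : sel)
    // idx >= start here, so the difference is non-negative.
    if (idx < start || unsigned(idx - start) >= size)
      return false;
  return true;
}
```

For instance, { 4, 5, 7 } is within [4, 8), but { 3, 5 } is not because 3 lies below START.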
> +
> +/* Try to read the contents of VECTOR_CST CST as a constant permutation
> +   vector.  Return true and add the elements to BUILDER on success,
> +   otherwise return false without modifying BUILDER.  */
> +
> +bool
> +tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
> +{
> +  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))

So why specifically shwi and not uhwi?  Shouldn't this also somehow
be checked for IN_RANGE of unsigned short aka vec_perm_indices::element_type?

The rest of the changes look ok, please give target maintainers a
chance to review.

Thanks,
Richard.


> +      return false;
> +
> +  builder->reserve (nelts);
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
> +                        & (2 * nelts - 1));
> +  return true;
> +}
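On the two review questions about this function: the final `& (2 * nelts - 1)` reduces each value modulo 2 * NELTS (assuming NELTS is a power of two). A negative element read via tree_to_shwi, such as -1, therefore wraps to the last element of the second input, and every stored value is below 2 * NELTS, so it fits in unsigned short whenever NELTS is at most 32768. A minimal C++ sketch of just that reduction, with a hypothetical function name and plain integers standing in for the VECTOR_CST elements:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reduce each signed selector element modulo 2 * NELTS by masking, as
// tree_to_vec_perm_builder does.  NELTS is assumed to be a power of two
// no larger than 32768, so every result fits in unsigned short.
std::vector<unsigned short> reduce_selector(const std::vector<std::int64_t> &elts) {
  const std::uint64_t nelts = elts.size();
  assert(nelts != 0 && (nelts & (nelts - 1)) == 0 && nelts <= 32768);
  std::vector<unsigned short> out;
  out.reserve(nelts);
  for (std::int64_t e : elts)
    // Converting to uint64_t is modular, so -1 becomes all ones.
    out.push_back(static_cast<unsigned short>(
        static_cast<std::uint64_t>(e) & (2 * nelts - 1)));
  return out;
}
```

With four elements the mask is 7, so { -1, 8, 3, 13 } becomes { 7, 0, 3, 5 }.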
> +
> +/* Return a CONST_VECTOR of mode MODE that contains the elements of
> +   INDICES.  */
> +
> +rtx
> +vec_perm_indices_to_rtx (machine_mode mode, const vec_perm_indices &indices)
> +{
> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +             && GET_MODE_NUNITS (mode) == indices.length ());
> +  unsigned int nelts = indices.length ();
> +  rtvec v = rtvec_alloc (nelts);
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
> +  return gen_rtx_CONST_VECTOR (mode, v);
> +}
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/target.h        2017-12-09 22:47:27.882318099 +0000
> @@ -193,13 +193,7 @@ enum vect_cost_model_location {
>    vect_epilogue = 2
>  };
>
> -/* The type to use for vector permutes with a constant permute vector.
> -   Each entry is an index into the concatenated input vectors.  */
> -typedef vec<unsigned short> vec_perm_indices;
> -
> -/* Same, but can be used to construct local permute vectors that are
> -   automatically freed.  */
> -typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
> +class vec_perm_indices;
>
>  /* The target structure.  This holds all the backend hooks.  */
>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/optabs.h        2017-12-09 22:47:27.882318099 +0000
> @@ -22,6 +22,7 @@ #define GCC_OPTABS_H
>
>  #include "optabs-query.h"
>  #include "optabs-libfuncs.h"
> +#include "vec-perm-indices.h"
>
>  /* Generate code for a widening multiply.  */
>  extern rtx expand_widening_mult (machine_mode, rtx, rtx, rtx, int, optab);
> @@ -307,7 +308,9 @@ extern int have_insn_for (enum rtx_code,
>  extern rtx_insn *gen_cond_trap (enum rtx_code, rtx, rtx, rtx);
>
>  /* Generate code for VEC_PERM_EXPR.  */
> -extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
> +extern rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> +extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
> +                                 const vec_perm_builder &, machine_mode, rtx);
>
>  /* Generate code for vector comparison.  */
>  extern rtx expand_vec_cmp_expr (tree, tree, rtx);
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/target.def      2017-12-09 22:47:27.882318099 +0000
> @@ -1841,12 +1841,27 @@ DEFHOOK
>   bool, (const_tree type, bool is_packed),
>   default_builtin_vector_alignment_reachable)
>
> -/* Return true if a vector created for vec_perm_const is valid.
> -   A NULL indicates that all constants are valid permutations.  */
>  DEFHOOK
> -(vec_perm_const_ok,
> - "Return true if a vector created for @code{vec_perm_const} is valid.",
> - bool, (machine_mode, vec_perm_indices),
> +(vec_perm_const,
> + "This hook is used to test whether the target can permute up to two\n\
> +vectors of mode @var{mode} using the permutation vector @code{sel}, and\n\
> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}\n\
> +and @var{output} are all null.  In the latter case @var{in0} and @var{in1} are\n\
> +the source vectors and @var{output} is the destination vector; all three are\n\
> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if\n\
> +@var{sel} describes a permutation on one vector instead of two.\n\
> +\n\
> +Return true if the operation is possible, emitting instructions for it\n\
> +if rtxes are provided.\n\
> +\n\
> +@cindex @code{vec_perm@var{m}} instruction pattern\n\
> +If the hook returns false for a mode with multibyte elements, GCC will\n\
> +try the equivalent byte operation.  If that also fails, it will try forcing\n\
> +the selector into a register and using the @code{vec_perm@var{mode}}\n\
> +instruction pattern.  There is no need for the hook to handle these two\n\
> +implementation approaches itself.",
> + bool, (machine_mode mode, rtx output, rtx in0, rtx in1,
> +       const vec_perm_indices &sel),
>   NULL)
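The hook's dual role (answer "is this possible?" when the operands are null, otherwise also emit code) can be illustrated with a toy model. Everything below is hypothetical: Reg stands in for a register rtx, EmitLog for the insn stream, and the "target" supports only the identity and the element reversal of the first input. It is a sketch of the calling convention, not of any real port.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; neither is a GCC type.
struct Reg { std::string name; };
using EmitLog = std::vector<std::string>;

// Toy vec_perm_const-style hook.  With OUT null it only reports whether
// SEL is supported; with real operands it also "emits" an instruction.
bool toy_vec_perm_const(const Reg *out, const Reg *in0, const Reg *in1,
                        const std::vector<unsigned> &sel, EmitLog &log) {
  unsigned n = sel.size();
  bool identity = true, reverse = true;
  for (unsigned i = 0; i < n; ++i) {
    identity &= sel[i] == i;
    reverse &= sel[i] == n - 1 - i;
  }
  if (!identity && !reverse)
    return false;   // unsupported: the caller falls back to other strategies
  if (!out)
    return true;    // query mode: nothing to emit
  (void) in1;       // both supported patterns read only the first input
  log.push_back((identity ? "mov " : "rev ") + out->name + ", " + in0->name);
  return true;
}
```

A query such as toy_vec_perm_const (nullptr, nullptr, nullptr, {3, 2, 1, 0}, log) returns true without touching the log; the same call with real registers appends one instruction.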
>
>  /* Return true if the target supports misaligned store/load of a
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/tm.texi.in  2017-12-09 22:47:27.879318098 +0000
> @@ -4079,7 +4079,7 @@ address;  but often a machine-dependent
>
>  @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>
> -@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
> +@hook TARGET_VECTORIZE_VEC_PERM_CONST
>
>  @hook TARGET_VECTORIZE_BUILTIN_CONVERSION
>
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/tm.texi     2017-12-09 22:47:27.878318097 +0000
> @@ -5798,8 +5798,24 @@ correct for most targets.
>  Return true if vector alignment is reachable (by peeling N iterations) for the given scalar type @var{type}.  @var{is_packed} is false if the scalar access using @var{type} is known to be naturally aligned.
>  @end deftypefn
>
> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices})
> -Return true if a vector created for @code{vec_perm_const} is valid.
> +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{&sel})
> +This hook is used to test whether the target can permute up to two
> +vectors of mode @var{mode} using the permutation vector @code{sel}, and
> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}
> +and @var{output} are all null.  In the latter case @var{in0} and @var{in1} are
> +the source vectors and @var{output} is the destination vector; all three are
> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> +@var{sel} describes a permutation on one vector instead of two.
> +
> +Return true if the operation is possible, emitting instructions for it
> +if rtxes are provided.
> +
> +@cindex @code{vec_perm@var{m}} instruction pattern
> +If the hook returns false for a mode with multibyte elements, GCC will
> +try the equivalent byte operation.  If that also fails, it will try forcing
> +the selector into a register and using the @code{vec_perm@var{mode}}
> +instruction pattern.  There is no need for the hook to handle these two
> +implementation approaches itself.
>  @end deftypefn
>
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_CONVERSION (unsigned @var{code}, tree @var{dest_type}, tree @var{src_type})
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/optabs.def      2017-12-09 22:47:27.882318099 +0000
> @@ -302,7 +302,6 @@ OPTAB_D (vec_pack_ssat_optab, "vec_pack_
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
> -OPTAB_D (vec_perm_const_optab, "vec_perm_const$a")
>  OPTAB_D (vec_perm_optab, "vec_perm$a")
>  OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
>  OPTAB_D (vec_set_optab, "vec_set$a")
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/md.texi     2017-12-09 22:47:27.877318096 +0000
> @@ -4972,20 +4972,8 @@ where @var{q} is a vector of @code{QImod
>  the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to
>  mode @var{q}.
>
> -@cindex @code{vec_perm_const@var{m}} instruction pattern
> -@item @samp{vec_perm_const@var{m}}
> -Like @samp{vec_perm} except that the permutation is a compile-time
> -constant.  That is, operand 3, the @dfn{selector}, is a @code{CONST_VECTOR}.
> -
> -Some targets cannot perform a permutation with a variable selector,
> -but can efficiently perform a constant permutation.  Further, the
> -target hook @code{vec_perm_ok} is queried to determine if the
> -specific constant permutation is available efficiently; the named
> -pattern is never expanded without @code{vec_perm_ok} returning true.
> -
> -There is no need for a target to supply both @samp{vec_perm@var{m}}
> -and @samp{vec_perm_const@var{m}} if the former can trivially implement
> -the operation with, say, the vector constant loaded into a register.
> +See also @code{TARGET_VECTORIZE_VEC_PERM_CONST}, which performs
> +the analogous operation for constant selectors.
>
>  @cindex @code{push@var{m}1} instruction pattern
>  @item @samp{push@var{m}1}
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/expr.c  2017-12-09 22:47:27.880318098 +0000
> @@ -9439,28 +9439,24 @@ #define REDUCE_BIT_FIELD(expr)  (reduce_b
>        goto binop;
>
>      case VEC_PERM_EXPR:
> -      expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
> -      op2 = expand_normal (treeop2);
> -
> -      /* Careful here: if the target doesn't support integral vector modes,
> -        a constant selection vector could wind up smooshed into a normal
> -        integral constant.  */
> -      if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
> -       {
> -         tree sel_type = TREE_TYPE (treeop2);
> -         machine_mode vmode
> -           = mode_for_vector (SCALAR_TYPE_MODE (TREE_TYPE (sel_type)),
> -                              TYPE_VECTOR_SUBPARTS (sel_type)).require ();
> -         gcc_assert (GET_MODE_CLASS (vmode) == MODE_VECTOR_INT);
> -         op2 = simplify_subreg (vmode, op2, TYPE_MODE (sel_type), 0);
> -         gcc_assert (op2 && GET_CODE (op2) == CONST_VECTOR);
> -       }
> -      else
> -        gcc_assert (GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT);
> -
> -      temp = expand_vec_perm (mode, op0, op1, op2, target);
> -      gcc_assert (temp);
> -      return temp;
> +      {
> +       expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
> +       vec_perm_builder sel;
> +       if (TREE_CODE (treeop2) == VECTOR_CST
> +           && tree_to_vec_perm_builder (&sel, treeop2))
> +         {
> +           machine_mode sel_mode = TYPE_MODE (TREE_TYPE (treeop2));
> +           temp = expand_vec_perm_const (mode, op0, op1, sel,
> +                                         sel_mode, target);
> +         }
> +       else
> +         {
> +           op2 = expand_normal (treeop2);
> +           temp = expand_vec_perm_var (mode, op0, op1, op2, target);
> +         }
> +       gcc_assert (temp);
> +       return temp;
> +      }
>
>      case DOT_PROD_EXPR:
>        {
> Index: gcc/optabs-query.h
> ===================================================================
> --- gcc/optabs-query.h  2017-12-09 22:47:21.534314227 +0000
> +++ gcc/optabs-query.h  2017-12-09 22:47:27.881318099 +0000
> @@ -175,6 +175,7 @@ enum insn_code can_float_p (machine_mode
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
>  opt_machine_mode qimode_for_vec_perm (machine_mode);
> +bool selector_fits_mode_p (machine_mode, const vec_perm_indices &);
>  bool can_vec_perm_var_p (machine_mode);
>  bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
>                            bool = true);
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:25.861316866 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:47:27.881318099 +0000
> @@ -28,6 +28,7 @@ Software Foundation; either version 3, o
>  #include "insn-config.h"
>  #include "rtl.h"
>  #include "recog.h"
> +#include "vec-perm-indices.h"
>
>  struct target_optabs default_target_optabs;
>  struct target_optabs *this_fn_optabs = &default_target_optabs;
> @@ -361,6 +362,17 @@ qimode_for_vec_perm (machine_mode mode)
>    return opt_machine_mode ();
>  }
>
> +/* Return true if selector SEL can be represented in the integer
> +   equivalent of vector mode MODE.  */
> +
> +bool
> +selector_fits_mode_p (machine_mode mode, const vec_perm_indices &sel)
> +{
> +  unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> +  return (mask == HOST_WIDE_INT_M1U
> +         || sel.all_in_range_p (0, mask + 1));
> +}
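A stand-alone sketch of the test selector_fits_mode_p performs, with the vector mode replaced by its element width in bits (a hypothetical simplification). MASK plays the role of GET_MODE_MASK (GET_MODE_INNER (mode)): a 64-bit element can hold any index, mirroring the HOST_WIDE_INT_M1U check, and otherwise every index must be at most MASK. The sketch does its arithmetic in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if every index in SEL can be stored in an unsigned integer element
// of ELEMENT_BITS bits.
bool selector_fits_bits(unsigned element_bits,
                        const std::vector<unsigned short> &sel) {
  const std::uint64_t all_ones = ~std::uint64_t{0};
  std::uint64_t mask = element_bits >= 64
                           ? all_ones
                           : (std::uint64_t{1} << element_bits) - 1;
  if (mask == all_ones)
    return true;
  for (unsigned short idx : sel)
    if (idx > mask)   // same as idx >= mask + 1
      return false;
  return true;
}
```

So an index of 300 does not fit an 8-bit element, while 511 fits a 16-bit one.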
> +
>  /* Return true if VEC_PERM_EXPRs with variable selector operands can be
>     expanded using SIMD extensions of the CPU.  MODE is the mode of the
>     vectors being permuted.  */
> @@ -416,18 +428,22 @@ can_vec_perm_const_p (machine_mode mode,
>      return false;
>
>    /* It's probably cheaper to test for the variable case first.  */
> -  if (allow_variable_p && can_vec_perm_var_p (mode))
> +  if (allow_variable_p
> +      && selector_fits_mode_p (mode, sel)
> +      && can_vec_perm_var_p (mode))
>      return true;
>
> -  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
> +  if (targetm.vectorize.vec_perm_const != NULL)
>      {
> -      if (targetm.vectorize.vec_perm_const_ok == NULL
> -         || targetm.vectorize.vec_perm_const_ok (mode, sel))
> +      if (targetm.vectorize.vec_perm_const (mode, NULL_RTX, NULL_RTX,
> +                                           NULL_RTX, sel))
>         return true;
>
>        /* ??? For completeness, we ought to check the QImode version of
>          vec_perm_const_optab.  But all users of this implicit lowering
> -        feature implement the variable vec_perm_optab.  */
> +        feature implement the variable vec_perm_optab, and the ia64
> +        port specifically doesn't want us to lower V2SF operations
> +        into integer operations.  */
>      }
>
>    return false;
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:25.861316866 +0000
> +++ gcc/optabs.c        2017-12-09 22:47:27.881318099 +0000
> @@ -5367,25 +5367,23 @@ vector_compare_rtx (machine_mode cmp_mod
>    return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
>  }
>
> -/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
> -   shift.  */
> +/* Check if vec_perm mask SEL is a constant equivalent to a shift of
> +   the first vec_perm operand, assuming the second operand is a constant
> +   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
> +   if the vec_perm is not a shift.  MODE is the mode of the value being
> +   shifted.  */
>  static rtx
> -shift_amt_for_vec_perm_mask (rtx sel)
> +shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>  {
> -  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
> -  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
> +  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
> +  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
>
> -  if (GET_CODE (sel) != CONST_VECTOR)
> -    return NULL_RTX;
> -
> -  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
> +  first = sel[0];
>    if (first >= nelt)
>      return NULL_RTX;
>    for (i = 1; i < nelt; i++)
>      {
> -      int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
> +      int idx = sel[i];
>        unsigned int expected = i + first;
>        /* Indices into the second vector are all equivalent.  */
>        if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
> @@ -5395,7 +5393,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>    return GEN_INT (first * bitsize);
>  }
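The logic of the rewritten shift_amt_for_vec_perm_mask can be sketched in isolation. Assuming the second operand is all zeros, a selector of the form FIRST, FIRST + 1, ... describes a whole-vector shift by FIRST elements, and any index at or beyond NELT selects a zero, so all such indices compare equal. This is a hypothetical free function returning std::optional instead of an rtx (requires C++17).

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Return the shift distance in bits if SEL is a shift of the first operand,
// given UNIT_BITS bits per element; otherwise return nullopt.
std::optional<unsigned> shift_amount_bits(const std::vector<unsigned short> &sel,
                                          unsigned unit_bits) {
  const unsigned nelt = sel.size();
  if (nelt == 0 || sel[0] >= nelt)
    return std::nullopt;
  const unsigned first = sel[0];
  for (unsigned i = 1; i < nelt; ++i) {
    unsigned expected = i + first;
    // Indices into the second (zero) vector are all equivalent.
    if (std::min<unsigned>(nelt, sel[i]) != std::min(nelt, expected))
      return std::nullopt;
  }
  return first * unit_bits;
}
```

For four 32-bit elements, { 1, 2, 3, 4 } and { 1, 2, 3, 7 } both give a 32-bit shift, whereas { 2, 1, 3, 4 } is not a shift.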
>
> -/* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
> +/* A subroutine of expand_vec_perm_var for expanding one vec_perm insn.  */
>
>  static rtx
>  expand_vec_perm_1 (enum insn_code icode, rtx target,
> @@ -5433,38 +5431,32 @@ expand_vec_perm_1 (enum insn_code icode,
>    return NULL_RTX;
>  }
>
> -static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> -
>  /* Implement a permutation of vectors v0 and v1 using the permutation
>     vector in SEL and return the result.  Use TARGET to hold the result
>     if nonnull and convenient.
>
> -   MODE is the mode of the vectors being permuted (V0 and V1).  */
> +   MODE is the mode of the vectors being permuted (V0 and V1).  SEL_MODE
> +   is the TYPE_MODE associated with SEL, or BLKmode if SEL isn't known
> +   to have a particular mode.  */
>
>  rtx
> -expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
> +expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
> +                      const vec_perm_builder &sel, machine_mode sel_mode,
> +                      rtx target)
>  {
> -  enum insn_code icode;
> -  machine_mode qimode;
> -  unsigned int i, w, e, u;
> -  rtx tmp, sel_qi = NULL;
> -  rtvec vec;
> -
> -  if (GET_CODE (sel) != CONST_VECTOR)
> -    return expand_vec_perm_var (mode, v0, v1, sel, target);
> -
> -  if (!target || GET_MODE (target) != mode)
> +  if (!target || !register_operand (target, mode))
>      target = gen_reg_rtx (mode);
>
> -  w = GET_MODE_SIZE (mode);
> -  e = GET_MODE_NUNITS (mode);
> -  u = GET_MODE_UNIT_SIZE (mode);
> -
>    /* Set QIMODE to a different vector mode with byte elements.
>       If no such mode, or if MODE already has byte elements, use VOIDmode.  */
> +  machine_mode qimode;
>    if (!qimode_for_vec_perm (mode).exists (&qimode))
>      qimode = VOIDmode;
>
> +  rtx_insn *last = get_last_insn ();
> +
> +  bool single_arg_p = rtx_equal_p (v0, v1);
> +
>    /* See if this can be handled with a vec_shr.  We only do this if the
>       second vector is all zeroes.  */
>    insn_code shift_code = optab_handler (vec_shr_optab, mode);
> @@ -5476,7 +5468,7 @@ expand_vec_perm (machine_mode mode, rtx
>        && (shift_code != CODE_FOR_nothing
>           || shift_code_qi != CODE_FOR_nothing))
>      {
> -      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
> +      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>        if (shift_amt)
>         {
>           struct expand_operand ops[3];
> @@ -5500,65 +5492,81 @@ expand_vec_perm (machine_mode mode, rtx
>         }
>      }
>
> -  icode = direct_optab_handler (vec_perm_const_optab, mode);
> -  if (icode != CODE_FOR_nothing)
> +  if (targetm.vectorize.vec_perm_const != NULL)
>      {
> -      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> -      if (tmp)
> -       return tmp;
> +      v0 = force_reg (mode, v0);
> +      if (single_arg_p)
> +       v1 = v0;
> +      else
> +       v1 = force_reg (mode, v1);
> +
> +      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
> +       return target;
>      }
>
>    /* Fall back to a constant byte-based permutation.  */
> +  vec_perm_indices qimode_indices;
> +  rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
>    if (qimode != VOIDmode)
>      {
> -      vec = rtvec_alloc (w);
> -      for (i = 0; i < e; ++i)
> -       {
> -         unsigned int j, this_e;
> +      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
> +      target_qi = gen_reg_rtx (qimode);
> +      v0_qi = gen_lowpart (qimode, v0);
> +      v1_qi = gen_lowpart (qimode, v1);
> +      if (targetm.vectorize.vec_perm_const != NULL
> +         && targetm.vectorize.vec_perm_const (qimode, target_qi, v0_qi,
> +                                              v1_qi, qimode_indices))
> +       return gen_lowpart (mode, target_qi);
> +    }
>
> -         this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
> -         this_e &= 2 * e - 1;
> -         this_e *= u;
> +  /* Otherwise expand as a fully variable permutation.  */
>
> -         for (j = 0; j < u; ++j)
> -           RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
> -       }
> -      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
> +  /* The optabs are only defined for selectors with the same width
> +     as the values being permuted.  */
> +  machine_mode required_sel_mode;
> +  if (!mode_for_int_vector (mode).exists (&required_sel_mode)
> +      || !VECTOR_MODE_P (required_sel_mode))
> +    {
> +      delete_insns_since (last);
> +      return NULL_RTX;
> +    }
>
> -      icode = direct_optab_handler (vec_perm_const_optab, qimode);
> -      if (icode != CODE_FOR_nothing)
> +  /* We know that it is semantically valid to treat SEL as having SEL_MODE.
> +     If that isn't the mode we want then we need to prove that using
> +     REQUIRED_SEL_MODE is OK.  */
> +  if (sel_mode != required_sel_mode)
> +    {
> +      if (!selector_fits_mode_p (required_sel_mode, sel))
>         {
> -         tmp = gen_reg_rtx (qimode);
> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> -                                  gen_lowpart (qimode, v1), sel_qi);
> -         if (tmp)
> -           return gen_lowpart (mode, tmp);
> +         delete_insns_since (last);
> +         return NULL_RTX;
>         }
> +      sel_mode = required_sel_mode;
>      }
>
> -  /* Otherwise expand as a fully variable permuation.  */
> -
> -  icode = direct_optab_handler (vec_perm_optab, mode);
> +  insn_code icode = direct_optab_handler (vec_perm_optab, mode);
>    if (icode != CODE_FOR_nothing)
>      {
> -      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> +      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
> +      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
>        if (tmp)
>         return tmp;
>      }
>
> -  if (qimode != VOIDmode)
> +  if (qimode != VOIDmode
> +      && selector_fits_mode_p (qimode, qimode_indices))
>      {
>        icode = direct_optab_handler (vec_perm_optab, qimode);
>        if (icode != CODE_FOR_nothing)
>         {
> -         rtx tmp = gen_reg_rtx (qimode);
> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> -                                  gen_lowpart (qimode, v1), sel_qi);
> +         rtx sel_qi = vec_perm_indices_to_rtx (qimode, qimode_indices);
> +         rtx tmp = expand_vec_perm_1 (icode, target_qi, v0_qi, v1_qi, sel_qi);
>           if (tmp)
>             return gen_lowpart (mode, tmp);
>         }
>      }
>
> +  delete_insns_since (last);
>    return NULL_RTX;
>  }
>
> @@ -5570,7 +5578,7 @@ expand_vec_perm (machine_mode mode, rtx
>     SEL must have the integer equivalent of MODE and is known to be
>     unsuitable for permutes with a constant permutation vector.  */
>
> -static rtx
> +rtx
>  expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>  {
>    enum insn_code icode;
> @@ -5613,17 +5621,16 @@ expand_vec_perm_var (machine_mode mode,
>    gcc_assert (sel != NULL);
>
>    /* Broadcast the low byte each element into each of its bytes.  */
> -  vec = rtvec_alloc (w);
> +  vec_perm_builder const_sel (w);
>    for (i = 0; i < w; ++i)
>      {
>        int this_e = i / u * u;
>        if (BYTES_BIG_ENDIAN)
>         this_e += u - 1;
> -      RTVEC_ELT (vec, i) = GEN_INT (this_e);
> +      const_sel.quick_push (this_e);
>      }
> -  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
>    sel = gen_lowpart (qimode, sel);
> -  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> +  sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
>    gcc_assert (sel != NULL);
>
>    /* Add the byte offset to each byte element.  */
> @@ -5797,9 +5804,8 @@ expand_mult_highpart (machine_mode mode,
>    enum insn_code icode;
>    int method, i, nunits;
>    machine_mode wmode;
> -  rtx m1, m2, perm;
> +  rtx m1, m2;
>    optab tab1, tab2;
> -  rtvec v;
>
>    method = can_mult_highpart_p (mode, uns_p);
>    switch (method)
> @@ -5842,21 +5848,20 @@ expand_mult_highpart (machine_mode mode,
>    expand_insn (optab_handler (tab2, mode), 3, eops);
>    m2 = gen_lowpart (mode, eops[0].value);
>
> -  v = rtvec_alloc (nunits);
> +  auto_vec_perm_indices sel (nunits);
>    if (method == 2)
>      {
>        for (i = 0; i < nunits; ++i)
> -       RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
> -                                   + ((i & 1) ? nunits : 0));
> -      perm = gen_rtx_CONST_VECTOR (mode, v);
> +       sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
> +                       + ((i & 1) ? nunits : 0));
>      }
>    else
>      {
> -      int base = BYTES_BIG_ENDIAN ? 0 : 1;
> -      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
> +      for (i = 0; i < nunits; ++i)
> +       sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>      }
>
> -  return expand_vec_perm (mode, m1, m2, perm, target);
> +  return expand_vec_perm_const (mode, m1, m2, sel, BLKmode, target);
>  }
>
>  /* Helper function to find the MODE_CC set in a sync_compare_and_swap
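The shift test in shift_amt_for_vec_perm_mask above can be tried outside GCC. This is a minimal sketch under a few assumptions: a plain unsigned array stands in for vec_perm_indices, and -1 stands in for NULL_RTX. The name shift_bits_for_perm is hypothetical.

```c
#include <assert.h>

/* Return the shift distance in bits if the NELT indices in SEL describe
   a shift of the first operand, assuming the second operand is all
   zeros; return -1 otherwise.  Every index >= NELT selects a zero from
   the second operand, so all such indices are treated as equivalent
   (they are clamped to NELT before comparing).  */
static long
shift_bits_for_perm (const unsigned int *sel, unsigned int nelt,
                     unsigned int unit_bits)
{
  assert (nelt > 0);
  unsigned int first = sel[0];
  if (first >= nelt)
    return -1;
  for (unsigned int i = 1; i < nelt; i++)
    {
      unsigned int expected = i + first;
      unsigned int got = sel[i] < nelt ? sel[i] : nelt;
      unsigned int want = expected < nelt ? expected : nelt;
      if (got != want)
        return -1;
    }
  return (long) first * unit_bits;
}
```

For four 32-bit elements, {1, 2, 3, 4} and {1, 2, 3, 7} both describe a 32-bit shift, because every index into the zero vector is equivalent. {1, 3, 2, 4} is not a shift.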
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-09 22:47:21.534314227 +0000
> +++ gcc/fold-const.c    2017-12-09 22:47:27.881318099 +0000
> @@ -82,6 +82,7 @@ Software Foundation; either version 3, o
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* Nonzero if we are folding constants inside an initializer; zero
>     otherwise.  */
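If no hook handles MODE directly, expand_vec_perm_const falls back to a byte permutation in QIMODE. Each element index is widened into GET_MODE_UNIT_SIZE (mode) byte indices. The patch delegates this step to vec_perm_indices::new_expanded_vector, whose body is not shown here. The sketch below reproduces only the arithmetic of the rtvec loop that the patch removes. The function name is hypothetical.

```c
#include <assert.h>

/* Expand an element-level selector SEL of NELT entries into a byte-level
   selector BYTES of NELT * UNIT_SIZE entries.  Element index E becomes
   the consecutive byte indices E * UNIT_SIZE + 0 ... E * UNIT_SIZE +
   UNIT_SIZE - 1.  As in the removed loop, each index is first reduced
   modulo 2 * NELT, which requires NELT to be a power of two.  */
static void
expand_selector_to_bytes (const unsigned int *sel, unsigned int nelt,
                          unsigned int unit_size, unsigned int *bytes)
{
  assert (nelt != 0 && (nelt & (nelt - 1)) == 0);
  for (unsigned int i = 0; i < nelt; ++i)
    {
      unsigned int e = (sel[i] & (2 * nelt - 1)) * unit_size;
      for (unsigned int j = 0; j < unit_size; ++j)
        bytes[i * unit_size + j] = e + j;
    }
}
```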
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:21.534314227 +0000
> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:47:27.883318100 +0000
> @@ -47,6 +47,7 @@ the Free Software Foundation; either ver
>  #include "cfganal.h"
>  #include "optabs-tree.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* This pass propagates the RHS of assignment statements into use
>     sites of the LHS of the assignment.  It's basically a specialized
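In expand_mult_highpart, the patch now pushes the selector values into an auto_vec_perm_indices instead of building a CONST_VECTOR. The values themselves are unchanged. The small sketch below computes them for inspection, assuming BIG_ENDIAN stands in for BYTES_BIG_ENDIAN and METHOD is the value returned by can_mult_highpart_p. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Fill SEL with the NUNITS-entry selector that expand_mult_highpart
   passes to expand_vec_perm_const, indexing the concatenation of M1
   and M2.  The two formulas are copied from the patch.  */
static void
mult_highpart_selector (int method, bool big_endian, int nunits, int *sel)
{
  assert (nunits > 0);
  for (int i = 0; i < nunits; ++i)
    if (method == 2)
      /* Alternate between M1 (even I) and M2 (odd I).  */
      sel[i] = !big_endian + (i & ~1) + ((i & 1) ? nunits : 0);
    else
      /* Take every second element of M1 followed by M2.  */
      sel[i] = 2 * i + (big_endian ? 0 : 1);
}
```

For four elements on a little-endian target, method 2 yields {1, 5, 3, 7} and the other method yields {1, 3, 5, 7}. In both cases only odd-numbered narrow elements are selected; on a little-endian target these hold the high halves of the wide products.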
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:21.535314227 +0000
> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:47:27.883318100 +0000
> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>  #include "params.h"
>  #include "tree-cfg.h"
>  #include "tree-hash-traits.h"
> +#include "vec-perm-indices.h"
>
>  /* Return true if load- or store-lanes optab OPTAB is implemented for
>     COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
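expand_vec_perm_var builds a selector that copies the least significant byte of each element into all of that element's bytes. Its values also stay the same after the patch; only the container changes, from an rtvec to a vec_perm_builder. Here is a standalone sketch of the computation, with BIG_ENDIAN standing in for BYTES_BIG_ENDIAN. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Build the W-entry byte selector for elements of U bytes each.  Byte I
   belongs to element I / U, whose least significant byte sits at offset
   0 on a little-endian target and at offset U - 1 on a big-endian one.  */
static void
broadcast_low_byte_selector (unsigned int w, unsigned int u, bool big_endian,
                             unsigned int *sel)
{
  assert (u != 0 && w % u == 0);
  for (unsigned int i = 0; i < w; ++i)
    sel[i] = i / u * u + (big_endian ? u - 1 : 0);
}
```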
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-09 22:47:21.535314227 +0000
> +++ gcc/tree-vect-generic.c     2017-12-09 22:47:27.883318100 +0000
> @@ -38,6 +38,7 @@ Free Software Foundation; either version
>  #include "gimplify.h"
>  #include "tree-cfg.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>
>  static void expand_vector_operations_1 (gimple_stmt_iterator *);
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-12-09 22:47:21.536314228 +0000
> +++ gcc/tree-vect-loop.c        2017-12-09 22:47:27.884318101 +0000
> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>  #include "tree-if-conv.h"
>  #include "internal-fn.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* Loop Vectorization Pass.
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-12-09 22:47:21.536314228 +0000
> +++ gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000
> @@ -42,6 +42,7 @@ Software Foundation; either version 3, o
>  #include "gimple-walk.h"
>  #include "dbgcnt.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>
>  /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:21.537314229 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:47:27.885318101 +0000
> @@ -49,6 +49,7 @@ Software Foundation; either version 3, o
>  #include "builtins.h"
>  #include "internal-fn.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> Index: gcc/config/aarch64/aarch64-protos.h
> ===================================================================
> --- gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:27.854318082 +0000
> @@ -474,8 +474,6 @@ extern void aarch64_split_combinev16qi (
>  extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
>  extern bool aarch64_madd_needs_nop (rtx_insn *);
>  extern void aarch64_final_prescan_insn (rtx_insn *);
> -extern bool
> -aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
>  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
>  int aarch64_ccmp_mode_to_code (machine_mode mode);
>
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:27.854318082 +0000
> @@ -5348,20 +5348,6 @@ (define_expand "aarch64_get_qreg<VSTRUCT
>
>  ;; vec_perm support
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VALL_F16 0 "register_operand")
> -   (match_operand:VALL_F16 1 "register_operand")
> -   (match_operand:VALL_F16 2 "register_operand")
> -   (match_operand:<V_INT_EQUIV> 3)]
> -  "TARGET_SIMD"
> -{
> -  if (aarch64_expand_vec_perm_const (operands[0], operands[1],
> -                                    operands[2], operands[3], <nunits>))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_perm<mode>"
>    [(match_operand:VB 0 "register_operand")
>     (match_operand:VB 1 "register_operand")
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64.c        2017-12-09 22:47:27.856318084 +0000
> @@ -141,8 +141,6 @@ static void aarch64_elf_asm_constructor
>  static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
>  static void aarch64_override_options_after_change (void);
>  static bool aarch64_vector_mode_supported_p (machine_mode);
> -static bool aarch64_vectorize_vec_perm_const_ok (machine_mode,
> -                                                vec_perm_indices);
>  static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
>  static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
>                                                          const_tree type,
> @@ -13626,29 +13624,27 @@ aarch64_expand_vec_perm_const_1 (struct
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern with the operands given by TARGET,
> -   OP0, OP1 and SEL.  NELT is the number of elements in the vector.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
> -                              unsigned int nelt)
> +static bool
> +aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                 rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned int i, which;
>
> +  d.vmode = vmode;
>    d.target = target;
>    d.op0 = op0;
>    d.op1 = op1;
> +  d.testing_p = !target;
>
> -  d.vmode = GET_MODE (target);
> -  gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.testing_p = false;
> -
> +  /* Calculate whether all elements are in one vector.  */
> +  unsigned int nelt = sel.length ();
>    d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int ei = INTVAL (e) & (2 * nelt - 1);
> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        d.perm.quick_push (ei);
>      }
> @@ -13660,7 +13656,7 @@ aarch64_expand_vec_perm_const (rtx targe
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (op0, op1))
> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>         break;
>
>        /* The elements of PERM do not suggest that only the first operand
> @@ -13681,37 +13677,8 @@ aarch64_expand_vec_perm_const (rtx targe
>        break;
>      }
>
> -  return aarch64_expand_vec_perm_const_1 (&d);
> -}
> -
> -static bool
> -aarch64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.testing_p = true;
> -  d.perm.safe_splice (sel);
> -
> -  /* Calculate whether all elements are in one vector.  */
> -  nelt = sel.length ();
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned int e = d.perm[i];
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* If all elements are from the second vector, reindex as if from the
> -     first vector.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to a single vector.  */
> -  d.one_vector_p = (which != 3);
> +  if (!d.testing_p)
> +    return aarch64_expand_vec_perm_const_1 (&d);
>
>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> @@ -13719,7 +13686,7 @@ aarch64_vectorize_vec_perm_const_ok (mac
>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>
>    start_sequence ();
> -  ret = aarch64_expand_vec_perm_const_1 (&d);
> +  bool ret = aarch64_expand_vec_perm_const_1 (&d);
>    end_sequence ();
>
>    return ret;
> @@ -15471,9 +15438,9 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
>
>  /* vec_perm support.  */
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  aarch64_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST \
> +  aarch64_vectorize_vec_perm_const
>
>  #undef TARGET_INIT_LIBFUNCS
>  #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
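The loop at the start of aarch64_vectorize_vec_perm_const does three things:

* masks each index;
* records in WHICH whether the first input (bit 0) and/or the second (bit 1) is used;
* rebases a selector that uses only the second input, as the removed aarch64_vectorize_vec_perm_const_ok did explicitly.

The sketch below is a simplified standalone version of that classification. It ignores the extra folding done when OP0 and OP1 are the same register. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask each of the NELT indices in SEL modulo 2 * NELT into PERM and
   return a bit mask: bit 0 set if some index reads the first input,
   bit 1 set if some index reads the second.  If only the second input
   is used, rebase PERM so that it indexes a single input.  */
static unsigned int
classify_perm (const unsigned int *sel, unsigned int nelt, unsigned int *perm,
               bool *one_vector_p)
{
  unsigned int which = 0;
  assert (nelt != 0 && (nelt & (nelt - 1)) == 0);
  for (unsigned int i = 0; i < nelt; ++i)
    {
      perm[i] = sel[i] & (2 * nelt - 1);
      which |= (perm[i] < nelt ? 1 : 2);
    }
  if (which == 2)
    for (unsigned int i = 0; i < nelt; ++i)
      perm[i] -= nelt;
  *one_vector_p = (which != 3);
  return which;
}
```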
> Index: gcc/config/arm/arm-protos.h
> ===================================================================
> --- gcc/config/arm/arm-protos.h 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/arm-protos.h 2017-12-09 22:47:27.856318084 +0000
> @@ -357,7 +357,6 @@ extern bool arm_validize_comparison (rtx
>
>  extern bool arm_gen_setmem (rtx *);
>  extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
> -extern bool arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>
>  extern bool arm_autoinc_modes_ok_p (machine_mode, enum arm_auto_incmodes);
>
> Index: gcc/config/arm/vec-common.md
> ===================================================================
> --- gcc/config/arm/vec-common.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/vec-common.md        2017-12-09 22:47:27.858318085 +0000
> @@ -109,35 +109,6 @@ (define_expand "umax<mode>3"
>  {
>  })
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VALL 0 "s_register_operand" "")
> -   (match_operand:VALL 1 "s_register_operand" "")
> -   (match_operand:VALL 2 "s_register_operand" "")
> -   (match_operand:<V_cmp_result> 3 "" "")]
> -  "TARGET_NEON
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> -{
> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
> -                                operands[2], operands[3]))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VH 0 "s_register_operand")
> -   (match_operand:VH 1 "s_register_operand")
> -   (match_operand:VH 2 "s_register_operand")
> -   (match_operand:<V_cmp_result> 3)]
> -  "TARGET_NEON"
> -{
> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
> -                                operands[2], operands[3]))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_perm<mode>"
>    [(match_operand:VE 0 "s_register_operand" "")
>     (match_operand:VE 1 "s_register_operand" "")
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/arm.c        2017-12-09 22:47:27.858318085 +0000
> @@ -288,7 +288,8 @@ static int arm_cortex_a5_branch_cost (bo
>  static int arm_cortex_m_branch_cost (bool, bool);
>  static int arm_cortex_m7_branch_cost (bool, bool);
>
> -static bool arm_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
> +static bool arm_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                         const vec_perm_indices &);
>
>  static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
>
> @@ -734,9 +735,8 @@ #define TARGET_VECTORIZE_SUPPORT_VECTOR_
>  #define TARGET_PREFERRED_RENAME_CLASS \
>    arm_preferred_rename_class
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  arm_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST arm_vectorize_vec_perm_const
>
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> @@ -29381,28 +29381,31 @@ arm_expand_vec_perm_const_1 (struct expa
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
> +static bool
> +arm_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
> +                             const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    int i, nelt, which;
>
> +  if (!VALID_NEON_DREG_MODE (vmode) && !VALID_NEON_QREG_MODE (vmode))
> +    return false;
> +
>    d.target = target;
>    d.op0 = op0;
>    d.op1 = op1;
>
> -  d.vmode = GET_MODE (target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
>    nelt = GET_MODE_NUNITS (d.vmode);
>    d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        d.perm.quick_push (ei);
>      }
> @@ -29414,7 +29417,7 @@ arm_expand_vec_perm_const (rtx target, r
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (op0, op1))
> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>         break;
>
>        /* The elements of PERM do not suggest that only the first operand
> @@ -29435,38 +29438,8 @@ arm_expand_vec_perm_const (rtx target, r
>        break;
>      }
>
> -  return arm_expand_vec_perm_const_1 (&d);
> -}
> -
> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
> -
> -static bool
> -arm_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.testing_p = true;
> -  d.perm.safe_splice (sel);
> -
> -  /* Categorize the set of elements in the selector.  */
> -  nelt = GET_MODE_NUNITS (d.vmode);
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned int e = d.perm[i];
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_vector_p = (which != 3);
> +  if (!d.testing_p)
> +    return arm_expand_vec_perm_const_1 (&d);
>
>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> @@ -29474,7 +29447,7 @@ arm_vectorize_vec_perm_const_ok (machine
>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>
>    start_sequence ();
> -  ret = arm_expand_vec_perm_const_1 (&d);
> +  bool ret = arm_expand_vec_perm_const_1 (&d);
>    end_sequence ();
>
>    return ret;
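After this patch the arm and aarch64 hooks serve two roles. A null TARGET asks only whether the permutation is possible (d.testing_p = !target); a non-null TARGET asks for code to be emitted. Below is a toy model of that calling convention. An instruction counter stands in for emitted RTL, and a scratch buffer stands in for the start_sequence/end_sequence pair. All names, and the identity-or-reverse rule, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an emitted instruction sequence.  */
struct insn_buf { int count; };

/* Stand-in for a target's *_expand_vec_perm_const_1: "emit" into BUF and
   return true on success.  Toy rule: only identity (no insns) and full
   reversal (one insn) are supported.  */
static bool
try_expand (const unsigned int *perm, unsigned int nelt, struct insn_buf *buf)
{
  assert (buf != NULL);
  bool identity = true, reverse = true;
  for (unsigned int i = 0; i < nelt; ++i)
    {
      identity &= perm[i] == i;
      reverse &= perm[i] == nelt - 1 - i;
    }
  if (!identity && !reverse)
    return false;
  buf->count += identity ? 0 : 1;
  return true;
}

/* Single entry point for queries and expansion.  A null TARGET means
   "only report whether this is possible": expand into a scratch buffer
   and discard it, so the caller never sees any emitted code.  */
static bool
vec_perm_const_hook (struct insn_buf *target, const unsigned int *perm,
                     unsigned int nelt)
{
  bool testing_p = target == NULL;
  if (!testing_p)
    return try_expand (perm, nelt, target);

  struct insn_buf scratch = { 0 };
  return try_expand (perm, nelt, &scratch);
}
```

The property that matters is that the query path never writes to the caller's TARGET, while the expansion path emits into it directly.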
> Index: gcc/config/i386/i386-protos.h
> ===================================================================
> --- gcc/config/i386/i386-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/i386-protos.h       2017-12-09 22:47:27.859318085 +0000
> @@ -133,7 +133,6 @@ extern bool ix86_expand_fp_movcc (rtx[])
>  extern bool ix86_expand_fp_vcond (rtx[]);
>  extern bool ix86_expand_int_vcond (rtx[]);
>  extern void ix86_expand_vec_perm (rtx[]);
> -extern bool ix86_expand_vec_perm_const (rtx[]);
>  extern bool ix86_expand_mask_vec_cmp (rtx[]);
>  extern bool ix86_expand_int_vec_cmp (rtx[]);
>  extern bool ix86_expand_fp_vec_cmp (rtx[]);
> Index: gcc/config/i386/sse.md
> ===================================================================
> --- gcc/config/i386/sse.md      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/sse.md      2017-12-09 22:47:27.863318088 +0000
> @@ -11476,30 +11476,6 @@ (define_expand "vec_perm<mode>"
>    DONE;
>  })
>
> -(define_mode_iterator VEC_PERM_CONST
> -  [(V4SF "TARGET_SSE") (V4SI "TARGET_SSE")
> -   (V2DF "TARGET_SSE") (V2DI "TARGET_SSE")
> -   (V16QI "TARGET_SSE2") (V8HI "TARGET_SSE2")
> -   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
> -   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
> -   (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> -   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> -   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
> -
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VEC_PERM_CONST 0 "register_operand")
> -   (match_operand:VEC_PERM_CONST 1 "register_operand")
> -   (match_operand:VEC_PERM_CONST 2 "register_operand")
> -   (match_operand:<sseintvecmode> 3)]
> -  ""
> -{
> -  if (ix86_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel bitwise logical operations
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/i386.c      2017-12-09 22:47:27.862318087 +0000
> @@ -47588,9 +47588,8 @@ expand_vec_perm_vpshufb4_vpermq2 (struct
>    return true;
>  }
>
> -/* The guts of ix86_expand_vec_perm_const, also used by the ok hook.
> -   With all of the interface bits taken care of, perform the expansion
> -   in D and return true on success.  */
> +/* The guts of ix86_vectorize_vec_perm_const.  With all of the interface bits
> +   taken care of, perform the expansion in D and return true on success.  */
>
>  static bool
>  ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
> @@ -47725,69 +47724,29 @@ canonicalize_perm (struct expand_vec_per
>    return (which == 3);
>  }
>
> -bool
> -ix86_expand_vec_perm_const (rtx operands[4])
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +ix86_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned char perm[MAX_VECT_LEN];
> -  int i, nelt;
> +  unsigned int i, nelt, which;
>    bool two_args;
> -  rtx sel;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
>
> -  d.vmode = GET_MODE (d.target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
> -  gcc_assert (XVECLEN (sel, 0) == nelt);
> +  gcc_assert (sel.length () == nelt);
>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>
> -  for (i = 0; i < nelt; ++i)
> -    {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> -      d.perm[i] = ei;
> -      perm[i] = ei;
> -    }
> -
> -  two_args = canonicalize_perm (&d);
> -
> -  if (ix86_expand_vec_perm_const_1 (&d))
> -    return true;
> -
> -  /* If the selector says both arguments are needed, but the operands are the
> -     same, the above tried to expand with one_operand_p and flattened selector.
> -     If that didn't work, retry without one_operand_p; we succeeded with that
> -     during testing.  */
> -  if (two_args && d.one_operand_p)
> -    {
> -      d.one_operand_p = false;
> -      memcpy (d.perm, perm, sizeof (perm));
> -      return ix86_expand_vec_perm_const_1 (&d);
> -    }
> -
> -  return false;
> -}
> -
> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
> -
> -static bool
> -ix86_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
>    /* Given sufficient ISA support we can just return true here
>       for selected vector modes.  */
>    switch (d.vmode)
> @@ -47796,17 +47755,23 @@ ix86_vectorize_vec_perm_const_ok (machin
>      case E_V16SImode:
>      case E_V8DImode:
>      case E_V8DFmode:
> -      if (TARGET_AVX512F)
> -       /* All implementable with a single vperm[it]2 insn.  */
> +      if (!TARGET_AVX512F)
> +       return false;
> +      /* All implementable with a single vperm[it]2 insn.  */
> +      if (d.testing_p)
>         return true;
>        break;
>      case E_V32HImode:
> -      if (TARGET_AVX512BW)
> +      if (!TARGET_AVX512BW)
> +       return false;
> +      if (d.testing_p)
>         /* All implementable with a single vperm[it]2 insn.  */
>         return true;
>        break;
>      case E_V64QImode:
> -      if (TARGET_AVX512BW)
> +      if (!TARGET_AVX512BW)
> +       return false;
> +      if (d.testing_p)
>         /* Implementable with 2 vperm[it]2, 2 vpshufb and 1 or insn.  */
>         return true;
>        break;
> @@ -47814,73 +47779,108 @@ ix86_vectorize_vec_perm_const_ok (machin
>      case E_V8SFmode:
>      case E_V4DFmode:
>      case E_V4DImode:
> -      if (TARGET_AVX512VL)
> +      if (!TARGET_AVX)
> +       return false;
> +      if (d.testing_p && TARGET_AVX512VL)
>         /* All implementable with a single vperm[it]2 insn.  */
>         return true;
>        break;
>      case E_V16HImode:
> -      if (TARGET_AVX2)
> +      if (!TARGET_SSE2)
> +       return false;
> +      if (d.testing_p && TARGET_AVX2)
>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>         return true;
>        break;
>      case E_V32QImode:
> -      if (TARGET_AVX2)
> +      if (!TARGET_SSE2)
> +       return false;
> +      if (d.testing_p && TARGET_AVX2)
>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>         return true;
>        break;
> -    case E_V4SImode:
> -    case E_V4SFmode:
>      case E_V8HImode:
>      case E_V16QImode:
> +      if (!TARGET_SSE2)
> +       return false;
> +      /* Fall through.  */
> +    case E_V4SImode:
> +    case E_V4SFmode:
> +      if (!TARGET_SSE)
> +       return false;
>        /* All implementable with a single vpperm insn.  */
> -      if (TARGET_XOP)
> +      if (d.testing_p && TARGET_XOP)
>         return true;
>        /* All implementable with 2 pshufb + 1 ior.  */
> -      if (TARGET_SSSE3)
> +      if (d.testing_p && TARGET_SSSE3)
>         return true;
>        break;
>      case E_V2DImode:
>      case E_V2DFmode:
> +      if (!TARGET_SSE)
> +       return false;
>        /* All implementable with shufpd or unpck[lh]pd.  */
> -      return true;
> +      if (d.testing_p)
> +       return true;
> +      break;
>      default:
>        return false;
>      }
>
> -  /* Extract the values from the vector CST into the permutation
> -     array in D.  */
>    for (i = which = 0; i < nelt; ++i)
>      {
>        unsigned char e = sel[i];
>        gcc_assert (e < 2 * nelt);
>        d.perm[i] = e;
> +      perm[i] = e;
>        which |= (e < nelt ? 1 : 2);
>      }
>
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> +  if (d.testing_p)
> +    {
> +      /* For all elements from second vector, fold the elements to first.  */
> +      if (which == 2)
> +       for (i = 0; i < nelt; ++i)
> +         d.perm[i] -= nelt;
> +
> +      /* Check whether the mask can be applied to the vector type.  */
> +      d.one_operand_p = (which != 3);
> +
> +      /* Implementable with shufps or pshufd.  */
> +      if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
> +       return true;
> +
> +      /* Otherwise we have to go through the motions and see if we can
> +        figure out how to generate the requested permutation.  */
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_operand_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      bool ret = ix86_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
>
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_operand_p = (which != 3);
> +      return ret;
> +    }
>
> -  /* Implementable with shufps or pshufd.  */
> -  if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
> +  two_args = canonicalize_perm (&d);
> +
> +  if (ix86_expand_vec_perm_const_1 (&d))
>      return true;
>
> -  /* Otherwise we have to go through the motions and see if we can
> -     figure out how to generate the requested permutation.  */
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_operand_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = ix86_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> +  /* If the selector says both arguments are needed, but the operands are the
> +     same, the above tried to expand with one_operand_p and flattened selector.
> +     If that didn't work, retry without one_operand_p; we succeeded with that
> +     during testing.  */
> +  if (two_args && d.one_operand_p)
> +    {
> +      d.one_operand_p = false;
> +      memcpy (d.perm, perm, sizeof (perm));
> +      return ix86_expand_vec_perm_const_1 (&d);
> +    }
>
> -  return ret;
> +  return false;
>  }
>
>  void
> @@ -50532,9 +50532,8 @@ #define TARGET_CLASS_LIKELY_SPILLED_P ix
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>    ix86_builtin_vectorization_cost
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  ix86_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST ix86_vectorize_vec_perm_const
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
>    ix86_preferred_simd_mode
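The selector bookkeeping in ix86_vectorize_vec_perm_const above can be shown in isolation. The following is a standalone C sketch, not GCC code. The names classify_selector and fold_to_first are hypothetical. The first function builds the WHICH bitmask: bit 1 means some element reads the first input, and bit 2 means some element reads the second input. The second function rebases the indices when every element comes from the second input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a bitmask saying which inputs SEL reads.
   1 = first vector, 2 = second vector, 3 = both.  Each index must be
   < 2 * NELT, as in the gcc_assert in the hook.  */
static unsigned int
classify_selector (const unsigned char *sel, unsigned int nelt)
{
  unsigned int which = 0;
  for (unsigned int i = 0; i < nelt; ++i)
    {
      assert (sel[i] < 2 * nelt);
      which |= (sel[i] < nelt ? 1 : 2);
    }
  return which;
}

/* Hypothetical helper: if every element comes from the second vector
   (WHICH == 2), rebase the indices so they refer to the first one.  */
static void
fold_to_first (unsigned char *perm, unsigned int nelt, unsigned int which)
{
  if (which == 2)
    for (unsigned int i = 0; i < nelt; ++i)
      perm[i] -= nelt;
}
```

For a V4SI selector {4, 5, 6, 7}, WHICH is 2 and the folded selector is {0, 1, 2, 3}. That one-operand form is what the testing path then checks against shufps/pshufd.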
> Index: gcc/config/ia64/ia64-protos.h
> ===================================================================
> --- gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:27.864318089 +0000
> @@ -62,7 +62,6 @@ extern const char *get_bundle_name (int)
>  extern const char *output_probe_stack_range (rtx, rtx);
>
>  extern void ia64_expand_vec_perm_even_odd (rtx, rtx, rtx, int);
> -extern bool ia64_expand_vec_perm_const (rtx op[4]);
>  extern void ia64_expand_vec_setv2sf (rtx op[3]);
>  #endif /* RTX_CODE */
>
> Index: gcc/config/ia64/vect.md
> ===================================================================
> --- gcc/config/ia64/vect.md     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/vect.md     2017-12-09 22:47:27.865318089 +0000
> @@ -1549,19 +1549,6 @@ (define_expand "vec_pack_trunc_v2si"
>    DONE;
>  })
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VEC 0 "register_operand" "")
> -   (match_operand:VEC 1 "register_operand" "")
> -   (match_operand:VEC 2 "register_operand" "")
> -   (match_operand:<vecint> 3 "" "")]
> -  ""
> -{
> -  if (ia64_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Missing operations
>  ;; fprcpa
>  ;; fpsqrta
> Index: gcc/config/ia64/ia64.c
> ===================================================================
> --- gcc/config/ia64/ia64.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/ia64.c      2017-12-09 22:47:27.864318089 +0000
> @@ -333,7 +333,8 @@ static fixed_size_mode ia64_get_reg_raw_
>  static section * ia64_hpux_function_section (tree, enum node_frequency,
>                                              bool, bool);
>
> -static bool ia64_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
> +static bool ia64_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                          const vec_perm_indices &);
>
>  static unsigned int ia64_hard_regno_nregs (unsigned int, machine_mode);
>  static bool ia64_hard_regno_mode_ok (unsigned int, machine_mode);
> @@ -652,8 +653,8 @@ #define TARGET_DELAY_SCHED2 true
>  #undef TARGET_DELAY_VARTRACK
>  #define TARGET_DELAY_VARTRACK true
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK ia64_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST ia64_vectorize_vec_perm_const
>
>  #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
>  #define TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P ia64_attribute_takes_identifier_p
> @@ -11741,32 +11742,31 @@ ia64_expand_vec_perm_const_1 (struct exp
>    return false;
>  }
>
> -bool
> -ia64_expand_vec_perm_const (rtx operands[4])
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +ia64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned char perm[MAX_VECT_LEN];
> -  int i, nelt, which;
> -  rtx sel;
> +  unsigned int i, nelt, which;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
>
> -  d.vmode = GET_MODE (d.target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
> -  gcc_assert (XVECLEN (sel, 0) == nelt);
> +  gcc_assert (sel.length () == nelt);
>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>
>        which |= (ei < nelt ? 1 : 2);
>        d.perm[i] = ei;
> @@ -11779,7 +11779,7 @@ ia64_expand_vec_perm_const (rtx operands
>        gcc_unreachable();
>
>      case 3:
> -      if (!rtx_equal_p (d.op0, d.op1))
> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>         {
>           d.one_operand_p = false;
>           break;
> @@ -11807,6 +11807,22 @@ ia64_expand_vec_perm_const (rtx operands
>        break;
>      }
>
> +  if (d.testing_p)
> +    {
> +      /* We have to go through the motions and see if we can
> +        figure out how to generate the requested permutation.  */
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_operand_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      bool ret = ia64_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
> +
> +      return ret;
> +    }
> +
>    if (ia64_expand_vec_perm_const_1 (&d))
>      return true;
>
> @@ -11823,51 +11839,6 @@ ia64_expand_vec_perm_const (rtx operands
>    return false;
>  }
>
> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
> -
> -static bool
> -ia64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
> -  /* Extract the values from the vector CST into the permutation
> -     array in D.  */
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned char e = sel[i];
> -      d.perm[i] = e;
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_operand_p = (which != 3);
> -
> -  /* Otherwise we have to go through the motions and see if we can
> -     figure out how to generate the requested permutation.  */
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_operand_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = ia64_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> -
> -  return ret;
> -}
> -
>  void
>  ia64_expand_vec_setv2sf (rtx operands[3])
>  {
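The ia64 hunk, like the i386 and mips ones, follows the convention that a null TARGET means "only report whether this permutation is supported" (d.testing_p = !target). The following standalone C sketch shows that calling pattern with plain arrays in place of rtxes. The function toy_vec_perm_const is hypothetical. It pretends the target supports only a full reversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target hook.  TARGET == NULL means
   "testing": answer whether SEL is supported but emit nothing.  */
static bool
toy_vec_perm_const (int *target, const int *op0, const int *op1,
                    const unsigned char *sel, unsigned int nelt)
{
  bool testing_p = (target == NULL);

  /* Pretend only the reversal {nelt-1, ..., 1, 0} is implementable.  */
  for (unsigned int i = 0; i < nelt; ++i)
    {
      assert (sel[i] < 2 * nelt);
      if (sel[i] != nelt - 1 - i)
        return false;
    }

  if (testing_p)
    return true;

  /* "Emit" the permutation: index E < NELT reads OP0, else OP1.  */
  for (unsigned int i = 0; i < nelt; ++i)
    target[i] = sel[i] < nelt ? op0[sel[i]] : op1[sel[i] - nelt];
  return true;
}
```

With this convention a single function serves both as the old vec_perm_const_ok query and as the expander. That is the reason the separate *_vec_perm_const_ok functions can be deleted.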
> Index: gcc/config/mips/loongson.md
> ===================================================================
> --- gcc/config/mips/loongson.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/loongson.md 2017-12-09 22:47:27.865318089 +0000
> @@ -784,19 +784,6 @@ (define_insn "*loongson_punpcklwd_hi"
>    "punpcklwd\t%0,%1,%2"
>    [(set_attr "type" "fcvt")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VWHB 0 "register_operand" "")
> -   (match_operand:VWHB 1 "register_operand" "")
> -   (match_operand:VWHB 2 "register_operand" "")
> -   (match_operand:VWHB 3 "" "")]
> -  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_unpacks_lo_<mode>"
>    [(match_operand:<V_stretch_half> 0 "register_operand" "")
>     (match_operand:VHB 1 "register_operand" "")]
> Index: gcc/config/mips/mips-msa.md
> ===================================================================
> --- gcc/config/mips/mips-msa.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-msa.md 2017-12-09 22:47:27.865318089 +0000
> @@ -558,19 +558,6 @@ (define_insn_and_split "msa_copy_s_<msaf
>    [(set_attr "type" "simd_copy")
>     (set_attr "mode" "<MODE>")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:MSA 0 "register_operand")
> -   (match_operand:MSA 1 "register_operand")
> -   (match_operand:MSA 2 "register_operand")
> -   (match_operand:<VIMODE> 3 "")]
> -  "ISA_HAS_MSA"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "abs<mode>2"
>    [(match_operand:IMSA 0 "register_operand" "=f")
>     (abs:IMSA (match_operand:IMSA 1 "register_operand" "f"))]
> Index: gcc/config/mips/mips-ps-3d.md
> ===================================================================
> --- gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:27.865318089 +0000
> @@ -164,19 +164,6 @@ (define_insn "vec_perm_const_ps"
>    [(set_attr "type" "fmove")
>     (set_attr "mode" "SF")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "register_operand" "")
> -   (match_operand:V2SF 1 "register_operand" "")
> -   (match_operand:V2SF 2 "register_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Expanders for builtins.  The instruction:
>  ;;
>  ;;     P[UL][UL].PS <result>, <a>, <b>
> Index: gcc/config/mips/mips-protos.h
> ===================================================================
> --- gcc/config/mips/mips-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-protos.h       2017-12-09 22:47:27.865318089 +0000
> @@ -348,7 +348,6 @@ extern void mips_expand_atomic_qihi (uni
>                                      rtx, rtx, rtx, rtx);
>
>  extern void mips_expand_vector_init (rtx, rtx);
> -extern bool mips_expand_vec_perm_const (rtx op[4]);
>  extern void mips_expand_vec_unpack (rtx op[2], bool, bool);
>  extern void mips_expand_vec_reduc (rtx, rtx, rtx (*)(rtx, rtx, rtx));
>  extern void mips_expand_vec_minmax (rtx, rtx, rtx,
> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips.c      2017-12-09 22:47:27.867318090 +0000
> @@ -21377,34 +21377,32 @@ mips_expand_vec_perm_const_1 (struct exp
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -mips_expand_vec_perm_const (rtx operands[4])
> +static bool
> +mips_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    int i, nelt, which;
>    unsigned char orig_perm[MAX_VECT_LEN];
> -  rtx sel;
>    bool ok;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> -
> -  d.vmode = GET_MODE (d.target);
> -  gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
> +
> +  d.vmode = vmode;
> +  gcc_assert (VECTOR_MODE_P (vmode));
> +  d.nelt = nelt = GET_MODE_NUNITS (vmode);
> +  d.testing_p = !target;
>
>    /* This is overly conservative, but ensures we don't get an
>       uninitialized warning on ORIG_PERM.  */
>    memset (orig_perm, 0, MAX_VECT_LEN);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        orig_perm[i] = ei;
>      }
> @@ -21417,7 +21415,7 @@ mips_expand_vec_perm_const (rtx operands
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (d.op0, d.op1))
> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>         break;
>        /* FALLTHRU */
>
> @@ -21434,6 +21432,19 @@ mips_expand_vec_perm_const (rtx operands
>        break;
>      }
>
> +  if (d.testing_p)
> +    {
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_vector_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      ok = mips_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
> +      return ok;
> +    }
> +
>    ok = mips_expand_vec_perm_const_1 (&d);
>
>    /* If we were given a two-vector permutation which just happened to
> @@ -21445,8 +21456,8 @@ mips_expand_vec_perm_const (rtx operands
>       the original permutation.  */
>    if (!ok && which == 3)
>      {
> -      d.op0 = operands[1];
> -      d.op1 = operands[2];
> +      d.op0 = op0;
> +      d.op1 = op1;
>        d.one_vector_p = false;
>        memcpy (d.perm, orig_perm, MAX_VECT_LEN);
>        ok = mips_expand_vec_perm_const_1 (&d);
> @@ -21466,48 +21477,6 @@ mips_sched_reassociation_width (unsigned
>    return 1;
>  }
>
> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
> -
> -static bool
> -mips_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
> -  /* Categorize the set of elements in the selector.  */
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned char e = sel[i];
> -      d.perm[i] = e;
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_vector_p = (which != 3);
> -
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_vector_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = mips_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> -
> -  return ret;
> -}
> -
>  /* Expand an integral vector unpack operation.  */
>
>  void
> @@ -22589,8 +22558,8 @@ #define TARGET_SHIFT_TRUNCATION_MASK mip
>  #undef TARGET_PREPARE_PCH_SAVE
>  #define TARGET_PREPARE_PCH_SAVE mips_prepare_pch_save
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK mips_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST mips_vectorize_vec_perm_const
>
>  #undef TARGET_SCHED_REASSOCIATION_WIDTH
>  #define TARGET_SCHED_REASSOCIATION_WIDTH mips_sched_reassociation_width
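The mips hook, like the i386 one, first tries a two-operand selector in one-operand form when op0 and op1 are the same register. If that fails, it retries with the saved ORIG_PERM. The standalone C sketch below, with hypothetical names, shows why the flattened form is equivalent in that case. When both inputs are identical, reducing each index modulo NELT (a power of two) does not change the result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy SEL into PERM and reduce each index
   modulo NELT, turning a two-operand selector into a one-operand
   one.  Only valid when both inputs are the same vector.  */
static void
flatten_selector (unsigned char *perm, const unsigned char *sel,
                  unsigned int nelt)
{
  assert ((nelt & (nelt - 1)) == 0);
  memcpy (perm, sel, nelt);
  for (unsigned int i = 0; i < nelt; ++i)
    perm[i] &= nelt - 1;
}

/* Reference two-operand permute: E < NELT reads OP0[E],
   otherwise OP1[E - NELT].  */
static void
apply_two (int *out, const int *op0, const int *op1,
           const unsigned char *sel, unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; ++i)
    out[i] = sel[i] < nelt ? op0[sel[i]] : op1[sel[i] - nelt];
}

/* Reference one-operand permute.  */
static void
apply_one (int *out, const int *op0, const unsigned char *sel,
           unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; ++i)
    {
      assert (sel[i] < nelt);
      out[i] = op0[sel[i]];
    }
}
```

The two forms give the same values, but a target may have an instruction for only one of them. That is why the hooks keep the original selector and fall back to it.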
> Index: gcc/config/powerpcspe/altivec.md
> ===================================================================
> --- gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:27.867318090 +0000
> @@ -2080,19 +2080,6 @@ (define_expand "vec_permv16qi"
>    }
>  })
>
> -(define_expand "vec_perm_constv16qi"
> -  [(match_operand:V16QI 0 "register_operand" "")
> -   (match_operand:V16QI 1 "register_operand" "")
> -   (match_operand:V16QI 2 "register_operand" "")
> -   (match_operand:V16QI 3 "" "")]
> -  "TARGET_ALTIVEC"
> -{
> -  if (altivec_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "*altivec_vpermr_<mode>_internal"
>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
> Index: gcc/config/powerpcspe/paired.md
> ===================================================================
> --- gcc/config/powerpcspe/paired.md     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/paired.md     2017-12-09 22:47:27.867318090 +0000
> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>    "ps_merge11 %0, %1, %2"
>    [(set_attr "type" "fp")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_PAIRED_FLOAT"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "paired_sum0"
>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>         (vec_concat:V2SF (plus:SF (vec_select:SF
> Index: gcc/config/powerpcspe/spe.md
> ===================================================================
> --- gcc/config/powerpcspe/spe.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/spe.md        2017-12-09 22:47:27.871318093 +0000
> @@ -511,19 +511,6 @@ (define_insn "vec_perm10_v2si"
>    [(set_attr "type" "vecsimple")
>     (set_attr  "length" "4")])
>
> -(define_expand "vec_perm_constv2si"
> -  [(match_operand:V2SI 0 "gpc_reg_operand" "")
> -   (match_operand:V2SI 1 "gpc_reg_operand" "")
> -   (match_operand:V2SI 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_SPE"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "spe_evmergehi"
>    [(match_operand:V2SI 0 "register_operand" "")
>     (match_operand:V2SI 1 "register_operand" "")
> Index: gcc/config/powerpcspe/vsx.md
> ===================================================================
> --- gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:27.871318093 +0000
> @@ -2543,19 +2543,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>  }
>    [(set_attr "type" "vecperm")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
> -   (match_operand:V2DI  3 "" "")]
> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>  ;; none of the small types were allowed in a vector register, so we had to
>  ;; extract to a DImode and either do a direct move or store.
> Index: gcc/config/powerpcspe/powerpcspe-protos.h
> ===================================================================
> --- gcc/config/powerpcspe/powerpcspe-protos.h   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/powerpcspe-protos.h   2017-12-09 22:47:27.867318090 +0000
> @@ -64,9 +64,7 @@ extern void rs6000_expand_vector_extract
>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>  extern void rs6000_split_v4si_init (rtx []);
> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_vec_perm_le (rtx op[4]);
> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
> Index: gcc/config/powerpcspe/powerpcspe.c
> ===================================================================
> --- gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:27.871318093 +0000
> @@ -1936,8 +1936,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>  #undef TARGET_LEGITIMATE_CONSTANT_P
>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
> @@ -38311,6 +38311,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>  }
>
>  /* Expand an Altivec constant permutation for little endian mode.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.
> +
>     There are two issues: First, the two input operands must be
>     swapped so that together they form a double-wide array in LE
>     order.  Second, the vperm instruction has surprising behavior
> @@ -38352,22 +38355,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>
>     vr9  = 00000006 00000004 00000002 00000000.  */
>
> -void
> -altivec_expand_vec_perm_const_le (rtx operands[4])
> +static void
> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
> +                                 const vec_perm_indices &sel)
>  {
>    unsigned int i;
>    rtx perm[16];
>    rtx constv, unspec;
> -  rtx target = operands[0];
> -  rtx op0 = operands[1];
> -  rtx op1 = operands[2];
> -  rtx sel = operands[3];
>
>    /* Unpack and adjust the constant selector.  */
>    for (i = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      unsigned int elt = 31 - (sel[i] & 31);
>        perm[i] = GEN_INT (elt);
>      }
>
> @@ -38449,10 +38448,14 @@ altivec_expand_vec_perm_le (rtx operands
>  }
>
>  /* Expand an Altivec constant permutation.  Return true if we match
> -   an efficient implementation; false to fall back to VPERM.  */
> +   an efficient implementation; false to fall back to VPERM.
>
> -bool
> -altivec_expand_vec_perm_const (rtx operands[4])
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.  */
> +
> +static bool
> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> +                              const vec_perm_indices &sel)
>  {
>    struct altivec_perm_insn {
>      HOST_WIDE_INT mask;
> @@ -38496,19 +38499,13 @@ altivec_expand_vec_perm_const (rtx opera
>
>    unsigned int i, j, elt, which;
>    unsigned char perm[16];
> -  rtx target, op0, op1, sel, x;
> +  rtx x;
>    bool one_vec;
>
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
>    /* Unpack the constant selector.  */
>    for (i = which = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      elt = INTVAL (e) & 31;
> +      elt = sel[i] & 31;
>        which |= (elt < 16 ? 1 : 2);
>        perm[i] = elt;
>      }
> @@ -38664,7 +38661,7 @@ altivec_expand_vec_perm_const (rtx opera
>
>    if (!BYTES_BIG_ENDIAN)
>      {
> -      altivec_expand_vec_perm_const_le (operands);
> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>        return true;
>      }
>
> @@ -38724,60 +38721,54 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>    return true;
>  }
>
> -bool
> -rs6000_expand_vec_perm_const (rtx operands[4])
> -{
> -  rtx target, op0, op1, sel;
> -  unsigned char perm0, perm1;
> -
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
> -  /* Unpack the constant selector.  */
> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
> -
> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
> -}
> -
> -/* Test whether a constant permutation is supported.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
>  static bool
> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                rtx op1, const vec_perm_indices &sel)
>  {
> +  bool testing_p = !target;
> +
>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
> -  if (TARGET_ALTIVEC)
> +  if (TARGET_ALTIVEC && testing_p)
>      return true;
>
> -  /* Check for ps_merge* or evmerge* insns.  */
> -  if ((TARGET_PAIRED_FLOAT && vmode == V2SFmode)
> -      || (TARGET_SPE && vmode == V2SImode))
> -    {
> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
> +  /* Check for ps_merge*, evmerge* or xxperm* insns.  */
> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
> +      || (vmode == V2SImode && TARGET_SPE)
> +      || ((vmode == V2DFmode || vmode == V2DImode)
> +         && VECTOR_MEM_VSX_P (vmode)))
> +    {
> +      if (testing_p)
> +       {
> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> +       }
> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
> +       return true;
> +    }
> +
> +  if (TARGET_ALTIVEC)
> +    {
> +      /* Force the target-independent code to lower to V16QImode.  */
> +      if (vmode != V16QImode)
> +       return false;
> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
> +       return true;
>      }
>
>    return false;
>  }
>
> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   PERM specifies the constant permutation vector.  */
>
>  static void
>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
> -                          machine_mode vmode, unsigned nelt, rtx perm[])
> +                          machine_mode vmode, const vec_perm_builder &perm)
>  {
> -  machine_mode imode;
> -  rtx x;
> -
> -  imode = vmode;
> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
> -    imode = mode_for_int_vector (vmode).require ();
> -
> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
> -  x = expand_vec_perm (vmode, op0, op1, x, target);
> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>    if (x != target)
>      emit_move_insn (target, x);
>  }
> @@ -38789,12 +38780,12 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    for (i = 0; i < nelt; i++)
> -    perm[i] = GEN_INT (i * 2);
> +    perm.quick_push (i * 2);
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Expand a vector interleave operation.  */
> @@ -38804,16 +38795,16 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
> -      perm[i * 2] = GEN_INT (i + high);
> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
> +      perm.quick_push (i + high);
> +      perm.quick_push (i + nelt + high);
>      }
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
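rs6000_expand_extract_even and rs6000_expand_interleave now push plain indices into a vec_perm_builder instead of building CONST_INT rtxes. The standalone C sketch below uses ordinary arrays in place of vec_perm_builder and mirrors the same index arithmetic. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Extract-even: element I takes index 2*I of the concatenation
   {op0, op1}.  */
static void
build_extract_even (unsigned int *perm, unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; i++)
    perm[i] = i * 2;
}

/* Interleave: alternate op0 and op1 elements, starting at element 0
   when HIGHP is true, otherwise at NELT / 2.  */
static void
build_interleave (unsigned int *perm, unsigned int nelt, bool highp)
{
  unsigned int high = highp ? 0 : nelt / 2;
  for (unsigned int i = 0; i < nelt / 2; i++)
    {
      perm[i * 2] = i + high;
      perm[i * 2 + 1] = i + nelt + high;
    }
}
```

For NELT = 4 these produce {0, 2, 4, 6}, {0, 4, 1, 5} (HIGHP) and {2, 6, 3, 7} (!HIGHP).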
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/altivec.md        2017-12-09 22:47:27.872318093 +0000
> @@ -2198,19 +2198,6 @@ (define_expand "vec_permv16qi"
>    }
>  })
>
> -(define_expand "vec_perm_constv16qi"
> -  [(match_operand:V16QI 0 "register_operand" "")
> -   (match_operand:V16QI 1 "register_operand" "")
> -   (match_operand:V16QI 2 "register_operand" "")
> -   (match_operand:V16QI 3 "" "")]
> -  "TARGET_ALTIVEC"
> -{
> -  if (altivec_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "*altivec_vpermr_<mode>_internal"
>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
> Index: gcc/config/rs6000/paired.md
> ===================================================================
> --- gcc/config/rs6000/paired.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/paired.md 2017-12-09 22:47:27.872318093 +0000
> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>    "ps_merge11 %0, %1, %2"
>    [(set_attr "type" "fp")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_PAIRED_FLOAT"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "paired_sum0"
>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>         (vec_concat:V2SF (plus:SF (vec_select:SF
> Index: gcc/config/rs6000/vsx.md
> ===================================================================
> --- gcc/config/rs6000/vsx.md    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/vsx.md    2017-12-09 22:47:27.875318095 +0000
> @@ -3189,19 +3189,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>  }
>    [(set_attr "type" "vecperm")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
> -   (match_operand:V2DI  3 "" "")]
> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>  ;; none of the small types were allowed in a vector register, so we had to
>  ;; extract to a DImode and either do a direct move or store.
> Index: gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:27.872318093 +0000
> @@ -63,9 +63,7 @@ extern void rs6000_expand_vector_extract
>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>  extern void rs6000_split_v4si_init (rtx []);
> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_vec_perm_le (rtx op[4]);
> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/rs6000.c  2017-12-09 22:47:27.874318095 +0000
> @@ -1907,8 +1907,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>  #undef TARGET_LEGITIMATE_CONSTANT_P
>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
> @@ -35545,6 +35545,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>  }
>
>  /* Expand an Altivec constant permutation for little endian mode.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.
> +
>     There are two issues: First, the two input operands must be
>     swapped so that together they form a double-wide array in LE
>     order.  Second, the vperm instruction has surprising behavior
> @@ -35586,22 +35589,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>
>     vr9  = 00000006 00000004 00000002 00000000.  */
>
> -void
> -altivec_expand_vec_perm_const_le (rtx operands[4])
> +static void
> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
> +                                 const vec_perm_indices &sel)
>  {
>    unsigned int i;
>    rtx perm[16];
>    rtx constv, unspec;
> -  rtx target = operands[0];
> -  rtx op0 = operands[1];
> -  rtx op1 = operands[2];
> -  rtx sel = operands[3];
>
>    /* Unpack and adjust the constant selector.  */
>    for (i = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      unsigned int elt = 31 - (sel[i] & 31);
>        perm[i] = GEN_INT (elt);
>      }
>
> @@ -35683,10 +35682,14 @@ altivec_expand_vec_perm_le (rtx operands
>  }
>
>  /* Expand an Altivec constant permutation.  Return true if we match
> -   an efficient implementation; false to fall back to VPERM.  */
> +   an efficient implementation; false to fall back to VPERM.
>
> -bool
> -altivec_expand_vec_perm_const (rtx operands[4])
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.  */
> +
> +static bool
> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> +                              const vec_perm_indices &sel)
>  {
>    struct altivec_perm_insn {
>      HOST_WIDE_INT mask;
> @@ -35734,19 +35737,13 @@ altivec_expand_vec_perm_const (rtx opera
>
>    unsigned int i, j, elt, which;
>    unsigned char perm[16];
> -  rtx target, op0, op1, sel, x;
> +  rtx x;
>    bool one_vec;
>
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
>    /* Unpack the constant selector.  */
>    for (i = which = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      elt = INTVAL (e) & 31;
> +      elt = sel[i] & 31;
>        which |= (elt < 16 ? 1 : 2);
>        perm[i] = elt;
>      }
> @@ -35902,7 +35899,7 @@ altivec_expand_vec_perm_const (rtx opera
>
>    if (!BYTES_BIG_ENDIAN)
>      {
> -      altivec_expand_vec_perm_const_le (operands);
> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>        return true;
>      }
>
> @@ -35962,59 +35959,53 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>    return true;
>  }
>
> -bool
> -rs6000_expand_vec_perm_const (rtx operands[4])
> -{
> -  rtx target, op0, op1, sel;
> -  unsigned char perm0, perm1;
> -
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
> -  /* Unpack the constant selector.  */
> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
> -
> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
> -}
> -
> -/* Test whether a constant permutation is supported.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
>  static bool
> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                rtx op1, const vec_perm_indices &sel)
>  {
> +  bool testing_p = !target;
> +
>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
> -  if (TARGET_ALTIVEC)
> +  if (TARGET_ALTIVEC && testing_p)
>      return true;
>
> -  /* Check for ps_merge* or evmerge* insns.  */
> -  if (TARGET_PAIRED_FLOAT && vmode == V2SFmode)
> +  /* Check for ps_merge* or xxpermdi insns.  */
> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
> +      || ((vmode == V2DFmode || vmode == V2DImode)
> +         && VECTOR_MEM_VSX_P (vmode)))
> +    {
> +      if (testing_p)
> +       {
> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> +       }
> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
> +       return true;
> +    }
> +
> +  if (TARGET_ALTIVEC)
>      {
> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
> +      /* Force the target-independent code to lower to V16QImode.  */
> +      if (vmode != V16QImode)
> +       return false;
> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
> +       return true;
>      }
>
>    return false;
>  }
>
> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   PERM specifies the constant permutation vector.  */
>
>  static void
>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
> -                          machine_mode vmode, unsigned nelt, rtx perm[])
> +                          machine_mode vmode, const vec_perm_builder &perm)
>  {
> -  machine_mode imode;
> -  rtx x;
> -
> -  imode = vmode;
> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
> -    imode = mode_for_int_vector (vmode).require ();
> -
> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
> -  x = expand_vec_perm (vmode, op0, op1, x, target);
> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>    if (x != target)
>      emit_move_insn (target, x);
>  }
> @@ -36026,12 +36017,12 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    for (i = 0; i < nelt; i++)
> -    perm[i] = GEN_INT (i * 2);
> +    perm.quick_push (i * 2);
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Expand a vector interleave operation.  */
> @@ -36041,16 +36032,16 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
> -      perm[i * 2] = GEN_INT (i + high);
> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
> +      perm.quick_push (i + high);
> +      perm.quick_push (i + nelt + high);
>      }
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
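The two selector loops in rs6000.c above now read `sel[i]` directly instead of `INTVAL (XVECEXP (sel, 0, i))`. The index arithmetic is otherwise unchanged. Each of the 16 byte indices is reduced to 0..31, `which` records whether OP0, OP1 or both are referenced, and the little-endian path rewrites index e as 31 - e before it reaches vperm. The plain-C sketch below mirrors only that arithmetic. `unpack_selector` and `le_vperm_control` are hypothetical names, and nothing here emits RTL.

```c
/* Mirror of the unpacking loop in altivec_expand_vec_perm_const:
   reduce each index to 0..31 and return a mask of the inputs used
   (1 = OP0 only, 2 = OP1 only, 3 = both).  */
static unsigned
unpack_selector (const unsigned sel[16], unsigned char perm[16])
{
  unsigned which = 0;
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned elt = sel[i] & 31;
      which |= (elt < 16 ? 1 : 2);
      perm[i] = (unsigned char) elt;
    }
  return which;
}

/* Mirror of the control adjustment in altivec_expand_vec_perm_const_le
   (the operand swap it also performs is not modelled here).  */
static void
le_vperm_control (const unsigned sel[16], unsigned char ctrl[16])
{
  for (unsigned i = 0; i < 16; i++)
    ctrl[i] = (unsigned char) (31 - (sel[i] & 31));
}
```

For example, the identity selector 0..15 reports `which == 1` and becomes the little-endian control 31, 30, ..., 16.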
> Index: gcc/config/sparc/sparc.md
> ===================================================================
> --- gcc/config/sparc/sparc.md   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/sparc/sparc.md   2017-12-09 22:47:27.876318096 +0000
> @@ -9327,28 +9327,6 @@ (define_insn "bshuffle<VM64:mode>_vis"
>     (set_attr "subtype" "other")
>     (set_attr "fptype" "double")])
>
> -;; The rtl expanders will happily convert constant permutations on other
> -;; modes down to V8QI.  Rely on this to avoid the complexity of the byte
> -;; order of the permutation.
> -(define_expand "vec_perm_constv8qi"
> -  [(match_operand:V8QI 0 "register_operand" "")
> -   (match_operand:V8QI 1 "register_operand" "")
> -   (match_operand:V8QI 2 "register_operand" "")
> -   (match_operand:V8QI 3 "" "")]
> -  "TARGET_VIS2"
> -{
> -  unsigned int i, mask;
> -  rtx sel = operands[3];
> -
> -  for (i = mask = 0; i < 8; ++i)
> -    mask |= (INTVAL (XVECEXP (sel, 0, i)) & 0xf) << (28 - i*4);
> -  sel = force_reg (SImode, gen_int_mode (mask, SImode));
> -
> -  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, const0_rtx));
> -  emit_insn (gen_bshufflev8qi_vis (operands[0], operands[1], operands[2]));
> -  DONE;
> -})
> -
>  ;; Unlike constant permutation, we can vastly simplify the compression of
>  ;; the 64-bit selector input to the 32-bit %gsr value by knowing what the
>  ;; width of the input is.
> Index: gcc/config/sparc/sparc.c
> ===================================================================
> --- gcc/config/sparc/sparc.c    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/sparc/sparc.c    2017-12-09 22:47:27.876318096 +0000
> @@ -686,6 +686,8 @@ static bool sparc_modes_tieable_p (machi
>  static bool sparc_can_change_mode_class (machine_mode, machine_mode,
>                                          reg_class_t);
>  static HOST_WIDE_INT sparc_constant_alignment (const_tree, HOST_WIDE_INT);
> +static bool sparc_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                           const vec_perm_indices &);
>
>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>  /* Table of valid machine attributes.  */
> @@ -930,6 +932,9 @@ #define TARGET_CAN_CHANGE_MODE_CLASS spa
>  #undef TARGET_CONSTANT_ALIGNMENT
>  #define TARGET_CONSTANT_ALIGNMENT sparc_constant_alignment
>
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST sparc_vectorize_vec_perm_const
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  /* Return the memory reference contained in X if any, zero otherwise.  */
> @@ -12812,6 +12817,32 @@ sparc_expand_vec_perm_bmask (machine_mod
>    emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, t_1));
>  }
>
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +sparc_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                               rtx op1, const vec_perm_indices &sel)
> +{
> +  /* All permutes are supported.  */
> +  if (!target)
> +    return true;
> +
> +  /* Force target-independent code to convert constant permutations on other
> +     modes down to V8QI.  Rely on this to avoid the complexity of the byte
> +     order of the permutation.  */
> +  if (vmode != V8QImode)
> +    return false;
> +
> +  unsigned int i, mask;
> +  for (i = mask = 0; i < 8; ++i)
> +    mask |= (sel[i] & 0xf) << (28 - i*4);
> +  rtx mask_rtx = force_reg (SImode, gen_int_mode (mask, SImode));
> +
> +  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), mask_rtx, const0_rtx));
> +  emit_insn (gen_bshufflev8qi_vis (target, op0, op1));
> +  return true;
> +}
> +
>  /* Implement TARGET_FRAME_POINTER_REQUIRED.  */
>
>  static bool
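The new sparc hook keeps the selector packing from the removed vec_perm_constv8qi expander. Each of the eight V8QI indices needs only four bits, because there are 16 source bytes across the two inputs, and the indices are packed into one 32-bit value with entry 0 in the most significant nibble. That value is what the hook passes to the bmask pattern before emitting bshuffle. The sketch below reproduces only the packing in plain C. `pack_bmask` is a hypothetical name.

```c
#include <stdint.h>

/* Mirror of the mask loop in sparc_vectorize_vec_perm_const: pack the
   low four bits of each V8QI selector entry into a 32-bit value,
   entry 0 in bits 31..28 and entry 7 in bits 3..0.  */
static uint32_t
pack_bmask (const unsigned sel[8])
{
  uint32_t mask = 0;
  for (unsigned i = 0; i < 8; i++)
    mask |= (uint32_t) (sel[i] & 0xf) << (28 - i * 4);
  return mask;
}
```

The identity selector {0, ..., 7} packs to 0x01234567, and the interleave selector {0, 8, 1, 9, 2, 10, 3, 11} packs to 0x08192a3b.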

