[PATCH] [i386] Optimize vec_perm_expr to match vpmov{dw,qd,wb}.

Jakub Jelinek jakub@redhat.com
Fri Aug 13 08:47:53 GMT 2021

On Fri, Aug 13, 2021 at 09:42:00AM +0800, Hongtao Liu wrote:
> > So, I wonder if your new routine shouldn't instead be done
> > in ix86_expand_vec_perm_const_1 after vec_perm_1, among the other 2-insn cases
> > and handle the other vpmovdw etc. cases in combine splitters (see that we
> > only use low half or quarter of the result and transform whatever
> > permutation we've used into what we want).
> >
> Got it, I'll try that way.

Note that, IMHO, the ultimate fix would be to add real support for
__builtin_shufflevector -1 indices (meaning "I don't care what ends up in
that element", perhaps narrowed down to an implementation choice of
any element of the input vector(s) or 0).
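As a rough illustration of the intended semantics, here is a minimal scalar C model of a two-operand constant shuffle in which a -1 index means "don't care". It assumes the implementation resolves such lanes to 0, which is one of the choices mentioned above; picking any input element would be equally valid. `shuffle_model` is a hypothetical name, and this does not describe how GCC currently handles -1 indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a constant two-operand shuffle.
   A and B each have N elements; indices 0..N-1 pick from A,
   N..2N-1 pick from B, and -1 means "don't care".  The result has
   N_OUT elements.  This sketch resolves a don't-care lane to 0.  */
static void
shuffle_model (const int *a, const int *b, size_t n,
               const int *idx, size_t n_out, int *out)
{
  for (size_t i = 0; i < n_out; i++)
    {
      assert (idx[i] >= -1 && idx[i] < (int) (2 * n));
      if (idx[i] == -1)
        out[i] = 0;                   /* don't care: any value is fine */
      else if ((size_t) idx[i] < n)
        out[i] = a[idx[i]];           /* element of the first operand */
      else
        out[i] = b[idx[i] - n];       /* element of the second operand */
    }
}
```

Because a don't-care lane can take any value, a backend implementing this would be free to choose whatever makes the permutation match a cheaper instruction; the fixed 0 here is only there to make the model deterministic.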
As VEC_PERM_EXPR is currently used both for permutations by a variable
permutation vector and for constant ones, I think we'd need to introduce
VEC_PERM_CONST_EXPR, which would be exactly like VEC_PERM_EXPR except that
the last operand would be required to be a VECTOR_CST and that an all-ones
element would mean something different: the "don't care" behavior.
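For comparison, with today's VEC_PERM_EXPR an all-ones constant selector element is not special. It is reduced modulo 2*N like any other element, so for a power-of-two N it selects the last element of the second operand. Below is a minimal scalar model of that rule, assuming unsigned char selector elements; `vec_perm_model` is a hypothetical name, not GCC code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel> with N elements:
   each selector element is taken modulo 2*N and picks v0[m] if
   m < N, else v1[m - N].  Selector elements are unsigned char here,
   so an "all ones" element is UCHAR_MAX.  */
static void
vec_perm_model (const int *v0, const int *v1, const unsigned char *sel,
                size_t n, int *out)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t m = sel[i] % (2 * n);
      out[i] = m < n ? v0[m] : v1[m - n];
    }
}
```

With N = 4, an element of UCHAR_MAX (255) reduces to 7 and picks v1[3]; under the proposed VEC_PERM_CONST_EXPR the same all-ones element would instead mean "don't care".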
The GIMPLE side would be fairly easy, though eventually there should be
some optimizations on top of it; e.g. when only a certain subset of the
elements of a vector is used later, we can mark the other elements as
don't care.
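A minimal sketch of that kind of optimization is shown below. It assumes the set of result lanes read by later statements has already been computed as a bitmask and that -1 encodes "don't care" in the new form. `DONT_CARE` and `mark_unused_lanes` are hypothetical names, and a real GIMPLE pass would work on VECTOR_CSTs rather than plain arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DONT_CARE (-1)  /* hypothetical don't-care encoding */

/* SEL is a constant selector with N lanes.  USED_LANES has bit i set
   if lane i of the permutation result is read later.  Every unread
   lane is turned into DONT_CARE so the backend may put anything there.  */
static void
mark_unused_lanes (int *sel, size_t n, uint64_t used_lanes)
{
  assert (n <= 64);
  for (size_t i = 0; i < n; i++)
    if (!(used_lanes & ((uint64_t) 1 << i)))
      sel[i] = DONT_CARE;
}
```

For instance, if only the low half of an 8-lane result is used (as in the truncation cases discussed earlier in this thread), lanes 4..7 become don't care and the backend is free to choose any permutation that gets lanes 0..3 right.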

The hard part would be backend expansion, especially x86.
I guess we could easily canonicalize VEC_PERM_EXPR with constant
permutations into VEC_PERM_CONST_EXPR by replacing all-ones elements
with their value modulo the number of elements (or twice that for
two-operand perms), so that they keep their current meaning.  But then
every routine that recognizes some permutation would need to special-case
the unknown elements: they'd match anything during testing, and during
expansion they'd be replaced by something that matches.
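A minimal sketch of the three pieces just described: canonicalizing an existing selector, matching with don't-care lanes as wildcards, and instantiating them for expansion. It assumes -1 as the don't-care encoding in the new form and power-of-two element counts. All names are hypothetical and none of this is existing GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DONT_CARE (-1)  /* hypothetical don't-care encoding in the new form */

/* Old form -> new form: reduce every element (all-ones included) into
   [0, m), where m = N or 2*N.  No element of the result is DONT_CARE,
   and for power-of-two m the permutation keeps its old meaning.  */
static void
canonicalize_selector (int *sel, size_t n, bool two_operands)
{
  assert (n > 0);
  int m = (int) (two_operands ? 2 * n : n);
  for (size_t i = 0; i < n; i++)
    sel[i] = ((sel[i] % m) + m) % m;
}

/* Test whether SEL matches what an instruction does (PATTERN), with
   DONT_CARE lanes matching anything.  */
static bool
selector_matches (const int *sel, const int *pattern, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (sel[i] != DONT_CARE && sel[i] != pattern[i])
      return false;
  return true;
}

/* For expansion: replace each DONT_CARE lane with the pattern's value,
   giving a concrete selector that the matched instruction implements.  */
static void
instantiate_selector (int *sel, const int *pattern, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (sel[i] == DONT_CARE)
      sel[i] = pattern[i];
}
```

The canonicalization step is what keeps existing constant VEC_PERM_EXPRs correct: their all-ones elements become ordinary indices before -1 takes on the don't-care meaning.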
That is again a lot of work, but not extremely hard.  The hardest part would
be expand_vec_perm_1, which handles many cases by trying to recog an
instruction.  Either we'd need to represent the unknown case by a magic
CONST_INT_WILDCARD or CONST_INT_RANGE that recog, with the help of the
patterns, would replace by some CONST_INT that matches it (but note that we
have all those const_0_to_N_operand predicates, and conditions like
(INTVAL (operands[3]) & 1) == 0 and INTVAL (operands[3]) + 1 == INTVAL (operands[4])
etc.), or we'd need to build, either manually or semi-automatically, some code
that would try to guess the right values for the unknown elements before trying
to recog the instruction.
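As a rough illustration of the second option, here is a brute-force sketch that resolves wildcard immediates against a condition built from the kinds of checks quoted above, before the insn would be handed to recog. `WILDCARD`, `pattern_condition` and `guess_operands` are hypothetical. The range 0..7 stands in for one particular const_0_to_N_operand, and real recog works on RTL rather than plain ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WILDCARD (-1)  /* hypothetical stand-in for CONST_INT_WILDCARD */

/* Hypothetical insn condition: both immediates pass a
   const_0_to_7_operand-style range check, (op3 & 1) == 0 and
   op3 + 1 == op4.  */
static bool
pattern_condition (int op3, int op4)
{
  return op3 >= 0 && op3 <= 7 && op4 >= 0 && op4 <= 7
         && (op3 & 1) == 0 && op3 + 1 == op4;
}

/* Resolve WILDCARD operands by trying every value in 0..7, keeping
   known operands fixed.  On success, store the chosen values and return
   true; otherwise leave both operands untouched and return false.  */
static bool
guess_operands (int *op3, int *op4)
{
  assert (op3 != NULL && op4 != NULL);
  for (int a = 0; a <= 7; a++)
    {
      if (*op3 != WILDCARD && a != *op3)
        continue;
      for (int b = 0; b <= 7; b++)
        {
          if (*op4 != WILDCARD && b != *op4)
            continue;
          if (pattern_condition (a, b))
            {
              *op3 = a;
              *op4 = b;
              return true;
            }
        }
    }
  return false;
}
```

Brute force is only workable here because the immediate ranges are tiny. It also shows why this gets awkward at scale: the search has to know each pattern's predicates and condition, which is exactly the information that currently lives only inside the .md patterns.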

