[PATCH] Improve AVX512 sse movcc (PR target/88547)

Jakub Jelinek jakub@redhat.com
Thu Dec 20 07:49:00 GMT 2018


On Thu, Dec 20, 2018 at 08:42:05AM +0100, Uros Bizjak wrote:
> > If one vcond argument is all ones (non-bool) vector and another one is all
> > zeros, we can use for AVX512{DQ,BW} (sometimes + VL) the vpmovm2? insns.
> > While if op_true is all ones and op_false all zeros, we emit large code
> > that the combiner often optimizes to that vpmovm2?; but if the arguments
> > are swapped, we emit vpxor + vpternlog + a masked move (blend), while we
> > could just invert the mask with knot* and use vpmovm2?.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk?  The patch is large, but it is mostly reindentation; the attachment
> > contains a diff -ubpd variant of the i386.c changes to make it more
> > readable.
> >
> > 2018-12-19  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR target/88547
> >         * config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
> >         emit vpmovm2? instruction perhaps after knot?.  Reorganize code
> >         so that it doesn't have to test !maskcmp in almost every conditional.
> >
> >         * gcc.target/i386/pr88547-1.c: New test.
> 
> LGTM, under assumption that interunit moves from mask reg to xmm regs are fast.

In a simple benchmark (calling these functions in a tight loop on an i9-7960X)
the performance is the same, just with shorter sequences.

	Jakub
