This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Improve AVX512 sse movcc (PR target/88547)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 20 Dec 2018 08:49:40 +0100
- Subject: Re: [PATCH] Improve AVX512 sse movcc (PR target/88547)
- References: <20181219232007.GL23305@tucnak> <CAFULd4af=Vpat===Qp1GaoRj4patAO=umuH+UjUQpg70RkTdQw@mail.gmail.com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Dec 20, 2018 at 08:42:05AM +0100, Uros Bizjak wrote:
> > If one vcond argument is an all-ones (non-bool) vector and the other is all
> > zeros, we can use the vpmovm2? insns on AVX512{DQ,BW} (sometimes + VL).
> > When op_true is all ones and op_false all zeros, we emit large code that
> > the combiner often optimizes to that vpmovm2?; with the arguments swapped,
> > we emit vpxor + vpternlog + masked move (blend), while we could just
> > invert the mask with knot* and use vpmovm2?.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk? The patch is large, but it is mostly reindentation, in the
> > attachment there is diff -ubpd variant of the i386.c changes to make it more
> > readable.
> >
> > 2018-12-19 Jakub Jelinek <jakub@redhat.com>
> >
> > PR target/88547
> > * config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
> > emit vpmovm2? instruction perhaps after knot?. Reorganize code
> > so that it doesn't have to test !maskcmp in almost every conditional.
> >
> > * gcc.target/i386/pr88547-1.c: New test.
>
> LGTM, under assumption that interunit moves from mask reg to xmm regs are fast.
In a simple benchmark (calling these functions in a tight loop on an i9-7960X)
the performance is the same, just with shorter sequences.
Jakub