PR target/106060: Improved SSE vector constant materialization on x86.
author     Roger Sayle <roger@nextmovesoftware.com>
           Tue, 7 May 2024 06:14:40 +0000 (07:14 +0100)
committer  Roger Sayle <roger@nextmovesoftware.com>
           Tue, 7 May 2024 06:16:58 +0000 (07:16 +0100)
commit     79649a5dcd81bc05c0ba591068c9075de43bd417
tree       0c833b1c89f8afc6eaf58f0f318afcc9306f0ffe
parent     0c43c673b0d431ca02d83bf6fae9cd60e9a3d0a8

This patch resolves PR target/106060 by providing efficient methods for
materializing/synthesizing special "vector" constants on x86.  Currently
there are three methods of materializing a vector constant; the most
general is to load a vector from the constant pool, secondly "duplicated"
constants can be synthesized by moving an integer between units and
broadcasting (or shuffling it), and finally the special cases of the
all-zeros and all-ones vectors can be loaded via a single SSE
instruction.  This patch handles additional cases that can be synthesized
in two instructions: loading an all-ones vector followed by another SSE
instruction.  Following my recent patch for PR target/112992, there's
conveniently a single place in i386-expand.cc where these special cases
can be handled.
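The "duplicated" case above hinges on recognizing that a vector constant is a broadcast of a single element. The following C sketch models a 256-bit constant as 32 little-endian bytes and finds the smallest element width at which it repeats. The function name `broadcast_width` is hypothetical and this is not GCC's implementation; it only illustrates the check. For example, `_mm256_set1_epi8 (1)` is a 1-byte broadcast, which corresponds to the 32-bit (SImode) lane pattern 0x01010101.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, illustrative only (not a GCC internal).
   Return the smallest element width in bytes (1, 2, 4 or 8) at which the
   32-byte vector V is a broadcast of a single element, storing that element
   (assembled little-endian) in *ELT.  Return 0 if V is not a broadcast of
   any element of up to 8 bytes.  */
static size_t
broadcast_width (const uint8_t v[32], uint64_t *elt)
{
  static const size_t widths[] = { 1, 2, 4, 8 };
  for (size_t k = 0; k < sizeof widths / sizeof widths[0]; k++)
    {
      size_t w = widths[k];
      int dup = 1;
      /* Every byte must equal the byte at the same offset in element 0.  */
      for (size_t i = w; i < 32 && dup; i++)
        if (v[i] != v[i % w])
          dup = 0;
      if (dup)
        {
          uint64_t e = 0;
          for (size_t i = 0; i < w; i++)
            e |= (uint64_t) v[i] << (8 * i);
          *elt = e;
          return w;
        }
    }
  return 0;
}
```

Trying the narrowest width first means a byte broadcast is reported as such, even though it is trivially also a word, dword and qword broadcast.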

Two examples are given in the original Bugzilla PR for 106060.

#include <immintrin.h>

__m256i should_be_cmpeq_abs ()
{
  return _mm256_set1_epi8 (1);
}

is now generated (with -O3 -march=x86-64-v3) as:

        vpcmpeqd        %ymm0, %ymm0, %ymm0
        vpabsb  %ymm0, %ymm0
        ret

and

__m256i should_be_cmpeq_add ()
{
  return _mm256_set1_epi8 (-2);
}

is now generated as:

        vpcmpeqd        %ymm0, %ymm0, %ymm0
        vpaddb  %ymm0, %ymm0, %ymm0
        ret
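The byte-lane semantics behind both sequences can be checked with a scalar model. This sketch treats %ymm0 as 32 unsigned bytes; the type and function names are illustrative and not taken from GCC. Comparing a register with itself sets every bit, vpabsb maps each 0xFF (-1) byte to 1, and vpaddb of 0xFF with itself wraps to 0xFE, which is -2 as a signed byte.

```c
#include <stdint.h>

/* Illustrative scalar model of a 256-bit register as 32 byte lanes.  */
typedef struct { uint8_t b[32]; } ymm_t;

/* vpcmpeqd %ymm0, %ymm0, %ymm0: each lane compares equal to itself,
   so every bit of the result is set.  */
static ymm_t
model_vpcmpeqd_self (void)
{
  ymm_t r;
  for (int i = 0; i < 32; i++)
    r.b[i] = 0xFF;
  return r;
}

/* vpabsb: signed absolute value of each byte (-128 stays 0x80).  */
static ymm_t
model_vpabsb (ymm_t x)
{
  ymm_t r;
  for (int i = 0; i < 32; i++)
    {
      /* Reinterpret the byte as signed without implementation-defined casts.  */
      int s = x.b[i] >= 128 ? (int) x.b[i] - 256 : (int) x.b[i];
      r.b[i] = (uint8_t) (s < 0 ? -s : s);
    }
  return r;
}

/* vpaddb: byte-wise addition, wrapping modulo 256.  */
static ymm_t
model_vpaddb (ymm_t x, ymm_t y)
{
  ymm_t r;
  for (int i = 0; i < 32; i++)
    r.b[i] = (uint8_t) (x.b[i] + y.b[i]);
  return r;
}
```

Feeding the all-ones result into `model_vpabsb` yields 32 bytes of 0x01, matching `_mm256_set1_epi8 (1)`, and feeding it twice into `model_vpaddb` yields 32 bytes of 0xFE, matching `_mm256_set1_epi8 (-2)`.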

2024-05-07  Roger Sayle  <roger@nextmovesoftware.com>
    Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
PR target/106060
* config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
(struct ix86_vec_bcast_map_simode_t): New type for table below.
(ix86_vec_bcast_map_simode): Table of SImode constants that may
be efficiently synthesized by an ix86_vec_bcast_alg method.
(ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
(ix86_vector_duplicate_simode_const): Efficiently synthesize
V4SImode and V8SImode constants that duplicate special constants.
(ix86_vector_duplicate_value): Attempt to synthesize "special"
vector constants using ix86_vector_duplicate_simode_const.
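The ChangeLog describes a table of SImode constants searched with bsearch. The C sketch below shows that data-structure pattern under stated assumptions: the enum, struct and function names are hypothetical, and the six entries are simply the 32-bit lane values obtained by applying a byte-, word- or dword-sized absolute value or self-addition to an all-ones vector. They are not a copy of GCC's actual table.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: which second instruction is applied to all-ones.  */
enum bcast_alg { ABS_BYTE, ABS_WORD, ABS_DWORD, ADD_BYTE, ADD_WORD, ADD_DWORD };

struct bcast_entry
{
  uint32_t key;          /* 32-bit (SImode) lane value after synthesis.  */
  enum bcast_alg alg;    /* How to derive it from an all-ones vector.  */
  const char *insn;      /* Second instruction (AVX2 spelling).  */
};

/* Must stay sorted by KEY for bsearch.  Illustrative subset only.  */
static const struct bcast_entry bcast_map[] = {
  { 0x00000001u, ABS_DWORD, "vpabsd" },  /* |-1| per dword.  */
  { 0x00010001u, ABS_WORD,  "vpabsw" },  /* |-1| per word.   */
  { 0x01010101u, ABS_BYTE,  "vpabsb" },  /* |-1| per byte.   */
  { 0xFEFEFEFEu, ADD_BYTE,  "vpaddb" },  /* -1 + -1 per byte.  */
  { 0xFFFEFFFEu, ADD_WORD,  "vpaddw" },  /* -1 + -1 per word.  */
  { 0xFFFFFFFEu, ADD_DWORD, "vpaddd" },  /* -1 + -1 per dword. */
};

/* bsearch comparator: K points to the key, E to a table entry.  */
static int
bcast_entry_cmp (const void *k, const void *e)
{
  uint32_t a = *(const uint32_t *) k;
  uint32_t b = ((const struct bcast_entry *) e)->key;
  return (a > b) - (a < b);
}

/* Return the entry for KEY, or NULL if KEY is not in the table.  */
static const struct bcast_entry *
bcast_lookup (uint32_t key)
{
  return bsearch (&key, bcast_map, sizeof bcast_map / sizeof bcast_map[0],
                  sizeof bcast_map[0], bcast_entry_cmp);
}
```

Keeping the table sorted gives logarithmic lookups, but it also means a new entry inserted out of order would silently break bsearch, so the ordering is worth asserting in a debug build.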
* config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
vector integer mode costs a single SSE instruction.

gcc/testsuite/ChangeLog
PR target/106060
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr101796-1.c: Likewise.
* gcc.target/i386/pr106060-1.c: New test case.
* gcc.target/i386/pr106060-2.c: Likewise.
* gcc.target/i386/pr106060-3.c: Likewise.
* gcc.target/i386/pr70314.c: Update test case.
* gcc.target/i386/vect-shiftv4qi.c: Likewise.
* gcc.target/i386/vect-shiftv8qi.c: Likewise.
12 files changed:
gcc/config/i386/i386-expand.cc
gcc/config/i386/i386.cc
gcc/testsuite/gcc.target/i386/auto-init-8.c
gcc/testsuite/gcc.target/i386/avx512fp16-13.c
gcc/testsuite/gcc.target/i386/pr100865-9a.c
gcc/testsuite/gcc.target/i386/pr101796-1.c
gcc/testsuite/gcc.target/i386/pr106060-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr106060-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr106060-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr70314.c
gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c