[PATCH] x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.
Uros Bizjak
ubizjak@gmail.com
Wed Dec 15 09:01:02 GMT 2021
On Mon, Dec 13, 2021 at 3:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> A common idiom is to create a DImode value from the "concat" of two SImode
> values, using "(long long)hi << 32 | (long long)lo", where the operation
> may be ior, xor or plus. On x86, with -m32, the high and low parts of
> a DImode register are actually different SImode registers (typically %edx
> and %eax) so ideally this idiom should reduce to two move instructions
> (or optimally, just clever register allocation).
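The idiom can be treated as a plain concatenation because, for these operands, ior, xor and plus always produce the same result. The shifted value has zero low 32 bits and the zero-extended value has zero high 32 bits, so no bit is set in both operands and addition never carries. The plain C sketch below checks this. The helper names are hypothetical and not taken from the patch. It uses uint64_t rather than long long so that shifting a value of 0x80000000 or more left by 32 stays well defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the three spellings of the "concat" idiom. */
static uint64_t concat_ior(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

static uint64_t concat_xor(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) ^ (uint64_t)lo;
}

static uint64_t concat_plus(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) + (uint64_t)lo;
}

/* The operands share no set bits, so OR, XOR and PLUS (with no
   carries) agree, and the 64-bit result's halves are exactly hi
   and lo. */
static void check_concat(uint32_t hi, uint32_t lo) {
    uint64_t r = concat_ior(hi, lo);
    assert(r == concat_xor(hi, lo));
    assert(r == concat_plus(hi, lo));
    assert((uint32_t)(r >> 32) == hi);  /* high half is hi */
    assert((uint32_t)r == lo);          /* low half is lo */
}
```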
>
> Unfortunately, GCC currently performs the IOR operation above on -m32
> and, worse, allocates DImode registers (split into SImode register
> pairs) for both the zero-extended HI and LO values.
>
> Hence, for test1 from the new test case below:
>
> typedef int __v4si __attribute__ ((__vector_size__ (16)));
> long long test1(__v4si v) {
>   unsigned int loVal = (unsigned int)v[0];
>   unsigned int hiVal = (unsigned int)v[1];
>   return (long long)(loVal) | ((long long)(hiVal) << 32);
> }
>
> we currently generate (with -m32 -O2 -msse4.1):
>
> test1:  subl      $28, %esp
>         pextrd    $1, %xmm0, %eax
>         pmovzxdq  %xmm0, %xmm1
>         movq      %xmm1, 8(%esp)
>         movl      %eax, %edx
>         movl      8(%esp), %eax
>         orl       12(%esp), %edx
>         addl      $28, %esp
>         orb       $0, %ah
>         ret
>
> with this patch we now generate:
>
> test1:  pextrd    $1, %xmm0, %edx
>         movd      %xmm0, %eax
>         ret
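The harness below makes the expected value of test1 concrete. It assumes GCC or Clang vector extensions and does not need -m32 or -msse4.1 to run. It copies test1 as given and checks that the high half of the result is v[1] and the low half is v[0]. This is consistent with the patched code, where pextrd $1 writes %edx (high half) and movd writes %eax (low half). run_test1 is a hypothetical helper. The v[1] values used with it should stay below 2^31 so the signed shift in test1 remains representable.

```c
#include <assert.h>
#include <stdint.h>

typedef int __v4si __attribute__ ((__vector_size__ (16)));

/* test1 from the message. */
long long test1(__v4si v) {
  unsigned int loVal = (unsigned int)v[0];
  unsigned int hiVal = (unsigned int)v[1];
  return (long long)(loVal) | ((long long)(hiVal) << 32);
}

/* Hypothetical helper: build a vector from two elements, call test1,
   and return the low and high 32-bit halves of its result. */
static void run_test1(int e0, int e1, uint32_t *lo, uint32_t *hi) {
  __v4si v = { e0, e1, 0, 0 };
  uint64_t r = (uint64_t)test1(v);
  *lo = (uint32_t)r;          /* what ends up in %eax on -m32 */
  *hi = (uint32_t)(r >> 32);  /* what ends up in %edx on -m32 */
}
```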
>
> The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior
> to register allocation on !TARGET_64BIT, simplifying this sequence to
> "highpart(dst) = hi; lowpart(dst) = lo".
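The split can be modeled in C under a simplifying assumption: on -m32, a DImode value is just a pair of 32-bit halves, with the low half typically in %eax and the high half in %edx. The struct and function names below are hypothetical. The sketch shows that writing the two halves directly gives the same register pair as widening, shifting and IORing, which is why the split can replace the IOR with two moves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode value held in an SImode register
   pair on -m32. */
typedef struct {
    uint32_t lo;  /* e.g. %eax */
    uint32_t hi;  /* e.g. %edx */
} di_pair;

/* Unsplit form: zero-extend both halves, shift, IOR, then view the
   64-bit result as a register pair. */
static di_pair concat_via_ior(uint32_t hi, uint32_t lo) {
    uint64_t v = ((uint64_t)hi << 32) | (uint64_t)lo;
    di_pair d = { (uint32_t)v, (uint32_t)(v >> 32) };
    return d;
}

/* Split form: two independent SImode moves. */
static di_pair concat_via_split(uint32_t hi, uint32_t lo) {
    di_pair d;
    d.hi = hi;  /* highpart(dst) = hi */
    d.lo = lo;  /* lowpart(dst)  = lo */
    return d;
}
```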
>
> The one minor complication is that sse.md's define_insn for
> *vec_extractv4si_0_zext_sse4 can sometimes interfere with this
> optimization. It turns out that on !TARGET_64BIT, the zero_extend:DI
> following vec_select:SI isn't free, and this insn gets split back
> into multiple instructions during later passes, but too late to
> be optimized away by this patch/reload. Hence the last hunk of
> this patch is to restrict *vec_extractv4si_0_zext_sse4 to TARGET_64BIT.
> Judging from PR target/80286, where *vec_extractv4si_0_zext_sse4 was
> first added, this restriction seems reasonable (but in case it is
> considered controversial, this patch has been tested both with and
> without this last change).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without "--target_board='unix{-m32}'"
> with no new failures. OK for mainline?
>
>
> 2021-12-13 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR target/103611
> * config/i386/i386.md (any_or_plus): New code iterator.
> (define_split): Split (HI<<32)|zext(LO) into piece-wise
> move instructions on !TARGET_64BIT.
> * config/i386/sse.md (*vec_extractv4si_0_zext_sse4):
> Restrict to TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
> PR target/103611
> * gcc.target/i386/pr103611-2.c: New test case.
OK with the *vec_extractv4si_0_zext_sse4 change, but please also change
the isa attribute to:
[(set_attr "isa" "*,*,avx512f")
Thanks,
Uros.
>
> Thanks in advance,
> Roger
> --
>