[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86
cvs-commit at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jun 30 10:03:05 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #26 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:00193676a5a3e7e50e1fa6646bb5abb5a7b2acbb
commit r13-1362-g00193676a5a3e7e50e1fa6646bb5abb5a7b2acbb
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Thu Jun 30 11:00:03 2022 +0100
Use xchg for DImode double word rotate by 32 bits with -m32 on x86.
This patch was motivated by the investigation of Linus Torvalds'
spill-heavy cryptography kernels in PR 105930. The <any_rotate>di3
expander handles all rotations by an immediate constant for 1..63 bits
with the exception of 32 bits, which FAILs and is then split by the
middle-end.
This patch makes these 32-bit doubleword rotations consistent with the
other DImode rotations during reload, which results in reduced register
pressure, fewer instructions and the use of x86's xchg instruction
when appropriate. In theory, xchg can be handled by register renaming,
but even on micro-architectures where it's implemented by 3 uops (no
worse than a three-instruction shuffle), avoiding the nomination of a
"temporary" register reduces user-visible register pressure (and
has obvious code size benefits).
The effects are best shown with the new testcase:
unsigned long long bar();
unsigned long long foo()
{
  unsigned long long x = bar();
  return (x>>32) | (x<<32);
}
for which GCC with -m32 -O2 currently generates:
        subl    $12, %esp
        call    bar
        addl    $12, %esp
        movl    %eax, %ecx
        movl    %edx, %eax
        movl    %ecx, %edx
        ret
but with this patch now generates:
        subl    $12, %esp
        call    bar
        addl    $12, %esp
        xchgl   %edx, %eax
        ret
With this patch, the number of lines of assembly language generated
for the blake2b kernel (from the attachment to PR105930) decreases
from 5626 to 5404. Although there's an impressive reduction in
instruction count, there's no change/reduction in stack frame size.
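As a rough illustration of why a single xchgl suffices here: rotating a
64-bit value by 32 bits is the same as exchanging its two 32-bit halves,
and with -m32 those halves are returned in the %edx:%eax pair. The sketch
below is not part of the patch or its testcase; the names rotate32 and
swap_halves are hypothetical and exist only to show the equivalence.

```c
#include <stdint.h>

/* Rotate a 64-bit value by 32 bits, written the same way as the
   return expression in the testcase above.  */
static uint64_t rotate32(uint64_t x)
{
  return (x >> 32) | (x << 32);
}

/* Hypothetical helper (not from the patch): build the same result by
   exchanging the low and high 32-bit halves explicitly.  */
static uint64_t swap_halves(uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* low word  */
  uint32_t hi = (uint32_t) (x >> 32);  /* high word */
  return ((uint64_t) lo << 32) | hi;   /* halves exchanged */
}
```

Both functions return the same value for every input, which is why the
rotation needs no shifts at all on a 32-bit target once the value is held
in a register pair.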
2022-06-30 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (swap_mode): Rename from *swap<mode> to
provide gen_swapsi.
(<any_rotate>di3): Handle !TARGET_64BIT rotations by 32 bits
via new gen_<insn>32di2_doubleword below.
(<anyrotate>32di2_doubleword): New define_insn_and_split
that splits after reload as either a pair of move instructions
or an xchgl (using gen_swapsi).
gcc/testsuite/ChangeLog
* gcc.target/i386/xchg-3.c: New test case.