[PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.
Roger Sayle
roger@nextmovesoftware.com
Tue Jun 15 15:17:29 GMT 2021
This patch tackles PR46235 to improve the code generated for bit tests
on x86_64 by making more use of the bt instruction.  Currently, GCC emits
bt when the bit test is followed by a conditional jump (thanks to Uros'
splitters).
This patch adds splitters in i386.md to catch the cases where bt is
followed by a conditional move (as in the original report), or by a
setc/setnc (as in comment 5 of the Bugzilla PR).
With this patch, the motivating function in the original PR

int foo(int a, int x, int y) {
  if (a & (1 << x))
    return a;
  return 1;
}

which with -O2 on mainline generates:

foo:    movl    %edi, %eax
        movl    %esi, %ecx
        sarl    %cl, %eax
        testb   $1, %al
        movl    $1, %eax
        cmovne  %edi, %eax
        ret

now generates:

foo:    btl     %esi, %edi
        movl    $1, %eax
        cmovc   %edi, %eax
        ret
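
For readers who want to convince themselves that the two sequences agree, the new code can be modelled in portable C.  This is only an illustrative sketch and not part of the patch: bt32, foo_bt and foo_mismatches are hypothetical names, and the model relies on the documented behaviour of bt with a register operand (the bit index is taken modulo the operand size and the selected bit is copied to the carry flag).  It assumes a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical model of "btl %esi, %edi": with a register operand the
   bit index is taken modulo 32 and the selected bit lands in CF.  */
static int bt32(uint32_t value, uint32_t index)
{
    return (value >> (index & 31)) & 1;
}

/* The function from the PR (y is unused there as well).  */
int foo(int a, int x, int y)
{
    (void)y;
    if (a & (1 << x))
        return a;
    return 1;
}

/* Model of the new sequence: btl; movl $1, %eax; cmovc %edi, %eax.  */
int foo_bt(int a, int x, int y)
{
    (void)y;
    int result = 1;
    if (bt32((uint32_t)a, (uint32_t)x))
        result = a;
    return result;
}

/* Count disagreements over a few sample values, restricted to the
   shift counts for which 1 << x is well defined (0..30).  */
int foo_mismatches(void)
{
    static const int samples[] = { 0, 1, 2, 5, 0x55, 0x7fffffff, -1, -2 };
    int bad = 0;
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (int x = 0; x <= 30; x++)
            if (foo(samples[i], x, 0) != foo_bt(samples[i], x, 0))
                bad++;
    return bad;
}
```

Calling foo_mismatches() from a small main is enough to exercise the comparison.  Shift counts outside 0..30 are left out because 1 << x is undefined for them in C, so the compiler is free to pick either behaviour there.
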
Likewise, IsBitSet1 (from comment 5)

bool IsBitSet1(unsigned char byte, int index) {
  return (byte & (1<<index)) != 0;
}

Before:
        movzbl  %dil, %eax
        movl    %esi, %ecx
        sarl    %cl, %eax
        andl    $1, %eax
        ret

After:
        movzbl  %dil, %edi
        btl     %esi, %edi
        setc    %al
        ret

[Identical code is generated for comment 5's IsBitSet2]

bool IsBitSet2(unsigned char byte, int index) {
  return (byte >> index) & 1;
}
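
The reason both spellings can share one instruction sequence is that they compute the same value wherever the shifts are defined.  The following sketch (the helper name bitset_mismatches is made up for illustration, and a 32-bit int is assumed) checks this exhaustively over every byte value and every index from 0 to 30.

```c
#include <stdbool.h>

/* Mask-and-compare form from comment 5 of the PR.  */
bool IsBitSet1(unsigned char byte, int index)
{
    return (byte & (1 << index)) != 0;
}

/* Shift-and-mask form from comment 5 of the PR.  */
bool IsBitSet2(unsigned char byte, int index)
{
    return (byte >> index) & 1;
}

/* Number of (byte, index) pairs on which the two forms differ.
   Index 31 is skipped because 1 << 31 overflows a 32-bit int.  */
int bitset_mismatches(void)
{
    int bad = 0;
    for (int b = 0; b <= 255; b++)
        for (int index = 0; index <= 30; index++)
            if (IsBitSet1((unsigned char)b, index)
                != IsBitSet2((unsigned char)b, index))
                bad++;
    return bad;
}
```

Because byte is promoted to int before either operation, any index from 8 to 30 yields false in both forms, which matches testing a bit of the zero-extended value produced by movzbl.
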
And finally, to demonstrate the corner cases that are also handled,

int IsBitClr(long long dword, int index) {
  return (dword & (1LL<<index)) == 0;
}

Before:
        movq    %rdi, %rax
        movl    %esi, %ecx
        sarq    %cl, %rax
        notq    %rax
        andl    $1, %eax
        ret

After:
        xorl    %eax, %eax
        btq     %rsi, %rdi
        setnc   %al
        ret
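
Note that the new sequence reads the full %rsi even though index is a 32-bit int.  As far as I understand the instruction, this is harmless because btq with a register operand only uses the bit index modulo 64.  A rough C model of the btq/setnc pair, with the hypothetical names IsBitClr_bt and bitclr_mismatches, looks like this:

```c
#include <stdint.h>

/* The 64-bit test from above.  */
int IsBitClr(long long dword, int index)
{
    return (dword & (1LL << index)) == 0;
}

/* Hypothetical model of "btq %rsi, %rdi; setnc %al": CF receives bit
   (index mod 64) of dword and setnc stores the inverse of CF.  */
int IsBitClr_bt(long long dword, int index)
{
    int carry = (int)(((uint64_t)dword >> ((unsigned)index & 63)) & 1);
    return !carry;
}

/* Count disagreements over a few sample values, for the indices
   where 1LL << index is well defined (0..62).  */
int bitclr_mismatches(void)
{
    static const long long samples[] = {
        0, 1, -1, 0x5555555555555555LL, 0x4000000000000000LL, -2
    };
    int bad = 0;
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (int index = 0; index <= 62; index++)
            if (IsBitClr(samples[i], index) != IsBitClr_bt(samples[i], index))
                bad++;
    return bad;
}
```

The xorl %eax, %eax in the generated code supplies the zero extension that the int return type needs, since setnc only writes the low byte.
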
According to Agner Fog, SAR/SHR r,cl takes 2 cycles on Skylake,
whereas BT r,r takes only one, so the performance improvement on
recent hardware may be more significant than the reduced instruction
count alone suggests.  I've avoided transforming cases (such as
btsi_setcsi) where using bt sequences may not be a clear win (over
sarq/andl).
This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.
Ok for mainline?
2021-06-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/46235
	* config/i386/i386.md: New define_split for bt followed by cmov.
	(*bt<mode>_setcqi): New define_insn_and_split for bt followed by
	setc.
	(*bt<mode>_setncqi): New define_insn_and_split for bt then setnc.
	(*bt<mode>_setnc<mode>): New define_insn_and_split for bt followed
	by setnc with zero extension.

gcc/testsuite/ChangeLog
	PR rtl-optimization/46235
	* gcc.target/i386/bt-5.c: New test.
	* gcc.target/i386/bt-6.c: New test.
	* gcc.target/i386/bt-7.c: New test.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patchs4.txt
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210615/463f6039/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bt-5.c
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210615/463f6039/attachment-0003.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bt-6.c
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210615/463f6039/attachment-0004.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bt-7.c
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210615/463f6039/attachment-0005.c>