[Bug target/55181] [4.7/4.8/4.9 Regression] Expensive shift loop where a bit-testing instruction could be used
olegendo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sun Mar 2 18:08:00 GMT 2014
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55181
--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The first "if (...) b++;" is transformed to a bit extraction (right shift +
and), because the result is either b = 0 or b = 1.
The second "if (...) b++" uses an and + zero-compare + branch around add.
The and + zero-compare are then combined to a bit test insn.
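For reference, the two forms can be sketched in C. This is only an illustration: the function names are hypothetical, and the bit positions (29 and 13) are taken from the RTL and SH code in this comment, not necessarily from the test case attached to the PR.

```c
#include <stdint.h>

/* Branchy form: two independent "if (...) b++;" statements. */
static unsigned count_branchy(uint32_t number)
{
    unsigned b = 0;
    if (number & (UINT32_C(1) << 29)) b++;
    if (number & (UINT32_C(1) << 13)) b++;
    return b;
}

/* Bit-extraction form: each test becomes (right shift + and),
   which yields exactly 0 or 1 and can simply be added. */
static unsigned count_extract(uint32_t number)
{
    return ((number >> 29) & 1u) + ((number >> 13) & 1u);
}
```

Both functions return the same value for every input; the difference is only in which form the compiler sees and which insn patterns it can match.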
The first bit extraction could be turned into a bit test followed by a test
result store by implementing the corresponding zero_extract combine pattern:
(set (reg:SI 169)
     (zero_extract:SI (reg/v:SI 165 [ number ])
                      (const_int 1 [0x1])
                      (const_int 29 [0x1d])))
However, doing so resulted in problems when matching bit test insns, if I
remember correctly.
The second bit test + branch + add is a bit more difficult, as it is always
expanded as a branch + add.
On SH there are multiple minimal sequences, depending on the context /
surrounding code. The following branchless variant could be used if the tested
reg dies after the tests and the whole thing is not inside an (inner) loop:
mov r4,r0 // r0 = number
shlr8 r0 // r0 = r0 >> 8 (logical shift)
tst #(1 << (13-8)),r0 // T = (r0 & (1 << (13-8))) == 0
shlr16 r0 // r0 = r0 >> 16 (logical shift), i.e. number >> 24
movrt r1 // r1 = !T
tst #(1 << (29-24)),r0 // T = (r0 & (1 << (29-24))) == 0, imm must fit in 8 bits
movrt r0 // r0 = !T
rts
add r1,r0 // r0 = r0 + r1
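The idea behind the shifts can be modelled in C: "tst #imm,r0" only takes an 8-bit immediate, so each tested bit is first moved into the low byte of r0. The sketch below is only a model under that assumption; the helper and function names are hypothetical, and the shift amounts (8, then a further 16) are chosen here so that both masks fit in 8 bits.

```c
#include <stdint.h>

/* Model of SH "tst #imm,r0": the immediate is 8 bits wide, so the
   tested bit must sit in bits 0..7 of r0. Returns the T flag. */
static int tst_imm8(uint8_t imm, uint32_t r0)
{
    return (r0 & imm) == 0;
}

static uint32_t count_shift_then_test(uint32_t number)
{
    uint32_t r0 = number;              /* mov r4,r0 */
    uint32_t r1;
    int t;

    r0 >>= 8;                          /* bit 13 is now bit 5 */
    t = tst_imm8(1u << (13 - 8), r0);
    r1 = !t;                           /* movrt r1 */
    r0 >>= 16;                         /* bit 29 is now bit 5 (total shift 24) */
    t = tst_imm8(1u << (29 - 24), r0);
    r0 = !t;                           /* movrt r0 */
    return r0 + r1;                    /* add r1,r0 */
}
```

Because r0 is shifted in place, the original value is lost, which is why this variant only applies when the tested reg dies after the tests.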
If the code is in a loop, it's more efficient to load constants (which might
require a constant pool on SH):
mov.l (1 << 13),r1
mov.l (1 << 29),r2
mov #0,r0
...
loop:
...
// r4 = number from somewhere
tst r1,r4 // T = (r4 & (1 << 13)) == 0
movrt r3 // r3 = !T
tst r2,r4 // T = (r4 & (1 << 29)) == 0
add r3,r0 // r0 = r0 + r3
movrt r3 // r3 = !T
add r3,r0
...
<cbranch loop>
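A rough C equivalent of the loop variant, with the masks hoisted out of the loop, might look like the following. The function name and the array-based interface are assumptions made for the sake of a self-contained example; the register names in the comments refer to the SH sequence above.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t count_in_loop(const uint32_t *numbers, size_t n)
{
    const uint32_t mask13 = UINT32_C(1) << 13;  /* r1, loaded once */
    const uint32_t mask29 = UINT32_C(1) << 29;  /* r2, loaded once */
    uint32_t total = 0;                         /* r0 */

    for (size_t i = 0; i < n; i++) {
        uint32_t r4 = numbers[i];               /* number from somewhere */
        total += (r4 & mask13) != 0;            /* tst r1,r4 ; movrt r3 ; add r3,r0 */
        total += (r4 & mask29) != 0;            /* tst r2,r4 ; movrt r3 ; add r3,r0 */
    }
    return total;
}
```

Here the tested value is left untouched, so it can stay live across iterations, at the cost of two extra registers for the masks.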
Using shift + and is usually worse on SH for these kinds of sequences.
I've tried adding the standard name pattern extzv<mode>, but it doesn't seem to
be used for either the first if (...) or the second during RTL expansion.
Maybe if-convert could be taught to transform the second if (...) to a
zero_extract as well. But probably it's better to catch this earlier.