[Bug target/55181] [4.7/4.8/4.9 Regression] Expensive shift loop where a bit-testing instruction could be used

olegendo at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sun Mar 2 18:08:00 GMT 2014


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55181

--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The first "if (...) b++;" is transformed to a bit extraction (right shift +
and), because the result is either b = 0 or b = 1.
The second "if (...) b++" uses an and + zero-compare + branch around add.
The and + zero-compare are then combined to a bit test insn.

The first bit extraction could be turned into a bit test followed by a store of
the test result by implementing the corresponding zero_extract combine pattern:

(set (reg:SI 169)
    (zero_extract:SI (reg/v:SI 165 [ number ])
        (const_int 1 [0x1])
        (const_int 29 [0x1d])))

However, if I remember correctly, doing so caused problems when matching bit
test insns.

The second bit test + branch + add is a bit more difficult, as it is always
expanded as a branch + add.

On SH there are multiple minimal sequences, depending on the context /
surrounding code.  The following branchless variant could be used if the tested
reg dies after the tests and the whole thing is not inside an (inner) loop:

        mov     r4,r0                // r0 = number
        shlr8   r0                   // r0 = r0 >> 8 (logical shift)
        tst     #(1 << (13-8)),r0    // T = (r0 & (1 << (13-8))) == 0
        shlr16  r0                   // r0 = r0 >> 16 (logical shift)
        movrt   r1                   // r1 = !T
        tst     #(1 << (29-24)),r0   // T = (r0 & (1 << (29-24))) == 0
        movrt   r0                   // r0 = !T
        rts
        add     r1,r0                // r0 = r0 + r1

If the code is in a loop, it's more efficient to load the constants into
registers once, before the loop (which might require a constant pool on SH):

        mov.l   (1 << 13),r1
        mov.l   (1 << 29),r2
        mov     #0,r0
        ...
loop:
        ...
        // r4 = number from somewhere
        tst     r1,r4               // T = (r4 & (1 << 13)) == 0
        movrt   r3                  // r3 = !T
        tst     r2,r4               // T = (r4 & (1 << 29)) == 0
        add     r3,r0               // r0 = r0 + r3
        movrt   r3                  // r3 = !T
        add     r3,r0               // r0 = r0 + r3
        ...
        <cbranch loop>


Using shift + and is usually worse on SH for these kinds of sequences.

I've tried adding the standard name pattern extzv<mode>, but it doesn't seem to
be used for either the first if (...) or the second during RTL expansion.

Maybe if-convert could be taught to transform the second if (...) into a
zero_extract as well.  But it's probably better to catch this earlier.
