Bug 105731 - superfluous second operation before conditional branch -O2 -mcpu=cortex-m0plus
Summary: superfluous second operation before conditional branch -O2 -mcpu=cortex-m0plus
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 11.2.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-05-25 20:20 UTC by Kio
Modified: 2022-09-04 18:27 UTC (History)
1 user (show)

See Also:
Host:
Target: arm*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-05-25 00:00:00


Attachments
testcase (375 bytes, text/plain)
2022-05-25 20:26 UTC, Andrew Pinski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kio 2022-05-25 20:20:52 UTC
gcc 10.3.1 misses an easy oportunity for optimization:

Instruction:  
    if ((bits<<=1)>=0) goto p3;

generated code:
    strh    r7, [r0, #4]
    lsls    r3, r2, #1
    lsls    r2, r2, #1
    bpl     .L8

i believe the 2nd "lsls" is superfluous. 
It would save me 3MHz if it wasn't there. :-)

full example: 
    https://godbolt.org/z/xocbvjn5x
Comment 1 Marek Polacek 2022-05-25 20:24:19 UTC
Not a C++ FE problem.
Comment 2 Andrew Pinski 2022-05-25 20:26:49 UTC
Created attachment 53032 [details]
testcase
Comment 3 Andrew Pinski 2022-05-25 20:34:06 UTC
Combine:

Trying 41 -> 44:
   41: r120:SI=r119:SI<<0x1a
      REG_DEAD r119:SI
   44: pc={(r120:SI<0)?L114:pc}
      REG_BR_PROB 283038348
Failed to match this instruction:
(parallel [
        (set (pc)
            (if_then_else (ne (zero_extract:SI (reg:SI 119 [ _7 ])
                        (const_int 1 [0x1])
                        (const_int 5 [0x5]))
                    (const_int 0 [0]))
                (label_ref 114)
                (pc)))
        (set (reg/v:SI 120 [ bits ])
            (ashift:SI (reg:SI 119 [ _7 ])
                (const_int 26 [0x1a])))
    ])
Failed to match this instruction:
(parallel [
        (set (pc)
            (if_then_else (ne (zero_extract:SI (reg:SI 119 [ _7 ])
                        (const_int 1 [0x1])
                        (const_int 5 [0x5]))
                    (const_int 0 [0]))
                (label_ref 114)
                (pc)))
        (set (reg/v:SI 120 [ bits ])
            (ashift:SI (reg:SI 119 [ _7 ])
                (const_int 26 [0x1a])))
    ])
Failed to match this instruction:
(parallel [
        (set (pc)
            (if_then_else (ne (and:SI (lshiftrt:SI (reg:SI 119 [ _7 ])
                            (const_int 5 [0x5]))
                        (const_int 1 [0x1]))
                    (const_int 0 [0]))
                (label_ref 114)
                (pc)))
        (set (reg/v:SI 120 [ bits ])
            (ashift:SI (reg:SI 119 [ _7 ])
                (const_int 26 [0x1a])))
    ])
Failed to match this instruction:
(parallel [
        (set (pc)
            (if_then_else (ne (and:SI (lshiftrt:SI (reg:SI 119 [ _7 ])
                            (const_int 5 [0x5]))
                        (const_int 1 [0x1]))
                    (const_int 0 [0]))
                (label_ref 114)
                (pc)))
        (set (reg/v:SI 120 [ bits ])
            (ashift:SI (reg:SI 119 [ _7 ])
                (const_int 26 [0x1a])))
    ])
Successfully matched this instruction:
(set (reg/v:SI 120 [ bits ])
    (ashift:SI (reg:SI 119 [ _7 ])
        (const_int 26 [0x1a])))
Successfully matched this instruction:
(set (pc)
    (if_then_else (ne (zero_extract:SI (reg:SI 119 [ _7 ])
                (const_int 1 [0x1])
                (const_int 5 [0x5]))
            (const_int 0 [0]))
        (label_ref 114)
        (pc)))
allowing combination of insns 41 and 44
original costs 4 + 14 = 18
replacement costs 4 + 14 = 18
modifying insn i2    41: r120:SI=r119:SI<<0x1a
deferring rescan insn with uid = 41.
modifying insn i3    44: {pc={(zero_extract(r119:SI,0x1,0x5)!=0)?L114:pc};clobber scratch;}
      REG_DEAD r119:SI
      REG_BR_PROB 283038348
deferring rescan insn with uid = 44.

The cost for the following instruction is wrong:
(jump_insn 113 110 114 13 (parallel [
            (set (pc)
                (if_then_else (eq (zero_extract:SI (reg:SI 119 [ _7 ])
                            (const_int 1 [0x1])
                            (const_int 5 [0x5]))
                        (const_int 0 [0]))
                    (label_ref 59)
                    (pc)))
            (clobber (scratch:SI))
        ]) "/app/example.cpp":33:26 961 {*tbit_cbranch}
     (expr_list:REG_DEAD (reg:SI 119 [ _7 ])
        (int_list:REG_BR_PROB 457091900 (nil)))
 -> 59)

It should be 18 since it gets splitted into two one which has a cost of 4 and another which is a cost of 14.
Comment 4 Andrew Pinski 2022-05-25 20:34:25 UTC
Confirmed.
Comment 5 Richard Earnshaw 2022-05-26 09:08:49 UTC
The thumb1 rtx-costing function needs a complete rewrite along the lines and style of the Thumb2 and Arm costing routines.
Another thing for my copious free time (TM).
Comment 6 Kio 2022-09-04 18:27:14 UTC
i just came across the test on godbolt again and found they added gcc 11.2.1.
compiled code no change. so i updated gcc version for the ticket.