This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/79012] New: basic block reordering causes suboptimal code


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79012

            Bug ID: 79012
           Summary: basic block reordering causes suboptimal code
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: saaadhu at gcc dot gnu.org
                CC: segher at kernel dot crashing.org
  Target Milestone: ---

For this C code (slightly modified from PR 30908)

void wait(int i)
{
        while (i-- > 0)
                asm volatile("nop" ::: "memory");
}

  gcc 4.8 at -Os produces

        jmp     .L2
.L3:
        nop
        decl    %edi
.L2:
        testl   %edi, %edi
        jg      .L3
        ret

whereas gcc trunk (and 4.9 onwards, from a quick check) produces

.L2:
        testl   %edi, %edi
        jle     .L5
        nop
        decl    %edi
        jmp     .L2
.L5:
        ret

The code size is identical, but the trunk version executes one more
instruction on every loop iteration (the unconditional jmp back to .L2,
where 4.8 falls through from the loop body into the test) - it's faster
only if the loop never runs. This happens regardless of whether the
inline assembler statement has the memory clobber.
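
To make the instruction counts concrete, here is a small sketch that
models both layouts in C with gotos and counts one per executed
instruction, including the final ret. The function names
count_bottom_test and count_top_test are made up for this illustration,
and the model only counts instructions; it says nothing about branch
prediction or actual cycle counts.

```c
/* Instruction-count model of the two listings above (illustrative only).
   Each function follows the control flow of one listing and adds one to
   n for every instruction that would execute, including the ret. */

/* gcc 4.8 layout: loop test at the bottom, entered through "jmp .L2". */
unsigned long count_bottom_test(int i)
{
    unsigned long n = 0;

    n++;                /* jmp .L2 */
    goto L2;
L3:
    n++;                /* nop */
    n++;                /* decl %edi */
    i--;
L2:
    n++;                /* testl %edi, %edi */
    n++;                /* jg .L3 */
    if (i > 0)
        goto L3;
    n++;                /* ret */
    return n;
}

/* trunk layout: loop test at the top, "jmp .L2" as the back edge. */
unsigned long count_top_test(int i)
{
    unsigned long n = 0;

L2:
    n++;                /* testl %edi, %edi */
    n++;                /* jle .L5 */
    if (i <= 0)
        goto L5;
    n++;                /* nop */
    n++;                /* decl %edi */
    i--;
    n++;                /* jmp .L2 */
    goto L2;
L5:
    n++;                /* ret */
    return n;
}
```

For k loop iterations this model gives 4k + 4 instructions for the 4.8
layout and 5k + 3 for the trunk layout, so trunk is ahead by one
instruction at k = 0, the two tie at k = 1, and 4.8 is ahead by k - 1
instructions after that.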

Digging into the dump files, I found that the transformation occurs in
the bb reorder (bbro) pass: it calls cfg_layout_initialize, which
eventually calls try_redirect_by_replacing_jump with in_cfglayout set to
true. That function removes the jump, and the resulting RTL change is
what leads to the slower code.

RTL before and after bbro.

Before:

(jump_insn 24 6 25 2 (set (pc)
        (label_ref 15)) "pr30908.c":3 678 {jump}
     (nil)
 -> 15)
(barrier 25 24 17)
(code_label 17 25 12 3 3 "" [1 uses])
(note 12 17 13 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 13 12 14 3 (parallel [
            (asm_operands/v ("nop") ("") 0 []
                 []
                 [] pr30908.c:4)
            (clobber (mem:BLK (scratch) [0  A8]))
            (clobber (reg:CCFP 18 fpsr))
            (clobber (reg:CC 17 flags))
        ]) "pr30908.c":4 -1
     (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 14 13 15 3 (parallel [
            (set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
                (plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:CC 17 flags))
        ]) 210 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(code_label 15 14 16 4 2 "" [1 uses])
(note 16 15 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 18 16 19 4 (set (reg:CCNO 17 flags)
        (compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
            (const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
     (nil))
(jump_insn 19 18 30 4 (set (pc)
        (if_then_else (gt (reg:CCNO 17 flags)
                (const_int 0 [0]))
            (label_ref 17)
            (pc))) "pr30908.c":3 646 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCNO 17 flags)
        (int_list:REG_BR_PROB 8500 (nil)))
 -> 17)
(note 30 19 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 {simple_return_internal}
     (nil)
 -> simple_return)

After:

<snip>
(code_label 15 6 16 3 2 "" [1 uses])
(note 16 15 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 18 16 19 3 (set (reg:CCNO 17 flags)
        (compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
            (const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
     (nil))
(jump_insn 19 18 12 3 (set (pc)
        (if_then_else (le (reg:CCNO 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 34)
            (pc))) "pr30908.c":3 646 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCNO 17 flags)
        (int_list:REG_BR_PROB 1500 (nil)))
 -> 34)
(note 12 19 13 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 13 12 14 4 (parallel [
            (asm_operands/v ("nop") ("") 0 []
                 []
                 [] pr30908.c:4)
            (clobber (mem:BLK (scratch) [0  A8]))
            (clobber (reg:CCFP 18 fpsr))
            (clobber (reg:CC 17 flags))
        ]) "pr30908.c":4 -1
     (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 14 13 35 4 (parallel [
            (set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
                (plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:CC 17 flags))
        ]) 210 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(jump_insn 35 14 36 4 (set (pc)
        (label_ref 15)) -1
     (nil)
 -> 15)
(barrier 36 35 34)
(code_label 34 36 30 5 5 "" [1 uses])
(note 30 34 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 {simple_return_internal}
     (nil)
 -> simple_return)
