This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Tree tail merging breaks __builtin_unreachable optimization


Hello,

Starting with GCC 4.7, if multiple __builtin_unreachable calls occur in
a single function, the conditions guarding them are no longer optimized
away as they were in 4.6.

For example,

int foo(int a)
{
    if (a <= 0)
        __builtin_unreachable();
    if (a > 2)
        __builtin_unreachable();

    return a > 0;
}

results in the following (ARM) code:

foo:
        cmp r0, #0
        ble .L3
        cmp r0, #2
        bgt .L3
        mov r0, #1
        bx lr
.L3:

with the label .L3 left dangling past the end of the function.

With 4.6, we instead get the expected:

foo:
        mov     r0, #1
        bx      lr
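
To reproduce the comparison, a self-contained test file along the following
lines should be enough. The file name, the -O2 level, and the helper
foo_expected are assumptions made for illustration; the report does not state
which flags were used. foo_expected only spells out what the optimizer is
entitled to assume about foo.

```c
/* unreachable-test.c (hypothetical file name).
 *
 * Assumed command to inspect the generated code; the optimization
 * level is a guess, not taken from the report:
 *
 *     gcc -O2 -S unreachable-test.c
 */

/* The function from the report: a <= 0 and a > 2 are declared impossible,
 * so callers must only pass 1 or 2. */
int foo(int a)
{
    if (a <= 0)
        __builtin_unreachable();
    if (a > 2)
        __builtin_unreachable();

    return a > 0;
}

/* What the compiler may reduce foo to: on every path that reaches the
 * return, a is 1 or 2, so a > 0 always holds. */
int foo_expected(int a)
{
    (void)a;
    return 1;
}
```

Calling foo with any value outside 1..2 is undefined behavior, so a harness
should only compare foo and foo_expected on those two inputs.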


The problem seems to be an unfortunate interaction between tree and
RTL optimization passes. In 4.6, we had something like:

<bb 2>:
  if (a_1(D) <= 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  __builtin_unreachable ();

<bb 4>:
  if (a_1(D) > 2)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  __builtin_unreachable ();

<bb 6>:
  return 1;

on the tree level; during RTL expansion, __builtin_unreachable expands to just
a barrier, and subsequent CFG optimization detects basic blocks containing just
a barrier and optimizes the predecessor blocks, removing the branches into
them.

With 4.7, we get instead:

<bb 2>:
  if (a_1(D) <= 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  __builtin_unreachable ();

<bb 4>:
  if (a_1(D) > 2)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return 1;

where there is just a single basic block containing __builtin_unreachable,
with multiple predecessors branching to it. Unfortunately, the RTL
optimizers that detect unreachable blocks appear to have difficulty when
such a block has multiple predecessors, and fail to optimize them.

The tree pass that merged the two blocks is a new pass called "tail merging",
which was added in the 4.7 cycle. In fact, using -fno-tree-tail-merge gets
the expected result back.
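
As a stopgap for a single function, the same option can probably be applied
through GCC's optimize function attribute instead of on the command line. The
sketch below assumes that the attribute accepts the option name without its -f
prefix, as the GCC manual describes; the names NO_TAIL_MERGE and
foo_no_tail_merge are made up for this example. Whether it restores the 4.6
code on a given target has not been checked here, and the manual recommends
this attribute for debugging only.

```c
/* Assumption: GCC 4.7 or later.  Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TAIL_MERGE __attribute__((optimize("no-tree-tail-merge")))
#else
#define NO_TAIL_MERGE
#endif

/* Same body as foo from the report, with tail merging disabled for
 * this function only. */
NO_TAIL_MERGE
int foo_no_tail_merge(int a)
{
    if (a <= 0)
        __builtin_unreachable();
    if (a > 2)
        __builtin_unreachable();

    return a > 0;
}
```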

Any suggestions on how to fix this?  Should tail merging detect
__builtin_unreachable and not merge such blocks?  Or else, should
the CFG optimizer be extended (how?) to handle unreachable blocks
with multiple predecessors better?
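
To make the second option concrete, here is a toy model of the idea. It is
not GCC code and uses none of GCC's internal data structures; struct toy_block
and both functions are invented for this sketch. It walks every block and
drops each edge that leads into a block containing only
__builtin_unreachable, no matter how many predecessors that block has.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy basic block: up to two successors given as block indices, -1 = none. */
struct toy_block {
    bool only_unreachable;  /* block holds nothing but __builtin_unreachable */
    int succ[2];
};

/* Remove every edge into an unreachable-only block, regardless of how many
 * predecessors that block has.  Returns the number of edges removed. */
int toy_prune_unreachable_edges(struct toy_block *blocks, size_t n)
{
    int removed = 0;
    for (size_t i = 0; i < n; i++) {
        for (int s = 0; s < 2; s++) {
            int target = blocks[i].succ[s];
            if (target >= 0 && blocks[target].only_unreachable) {
                /* The condition guarding this edge can never be true. */
                blocks[i].succ[s] = -1;
                removed++;
            }
        }
    }
    return removed;
}

/* The 4.7 CFG from the report, renumbered from zero:
 * bb 2 -> 0, bb 3 -> 1, bb 4 -> 2, bb 5 -> 3. */
void toy_build_47_cfg(struct toy_block b[4])
{
    b[0] = (struct toy_block){ false, { 1, 2 } };    /* if (a <= 0) */
    b[1] = (struct toy_block){ true,  { -1, -1 } };  /* unreachable  */
    b[2] = (struct toy_block){ false, { 1, 3 } };    /* if (a > 2)  */
    b[3] = (struct toy_block){ false, { -1, -1 } };  /* return 1    */
}
```

On this CFG the pass removes two edges, leaving blocks 0 and 2 with a single
successor each, which corresponds to the straight-line code 4.6 produced.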

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

