pr51879-5.c: ... int bar (int); void baz (int); void foo2 (void); void foo (int y, int z) { int a; if (y == 6) { if (z) foo2 (); a = bar (7); } else a = bar (7); baz (a); } ... compile: ... gcc -O2 pr51879-5.c -S -fdump-tree-all-all ... pr51879-5.c.094t.pre: ... # BLOCK 5 freq:4877 # PRED: 8 [100.0%] (fallthru) 4 [100.0%] (fallthru,exec) # .MEMD.1719_6 = PHI <.MEMD.1719_8(D)(8), .MEMD.1719_9(4)> # .MEMD.1719_10 = VDEF <.MEMD.1719_6> # USE = nonlocal # CLB = nonlocal aD.1712_4 = barD.1703 (7); goto <bb 7>; # SUCC: 7 [100.0%] (fallthru,exec) # BLOCK 6 freq:5123 # PRED: 2 [51.2%] (false,exec) # .MEMD.1719_11 = VDEF <.MEMD.1719_8(D)> # USE = nonlocal # CLB = nonlocal aD.1712_5 = barD.1703 (7); # SUCC: 7 [100.0%] (fallthru,exec) # BLOCK 7 freq:10000 # PRED: 5 [100.0%] (fallthru,exec) 6 [100.0%] (fallthru,exec) # aD.1712_1 = PHI <aD.1712_4(5), aD.1712_5(6)> # .MEMD.1719_7 = PHI <.MEMD.1719_10(5), .MEMD.1719_11(6)> # .MEMD.1719_12 = VDEF <.MEMD.1719_7> # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1712_1); # VUSE <.MEMD.1719_12> return; # SUCC: EXIT [100.0%] ... Blocks 5 and 6 are not merged by tail_merge_optimize (they are merged by rtl cross-jumping though). The reason the blocks are not merged by tail_merge_optimize is that tail_merge_optimize uses value numbering to determine equivalence of blocks. And since the calls have a different vuse (.MEMD.1719_6 and .MEMD.1719_8(D)) the results of the calls won't have the same value number (even after fixing PR51879). However, the reason we can merge the calls is not because the calls have the same result. It's because the results are used in the same way. To detect this we should use a different comparison mechanism than the current.
I have a still rather vague idea that we might value number the uses rather than the defs: assign the same number to uses which use a value in the same way. I don't know how that would work exactly, but the idea is something like this: ... syntax: a -> b : a is used to copy to b a -> <b,c,d>: a is used as b to define c and d. # .MEMD.1719_7 = PHI <.MEMD.1719_10(5), .MEMD.1719_11(6)> .MEMD.1719_10(5) -> .MEMD.1719_7 .MEMD.1719_11(6) -> .MEMD.1719_7 # aD.1712_1 = PHI <aD.1712_4(5), aD.1712_5(6)> aD.1712_4(5) -> aD.1712_1 aD.1712_5(6) -> aD.1712_1 # .MEMD.1719_10 = VDEF <.MEMD.1719_6> # USE = nonlocal # CLB = nonlocal aD.1712_4 = barD.1703 (7); bar -> <call, .MEMD.1719_7, aD.1712_1> .MEMD.1719_6 -> <vuse, .MEMD.1719_7, aD.1712_1> 7 -> <callarg0, .MEMD.1719_7, aD.1712_1> # .MEMD.1719_11 = VDEF <.MEMD.1719_8(D)> # USE = nonlocal # CLB = nonlocal aD.1712_5 = barD.1703 (7); bar -> <call, .MEMD.1719_7, aD.1712_1> .MEMD.1719_8(D) -> <vuse, .MEMD.1719_7, aD.1712_1> 7 -> <callarg0, .MEMD.1719_7, aD.1712_1> ... By comparing the value numbers of the uses of the 2 calls, we can conclude that the calls use values in the same way, which means we can merge them. And in order to merge them we need to insert phis to merge the actual values that are used. In the example above, we would only need a phi for .MEMD.1719_6 and .MEMD.1719_8(D). And if we had f.i. 'bar (7)' and 'bar (8)' in the example, this still would compare equal, and we would have to insert a phi (7,8) and use that as argument of the tail-merged call to bar.
Confirmed.
I thought there was another bug for this same thing.
PR 13563 is really similar. The only difference is the call is both operands of 7 here while in that case it is 0/1.
PR 89018 and PR 59424 have a similar case.