Bug 51964 - Missed tail merging opportunity
Summary: Missed tail merging opportunity
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on: 13563 89018 59424
Blocks: 99473 110192
  Show dependency treegraph
 
Reported: 2012-01-23 13:32 UTC by Tom de Vries
Modified: 2023-12-28 00:04 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-27 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tom de Vries 2012-01-23 13:32:09 UTC
pr51879-5.c:
...
int bar (int);
void baz (int);
void foo2 (void);
void
foo (int y, int z)
{
  int a;
  if (y == 6)
    {
      if (z)
	foo2 ();
      a = bar (7);
    }
  else
    a = bar (7);
  baz (a);
}
...

compile:
...
gcc -O2 pr51879-5.c -S -fdump-tree-all-all
...

pr51879-5.c.094t.pre:
...
  # BLOCK 5 freq:4877
  # PRED: 8 [100.0%]  (fallthru) 4 [100.0%]  (fallthru,exec)
  # .MEMD.1719_6 = PHI <.MEMD.1719_8(D)(8), .MEMD.1719_9(4)>
  # .MEMD.1719_10 = VDEF <.MEMD.1719_6>
  # USE = nonlocal 
  # CLB = nonlocal 
  aD.1712_4 = barD.1703 (7);
  goto <bb 7>;
  # SUCC: 7 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:5123
  # PRED: 2 [51.2%]  (false,exec)
  # .MEMD.1719_11 = VDEF <.MEMD.1719_8(D)>
  # USE = nonlocal 
  # CLB = nonlocal 
  aD.1712_5 = barD.1703 (7);
  # SUCC: 7 [100.0%]  (fallthru,exec)

  # BLOCK 7 freq:10000
  # PRED: 5 [100.0%]  (fallthru,exec) 6 [100.0%]  (fallthru,exec)
  # aD.1712_1 = PHI <aD.1712_4(5), aD.1712_5(6)>
  # .MEMD.1719_7 = PHI <.MEMD.1719_10(5), .MEMD.1719_11(6)>
  # .MEMD.1719_12 = VDEF <.MEMD.1719_7>
  # USE = nonlocal 
  # CLB = nonlocal 
  bazD.1705 (aD.1712_1);
  # VUSE <.MEMD.1719_12>
  return;
  # SUCC: EXIT [100.0%] 
...

Blocks 5 and 6 are not merged by tail_merge_optimize (they are merged by rtl cross-jumping though).

The reason the blocks are not merged by tail_merge_optimize is that tail_merge_optimize uses value numbering to determine equivalence of blocks.
And since the calls have a different vuse (.MEMD.1719_6 and .MEMD.1719_8(D)) the results of the calls won't have the same value number (even after fixing PR51879).

However, the reason we can merge the calls is not because the calls have the same result. It's because the results are used in the same way. To detect this we should use a different comparison mechanism than the current.
Comment 1 Tom de Vries 2012-01-23 13:43:33 UTC
I have a still rather vague idea that we might value number the uses rather than the defs: assign the same number to uses which use a value in the same way. I don't know how that would work exactly, but the idea is something like this:
...
syntax:
a -> b      : a is used to copy to b
a -> <b,c,d>: a is used as b to define c and d.

  # .MEMD.1719_7 = PHI <.MEMD.1719_10(5), .MEMD.1719_11(6)>
.MEMD.1719_10(5) -> .MEMD.1719_7
.MEMD.1719_11(6) -> .MEMD.1719_7

  # aD.1712_1 = PHI <aD.1712_4(5), aD.1712_5(6)>
aD.1712_4(5) -> aD.1712_1
aD.1712_5(6) -> aD.1712_1

  # .MEMD.1719_10 = VDEF <.MEMD.1719_6>
  # USE = nonlocal 
  # CLB = nonlocal 
  aD.1712_4 = barD.1703 (7);
bar          -> <call, .MEMD.1719_7, aD.1712_1>
.MEMD.1719_6 -> <vuse, .MEMD.1719_7, aD.1712_1>
7            -> <callarg0, .MEMD.1719_7, aD.1712_1>

  # .MEMD.1719_11 = VDEF <.MEMD.1719_8(D)>
  # USE = nonlocal 
  # CLB = nonlocal 
  aD.1712_5 = barD.1703 (7);
bar             -> <call, .MEMD.1719_7, aD.1712_1>
.MEMD.1719_8(D) -> <vuse, .MEMD.1719_7, aD.1712_1>
7               -> <callarg0, .MEMD.1719_7, aD.1712_1>
...

By comparing the value numbers of the uses of the 2 calls, we can conclude that the calls use values in the same way, which means we can merge them. And in order to merge them we need to insert phis to merge the actual values that are used. In the example above, we would only need a phi for .MEMD.1719_6 and .MEMD.1719_8(D).

And if we had f.i. 'bar (7)' and 'bar (8)' in the example, this still would compare equal, and we would have to insert a phi (7,8) and use that as argument of the tail-merged call to bar.
Comment 2 Andrew Pinski 2016-08-14 18:10:50 UTC
Confirmed.
Comment 3 Andrew Pinski 2021-12-28 06:52:23 UTC
I thought there was another bug for this same thing.
Comment 4 Andrew Pinski 2021-12-28 06:56:59 UTC
PR 13563 is really similar. The only difference is the call is both operands of 7 here while in that case it is 0/1.
Comment 5 Andrew Pinski 2023-05-06 21:29:24 UTC
PR 89018 and PR 59424 have a similar case.