Bug 48837 - [4.4/4.5/4.6 Regression] Wrong optimization of recursive function calls
Summary: [4.4/4.5/4.6 Regression] Wrong optimization of recursive function calls
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.3.6
Assignee: Zdenek Dvorak
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2011-04-30 21:25 UTC by e-maxx
Modified: 2011-05-10 08:49 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.4.6, 4.1.0, 4.2.4, 4.7.0
Known to fail: 4.3.3, 4.4.2, 4.5.0, 4.6.0
Last reconfirmed: 2011-05-01 17:13:19


Attachments
The example program - it should output "ans = 1", but outputs "ans = 0" in -O2 (676 bytes, application/octet-stream)
2011-04-30 21:25 UTC, e-maxx
Details
Fix for the issue (305 bytes, patch)
2011-05-04 08:33 UTC, Zdenek Dvorak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description e-maxx 2011-04-30 21:25:28 UTC
Created attachment 24153 [details]
The example program - it should output "ans = 1", but outputs "ans = 0" in -O2

The program attached outputs "global ans = 1" if compiled without optimizations, and "global ans = 0" with -O2 set.

Moreover, if we insert debug-output into the recursive function, it becomes working right:
	//cout << "query = " << ans << endl;
	// ^^^^^ UNCOMMENT THIS LINE TO MAKE THE PROGRAM WORK OK
If we uncomment the line, the program outputs "global ans = 1" both with -O2 and without it.



Unfortunately, I couldn't make the test program very simple - optimizer works OK on simple programs, but when we have a complex recursion calls - it starts making wrong code.


To make you understand it better - there are some additional debug-outputs.

For example, the right program flow results in the following output:

call auxillary (t[9], 123):
auxillary = 1
call auxillary (t[5], 123):
auxillary = 0
global ans = 1


When we compile with -O2, the output becomes:

call auxillary (t[9], 123):
auxillary = 1
call auxillary (t[5], 123):
auxillary = 0
global ans = 0

(note that from code we can see that query() returns sum of all recursive answers - then how can it return 0, if one auxillary() returned 1???)
Comment 1 Dmitry Gorbachev 2011-05-01 05:52:09 UTC
Simpler testcase:

=========================== 8< ===========================
__attribute__((noinline))
int baz(void)
{
  return 1;
}

inline const int& bar(const int& a, const int& b)
{
  return a ? a : b;
}

int foo(int a, int b)
{
  return a || b ? baz() : foo(bar(a, b), 1) + foo(1, 0);
}

int main(void)
{
  return foo(0, 0) != 2;
}
=========================== >8 ===========================
Comment 2 e-maxx 2011-05-01 06:13:59 UTC
It affects even 4.4.3.
Comment 3 H.J. Lu 2011-05-01 17:13:19 UTC
It is caused by revision 127491:

http://gcc.gnu.org/ml/gcc-cvs/2007-08/msg00384.html
Comment 4 Steven Bosscher 2011-05-01 19:01:45 UTC
Works with -fno-optimize-sibling-calls. Looks like this has something to do with the accumulators optimization in tree-tailcall.c
Comment 5 Steven Bosscher 2011-05-01 19:24:06 UTC
Function foo from .143.expand dump:

;; Function int foo(int, int) (_Z3fooii)

int foo(int, int) (int a, int b)
{
  int acc_tmp.13;
  int add_acc.12;
  int D.2091;
  int D.2085;
  int D.2081;

  # BLOCK 2 freq:6950
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # a_1 = PHI <a_2(D)(0)>
  # b_7 = PHI <b_3(D)(0)>
  # add_acc.12_21 = PHI <0(0)>
  # SUCC: 3 [100.0%]  (fallthru)

  # BLOCK 3 freq:10000
  # PRED: 2 [100.0%]  (fallthru) 5 [100.0%]  (fallthru,dfs_back,exec)
  # a_14 = PHI <a_1(2), a_23(5)>
  # b_22 = PHI <b_7(2), b_25(5)>
  # add_acc.12_19 = PHI <add_acc.12_21(2), add_acc.12_12(5)>
  D.2085_4 = b_22 | a_14;
  if (D.2085_4 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;
  # SUCC: 4 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)

  # BLOCK 4 freq:3900
  # PRED: 3 [39.0%]  (true,exec)
  D.2081_5 = baz (); [tail call]
  acc_tmp.13_10 = D.2081_5 + add_acc.12_19;
  return acc_tmp.13_10;
  # SUCC: EXIT [100.0%]

  # BLOCK 5 freq:3050
  # PRED: 3 [61.0%]  (false,exec)
Invalid sum of incoming frequencies 6100, should be 3050
  D.2091_8 = foo (0, 1);
  add_acc.12_12 = add_acc.12_19 + D.2091_8;
  a_23 = 1;
  b_25 = 0;
  goto <bb 3>;
  # SUCC: 3 [100.0%]  (fallthru,dfs_back,exec)

}
Invalid sum of incoming frequencies 3900, should be 6950

(Note, someone is not updating the CFG properly, see incorrect profile info. This is caused by VRP, the mismatch appears first in the .vrp1 dump.)


The PHI nodes in basic block 3 show that the "+ foo(1,0)" part of foo() has been transformed from a recursive call into a loop inside foo.

We enter the function with a==0 and b==0 so control flow is as follows:

BB2 -> BB3 -> BB5 (because (a | b) == 0) -> 
D.2091_8 = foo(0, 1)                                // D.2091_8 = 1
add_acc = 0 (from the PHI in BB2) + 1 = 1
-> BB3 with a=1 and b=0 -> BB4 ->
D.2081_5 = baz () [tail call]                       // D.2081_5 = 1
acc_tmp.13_10 = 1 + add_acc.12_19 = 1 + 1 = 2
return acc_tmp.13_10                                // return 2

That looks correct to me, but foo(0,0) returns 1 instead of 2. Both foo(0,1) and foo(1,0) both return 1 as expected.

I think the problem is in RTL expansion of the "tail call" in basic block 4, which doesn't look like a tail call to me. The RTL generated for basic block 4 is this:

;; Generating RTL for gimple basic block 4

;; D.2081_5 = baz (); [tail call]

(call_insn/u/j 13 12 14 4 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("_Z3bazv") [flags 0x3]  <function_decl 0x7ffff722c100 baz>) [0 baz S1 A8])
            (const_int 0 [0]))) t.cc:14 -1
     (expr_list:REG_EH_REGION (const_int 0 [0])
        (nil))
    (nil))

(barrier 14 13 0)

The "  acc_tmp.13_10 = D.2081_5 + add_acc.12_19;" part is lost.
Comment 6 Steven Bosscher 2011-05-01 19:56:37 UTC
In the tail recursion optimization:

Breakpoint 3, gimple_call_set_tail (s=0x7ffff7ed3680, tail_p=1 '\001') at ../../trunk/gcc/gimple.h:2241
2241      GIMPLE_CHECK (s, GIMPLE_CALL);
(gdb) bt 2
#0  gimple_call_set_tail (s=0x7ffff7ed3680, tail_p=1 '\001') at ../../trunk/gcc/gimple.h:2241
#1  0x0000000000f6d5a3 in optimize_tail_call (t=0x1cd7820, opt_tailcalls=1 '\001') at ../../trunk/gcc/tree-tailcall.c:917
(More stack frames follow...)
(gdb) up
#1  0x0000000000f6d5a3 in optimize_tail_call (t=0x1cd7820, opt_tailcalls=1 '\001') at ../../trunk/gcc/tree-tailcall.c:917
917           gimple_call_set_tail (stmt, true);
(gdb) l
912
913       if (opt_tailcalls)
914         {
915           gimple stmt = gsi_stmt (t->call_gsi);
916
917           gimple_call_set_tail (stmt, true);
918           if (dump_file && (dump_flags & TDF_DETAILS))
919             {
920               fprintf (dump_file, "Found tail call ");
921               print_gimple_stmt (dump_file, stmt, 0, dump_flags);
(gdb) disp debug_bb_n(5)
3: debug_bb_n (5) = ;; basic block 5, loop depth 0, count 0
;; prev block 4, next block 1
;; pred:       3 [100.0%]  (fallthru,exec)
;; succ:       EXIT [100.0%]
<bb 5>:
Invalid sum of incoming frequencies 3900, should be 6950
# D.2081_1 = PHI <D.2081_5(3)>
return D.2081_1;

(struct basic_block_def *) 0x7ffff7227548
(gdb) p debug_bb_n(3)
;; basic block 3, loop depth 0, count 0
;; prev block 2, next block 4
;; pred:       2 [39.0%]  (true,exec)
;; succ:       5 [100.0%]  (fallthru,exec)
<bb 3>:
D.2081_5 = baz (); [tail call]
goto <bb 5>;

$9 = (struct basic_block_def *) 0x7ffff7227478
(gdb) 


So far so good. But now compensation code must be inserted for the recursive-call-turned-loop:

(gdb)
3: debug_bb_n (5) = ;; basic block 5, loop depth 0, count 0
;; prev block 4, next block 1
;; pred:       3 [100.0%]  (fallthru,exec)
;; succ:       EXIT [100.0%]
<bb 5>:
Invalid sum of incoming frequencies 3900, should be 6950
# D.2081_1 = PHI <D.2081_5(3)>
return D.2081_1;

(struct basic_block_def *) 0x7ffff7227548
(gdb)
1040                adjust_return_value (e->src, m_acc, a_acc);
3: debug_bb_n (5) = ;; basic block 5, loop depth 0, count 0
;; prev block 4, next block 1
;; pred:       3 [100.0%]  (fallthru,exec)
;; succ:       EXIT [100.0%]
<bb 5>:
Invalid sum of incoming frequencies 3900, should be 6950
# D.2081_1 = PHI <D.2081_5(3)>
return D.2081_1;

(struct basic_block_def *) 0x7ffff7227548
(gdb)
1034          FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
3: debug_bb_n (5) = ;; basic block 5, loop depth 0, count 0
;; prev block 4, next block 1
;; pred:       3 [100.0%]  (fallthru,exec)
;; succ:       EXIT [100.0%]
<bb 5>:
Invalid sum of incoming frequencies 3900, should be 6950
# D.2081_1 = PHI <D.2081_5(3)>
acc_tmp.13_10 = add_acc.12_19 + D.2081_1;
return acc_tmp.13_10;

(struct basic_block_def *) 0x7ffff7227548
(gdb)

(I don't see now r127491 can be the cause of this, FWIW.)
Comment 7 Steven Bosscher 2011-05-01 20:24:11 UTC
Zdenek, this problem is caused by your work to handle addends/multiplicands in tree-tailcall.c. It looks like an interaction problem between tail-calls and tail-recursion, where elimination tail recursion turns a tail-call into a normal call.
Comment 8 Zdenek Dvorak 2011-05-04 08:33:51 UTC
Created attachment 24177 [details]
Fix for the issue

Indeed, once the accumulator transformation is performed, the other tail calls in the function are moved to non-tail positions.
Comment 9 Zdenek Dvorak 2011-05-07 19:43:21 UTC
Author: rakdver
Date: Sat May  7 19:43:18 2011
New Revision: 173534

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173534
Log:
	PR tree-optimization/48837
	* tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls
	when accumulator transformation is performed.

	* gcc.dg/pr48837.c: New testcase.


Added:
    trunk/gcc/testsuite/gcc.dg/pr48837.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-tailcall.c
Comment 10 Jakub Jelinek 2011-05-10 08:45:09 UTC
Author: jakub
Date: Tue May 10 08:45:00 2011
New Revision: 173609

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173609
Log:
	Backported from mainline
	2011-05-07  Zdenek Dvorak  <ook@ucw.cz>

	PR tree-optimization/48837
	* tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls
	when accumulator transformation is performed.

	* gcc.dg/pr48837.c: New testcase.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/pr48837.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/tree-tailcall.c
Comment 11 Jakub Jelinek 2011-05-10 08:45:56 UTC
Author: jakub
Date: Tue May 10 08:45:50 2011
New Revision: 173610

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173610
Log:
	Backported from mainline
	2011-05-07  Zdenek Dvorak  <ook@ucw.cz>

	PR tree-optimization/48837
	* tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls
	when accumulator transformation is performed.

	* gcc.dg/pr48837.c: New testcase.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/pr48837.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-tailcall.c
Comment 12 Jakub Jelinek 2011-05-10 08:47:12 UTC
Author: jakub
Date: Tue May 10 08:47:09 2011
New Revision: 173611

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173611
Log:
	Backported from mainline
	2011-05-07  Zdenek Dvorak  <ook@ucw.cz>

	PR tree-optimization/48837
	* tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls
	when accumulator transformation is performed.

	* gcc.dg/pr48837.c: New testcase.

Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/pr48837.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_4-branch/gcc/tree-tailcall.c
Comment 13 Jakub Jelinek 2011-05-10 08:49:41 UTC
Fixed.