Bug 48770 - [4.7 Regression] wrong code with -O -fprofile-arcs -fPIC -fno-dce -fno-forward-propagate -fno-tree-forwprop
Summary: [4.7 Regression] wrong code with -O -fprofile-arcs -fPIC -fno-dce -fno-forwar...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.7.0
: P3 normal
Target Milestone: 4.7.0
Assignee: Jeffrey A. Law
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2011-04-26 10:30 UTC by Zdenek Sojka
Modified: 2011-06-23 21:31 UTC (History)
3 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work: 4.6.1
Known to fail: 4.7.0
Last reconfirmed: 2011-04-26 20:03:25


Attachments
reduced testcase (from g++.dg/bprob/g++-bprob-1.C) (121 bytes, text/plain)
2011-04-26 10:30 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2011-04-26 10:30:31 UTC
Created attachment 24101 [details]
reduced testcase (from g++.dg/bprob/g++-bprob-1.C)

Compiler output:
$ rm *.gcda
$ g++ -O -fprofile-arcs -fPIC -fno-dce -fno-forward-propagate -fno-tree-forwprop testcase.C
$ valgrind -q ./a.out
==5350== Invalid read of size 8
==5350==    at 0x400D51: test_goto2(int) (testcase.C:10)
==5350==    by 0x400D7E: main (testcase.C:14)
==5350==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
Segmentation fault

Fails only when compiled as C++ code.

Tested revisions:
r172940 - fail
4.6 r172337 - OK
Comment 1 Zdenek Sojka 2011-04-26 14:57:08 UTC
(gdb) disassemble
...
   0x0000000000400d06 <+18>:    addq   $0x1,(%rax)
   0x0000000000400d0a <+22>:    mov    $0x8,%eax
   0x0000000000400d0f <+27>:    test   %edi,%edi
   0x0000000000400d11 <+29>:    jne    0x400d51 <test_goto2(int)+93>
...
   0x0000000000400d50 <+92>:    retq   
=> 0x0000000000400d51 <+93>:    mov    (%rax),%rcx
   0x0000000000400d54 <+96>:    add    $0x1,%rcx
   0x0000000000400d58 <+100>:   mov    0x202461(%rip),%rdx        # 0x6031c0
   0x0000000000400d5f <+107>:   mov    $0x1,%eax
   0x0000000000400d64 <+112>:   jmp    0x400d15 <test_goto2(int)+33>

In the asm output, the problem is apparent (comparing trunk and 4.6 output) - the only difference is:
73,75c73,76
<       mov     rcx, QWORD PTR [rax]    # *.LPBX1_I_lsm.7, *.LPBX1
<       add     rcx, 1  # *.LPBX1_I_lsm.7,
<       mov     rdx, QWORD PTR .LPBX1[rip+32]   # *.LPBX1_I_lsm.6, *.LPBX1
---
>       lea     rax, .LPBX1[rip+24]     # tmp113,
>       mov     rcx, QWORD PTR [rax]    # *.LPBX1, *.LPBX1
>       add     rcx, 1  # *.LPBX1_I_lsm.4,
>       mov     rdx, QWORD PTR .LPBX1[rip+32]   # *.LPBX1_I_lsm.3, *.LPBX1

Instruction "lea" is missing, and rax contains value 8.
Comment 2 Andrew Pinski 2011-04-26 20:03:25 UTC
Here is one that fails with the C front-end:
int test_goto2 (int f)
{
  int i;
  for (i = 0; ({_Bool a = i < 10;a;}); i++)
  {
    if (i == f)
      goto lab2;
  }
  return 4;
lab2:
  return 8;
}

int main ()
{
  test_goto2 (30);
  return 0;
}
Comment 3 Jakub Jelinek 2011-05-10 11:37:33 UTC
Started with:
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171111
Comment 4 Jakub Jelinek 2011-05-10 11:46:34 UTC
*.asmcons is identical in between r171110 and r171111, *.ira already looks wrong (no setting of %rax to 24+.LPBX1, or changing the (%rax) address to 24+.LPBX1).
Comment 5 Jeffrey A. Law 2011-05-10 13:13:33 UTC
Thanks.  I'm more than happy to take it from here...
Comment 6 Jeffrey A. Law 2011-05-10 15:52:15 UTC
We have a block-local equivalence between a pseudo and a memory location:

(insn 86 85 87 9 (set (reg/f:DI 0 ax [113])
        (const:DI (plus:DI (symbol_ref:DI ("*.LPBX1") [flags 0x2] <var_decl 0x7ffff7dec3c0 *.LPBX1>)
                (const_int 8 [0x8])))) 62 {*movdi_internal_rex64}
     (expr_list:REG_EQUIV (const:DI (plus:DI (symbol_ref:DI ("*.LPBX1") [flags 0x2] <var_decl 0x7ffff7dec3c0 *.LPBX1>)
                (const_int 8 [0x8])))
        (nil)))
(insn 87 86 88 9 (set (reg:DI 114 [ *.LPBX1+8 ])
        (mem/s/j/c:DI (reg/f:DI 0 ax [113]) [0 *.LPBX1+8 S8 A64])) 62 {*movdi_internal_rex64}
     (expr_list:REG_DEAD (reg/f:DI 0 ax [113])
        (expr_list:REG_EQUIV (mem/s/j/c:DI (reg/f:DI 0 ax [113]) [0 *.LPBX1+8 S8 A64])
            (nil))))
(insn 88 87 91 9 (parallel [
            (set (reg:DI 2 cx [orig:95 *.LPBX1_I_lsm.5 ] [95])
                (plus:DI (reg:DI 114 [ *.LPBX1+8 ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 253 {*adddi_1}
     (expr_list:REG_DEAD (reg:DI 114 [ *.LPBX1+8 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_EQUAL (plus:DI (mem/s/j/c:DI (const:DI (plus:DI (symbol_ref:DI ("*.LPBX1") [flags 0x2] <var_decl 0x7ffff7dec3c0 *.LPBX1>)
                                (const_int 8 [0x8]))) [0 *.LPBX1+8 S8 A64])
                    (const_int 1 [0x1]))
                (nil)))))


reg114 is marked as equivalent to (mem (reg 113)); reg114 does not get a hard reg.  As usual, reload deletes the insn that creates the equivalence between reg114 and its memory location (insn 87).  delete_dead_insn decides to peek at insn86 and decides that insn86 is dead as well, which removes the initialization of reg113.

Later reg114 is replaced with its equivalent memory location which results in an uninitialized reference to reg113 and reading from an invalid memory location and the segfault.

What's interesting here is delete_dead_insn's behavior -- it's been like this since circa 1991, well before we ran any kind of real dead code elimination after reload.  The solution *may* be to remove the recursion in delete_dead_insn.  I'm still investigating.
Comment 7 Jeffrey A. Law 2011-06-23 21:30:23 UTC
Author: law
Date: Thu Jun 23 21:30:20 2011
New Revision: 175353

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175353
Log:

	PR middle-end/48770
	* reload.h (reload): Change to return a bool.
	* ira.c (ira): If requested by reload, run a fast DCE pass after
	reload has completed.  Fix comment typo.
	* reload1.c (need_dce): New file scoped static.
	(reload): Set reload_completed here.  Return whether or not a DCE
	pass after reload is needed.
	(delete_dead_insn): Set need_dce as needed.

	PR middle-end/48770
	* gcc.dg/pr48770.c: New test.



Added:
    trunk/gcc/testsuite/gcc.dg/pr48770.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira.c
    trunk/gcc/reload.h
    trunk/gcc/reload1.c
    trunk/gcc/testsuite/ChangeLog
Comment 8 Jeffrey A. Law 2011-06-23 21:31:31 UTC
Fixed