Bug 58365 - [4.7 Regression] likely wrong code bug
Summary: [4.7 Regression] likely wrong code bug
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: 4.7.4
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-09-08 20:12 UTC by John Regehr
Modified: 2014-05-07 16:20 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-09-08 00:00:00


Attachments
gcc49-pr58365.patch (692 bytes, patch)
2013-09-10 07:32 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Regehr 2013-09-08 20:12:47 UTC
regehr@john-home ~/z/reduce/r109 $ clang -O small.c ; ./a.out 
0
regehr@john-home ~/z/reduce/r109 $ gcc -O small.c ; ./a.out 
0
regehr@john-home ~/z/reduce/r109 $ gcc -Os small.c ; ./a.out 
1
regehr@john-home ~/z/reduce/r109 $ cat small.c
int printf(const char *, ...);
struct x0 {
  volatile int x1;
  int x2;
  int x3;
  int x4;
  int x5;
} x6;
static struct x0 x7, x8;
int x9 = 1;
char x10();
static struct x0 x11() {
  if (x10())
    return x6;
  return x7;
}

char x10() { return x9; }

int main() {
  x8 = x11();
  x6.x2 = 1;
  printf("%d\n", x8.x2);
  return 0;
}
regehr@john-home ~/z/reduce/r109 $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/regehr/z/compiler-install/gcc-r202341-install/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/regehr/z/compiler-source/gcc/configure --prefix=/home/regehr/z/compiler-install/gcc-r202341-install --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 4.9.0 20130906 (experimental) (GCC) 
regehr@john-home ~/z/reduce/r109 $
Comment 1 Marc Glisse 2013-09-08 20:42:33 UTC
Confirmed, from 4.6 to trunk. The .optimized dump looks ok (basically the same for -O2 and -Os), the dumps become very different at expansion (didn't check if the error appears in expand or later).
Comment 2 Jakub Jelinek 2013-09-09 06:49:53 UTC
Works still with r162588, fails with r162597, revisions in between those two don't build, so can't be bisected more.  But it must be one of the GCSE changes in that range.
Comment 3 Maxim Kuvyrkov 2013-09-10 03:29:08 UTC
Bug introduced in revision 162592.  Investigating.

commit 48d5e32fb40bb7973e596b36d12ad4041d8ef10a
Author: mkuvyrkov <mkuvyrkov@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Jul 27 19:38:10 2010 +0000

        PR target/42495
        PR middle-end/42574
        * gcse.c (hoist_expr_reaches_here_p): Remove excessive check.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@162592 138bc75d-0d04-0410-961f
Comment 4 Maxim Kuvyrkov 2013-09-10 06:07:57 UTC
The underlying bug appears to be in CSA (combine-stack-adj.c) and I would appreciate Richard's insight on how to fix it.

Summary: CSA merges stack variable into global one, and then alias analysis thinks that ex-stack-now-global variable does not overlap with global variables.

After patch in rev. 162592 code hoisting becomes able to simplify code just a bit, which causes scheduler to generate a different sequence, exposing a latent bug.

During expand we have:

  # BLOCK 3 freq:3900
  # PRED: 2 [39.0%]  (true,exec)
1 D.2750 = x6;
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:6100
  # PRED: 2 [61.0%]  (false,exec)
  D.2750 = x7;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:10000
  # PRED: 4 [100.0%]  (fallthru,exec) 3 [100.0%]  (fallthru,exec)
2 x8 = D.2750;
3 x6.x2 = 1;
  D.2733_1 = x8.x2;
  printf ("%d\n", D.2733_1);

CSA combines stack variable D.2750 in insn (1) into global x6, but fails to update uses of D.2750.  After CSA DECL_RTL of MEM in insn (2) still references D.2750 even though it now reads directly from x6.  Because D.2750 is on stack and x6 is global -- alias analysis (nonoverlapping_memrefs_p()) tells scheduler that it is OK to move insn (3) before insn (2).
Comment 5 Jakub Jelinek 2013-09-10 06:53:10 UTC
That is actually the crossjumping pass after csa (jump2; though yes, previously it has been part of csa).  In any case, I don't see anything wrong with the cross jumping, it turns the two:
            (set (mem/c:BLK (reg:DI 5 di [81]) [3 D.1761+0 S20 A32])
                (mem/c:BLK (reg:DI 4 si [82]) [3 x6+0 S20 A32]))
and
            (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32])
                (mem/u/c:BLK (reg:DI 4 si [85]) [3 x7+0 S20 A128]))
into:
            (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32])
                (mem/u/c:BLK (reg:DI 4 si [85]) [3 S20 A32]))
(note, MEM_EXPR cleared), where si is set conditionally to x6 or x7.
And then we have the:
(insn 29 27 31 6 (set (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("x6")  <var_decl 0x7f9ce47572f8 x6>)
                    (const_int 4 [0x4]))) [2 x6.x2+0 S4 A32])
        (const_int 1 [0x1])) pr58365.c:31 86 {*movsi_internal}
     (nil))
store that sched2 moves over the above rep_movsi, although there is a (conditional, may, but doesn't have to) aliasing case.
Comment 6 Maxim Kuvyrkov 2013-09-10 07:03:38 UTC
(In reply to Jakub Jelinek from comment #5)
> That is actually the crossjumping pass after csa (jump2; though yes,
> previously it has been part of csa).  In any case, I don't see anything
> wrong with the cross jumping, it turns the two:
>             (set (mem/c:BLK (reg:DI 5 di [81]) [3 D.1761+0 S20 A32])
>                 (mem/c:BLK (reg:DI 4 si [82]) [3 x6+0 S20 A32]))
> and
>             (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32])
>                 (mem/u/c:BLK (reg:DI 4 si [85]) [3 x7+0 S20 A128]))
> into:
>             (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32])
>                 (mem/u/c:BLK (reg:DI 4 si [85]) [3 S20 A32]))
> (note, MEM_EXPR cleared), where si is set conditionally to x6 or x7.
> And then we have the:
> (insn 29 27 31 6 (set (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("x6") 
> <var_decl 0x7f9ce47572f8 x6>)
>                     (const_int 4 [0x4]))) [2 x6.x2+0 S4 A32])
>         (const_int 1 [0x1])) pr58365.c:31 86 {*movsi_internal}
>      (nil))
> store that sched2 moves over the above rep_movsi, although there is a
> (conditional, may, but doesn't have to) aliasing case.

Jacub, I don't quite understand whether you are saying that the bug is /not/ in csa (at GCC 4.7 age) or whether you are providing additional analysis.

I'm debugging the gcc-4.7-era revision 162592 and here the source MEM in rep_movsi references stack variable instead of x6, which causes wrong aliasing decision.
Comment 7 Jakub Jelinek 2013-09-10 07:04:02 UTC
Ah, actually the bug is really in the cross-jumping, in this case that it hasn't cleared MEM_READONLY_P.  While x7 is read-only, x6 is not, so merge_memattrs should combine that to !MEM_READONLY_P.
Comment 8 Jakub Jelinek 2013-09-10 07:32:49 UTC
Created attachment 30781 [details]
gcc49-pr58365.patch

Untested fix.
Comment 9 Jakub Jelinek 2013-09-10 11:47:21 UTC
Author: jakub
Date: Tue Sep 10 11:47:19 2013
New Revision: 202434

URL: http://gcc.gnu.org/viewcvs?rev=202434&root=gcc&view=rev
Log:
	PR rtl-optimization/58365
	* cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P
	resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if
	it differs.

	* gcc.c-torture/execute/pr58365.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr58365.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c
    trunk/gcc/testsuite/ChangeLog
Comment 10 Jakub Jelinek 2013-09-10 11:48:32 UTC
Author: jakub
Date: Tue Sep 10 11:48:30 2013
New Revision: 202435

URL: http://gcc.gnu.org/viewcvs?rev=202435&root=gcc&view=rev
Log:
	PR rtl-optimization/58365
	* cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P
	resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if
	it differs.

	* gcc.c-torture/execute/pr58365.c: New test.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.c-torture/execute/pr58365.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/cfgcleanup.c
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Comment 11 Jakub Jelinek 2013-09-10 11:50:10 UTC
Fixed for 4.8.2+ so far.
Comment 12 Jakub Jelinek 2014-05-07 16:05:16 UTC
Author: jakub
Date: Wed May  7 16:04:44 2014
New Revision: 210173

URL: http://gcc.gnu.org/viewcvs?rev=210173&root=gcc&view=rev
Log:
	Backported from mainline
	2013-09-10  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/58365
	* cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P
	resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if
	it differs.

	* gcc.c-torture/execute/pr58365.c: New test.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.c-torture/execute/pr58365.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/cfgcleanup.c
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
Comment 13 Jakub Jelinek 2014-05-07 16:20:38 UTC
Fixed.