Bug 103202 - [12 regression] gcc miscompiles ed-1.17 since r12-3876-g4a960d548b7d7d94
Summary: [12 regression] gcc miscompiles ed-1.17 since r12-3876-g4a960d548b7d7d94
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Aldy Hernandez
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2021-11-11 23:59 UTC by Sergei Trofimovich
Modified: 2021-11-12 19:47 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 11.2.0
Known to fail: 12.0
Last reconfirmed: 2021-11-12 00:00:00


Attachments
main_loop.c (790 bytes, text/x-csrc)
2021-11-11 23:59 UTC, Sergei Trofimovich
Details
main_loop_simpler.c (777 bytes, text/x-csrc)
2021-11-12 09:15 UTC, Sergei Trofimovich
Details
patch in testing (845 bytes, patch)
2021-11-12 15:24 UTC, Aldy Hernandez
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sergei Trofimovich 2021-11-11 23:59:37 UTC
Created attachment 51774 [details]
main_loop.c

Originally found the problem as a test failure on GNU ed-1.17 test suite.

The test fails on gcc-12.0.0 20211107 snapshot and works on gcc-11.2.0 release.

I shrunk test down to a single file, but was not able to figure out where the problem pops up:

# gcc-11, good:
$ ../result-1/bin/gcc -O2 main_loop.c -o a && ./a
element 1
element 2
element 3

# gcc-12, bad:
$ ../result-2/bin/gcc -O2 main_loop.c -o a && ./a
element 1
element 2

$ LANG=C ../result-1/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/nix/store/gsj6c88q5hf7fmaaym7023q92jvx1i20-gcc-11.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/nix/store/gsj6c88q5hf7fmaaym7023q92jvx1i20-gcc-11.2.0/libexec/gcc/x86_64-unknown-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.2.0 (GCC)

$ LANG=C ../result-2/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/nix/store/1g16njhrdlhq61qh30c063jdmmz3hv4d-gcc-12.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/nix/store/1g16njhrdlhq61qh30c063jdmmz3hv4d-gcc-12.0.0/libexec/gcc/x86_64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211107 (experimental) (GCC)
Comment 1 Andrew Pinski 2021-11-12 00:05:13 UTC
Still borken as of r12-5142.


Confirmed.
Comment 2 Andrew Pinski 2021-11-12 01:38:01 UTC
This seems jump threading related but I can't tell how.
-fno-tree-vrp is enough to workaround the bug but I don't have anything more than that.
Comment 3 Sergei Trofimovich 2021-11-12 09:15:17 UTC
Created attachment 51776 [details]
main_loop_simpler.c

Attaching seemingly simpler example. I applied most inlines and simplified loops as much as I could. It looks like VRP does something funny with:

  while (n > 0)
  {
    while (n-- > 0)
    {

Looking at 039t.mergephi1

with -fno-tree-vrp:

--- no-vrp/ed-main_loop.c.039t.mergephi1    2021-11-12 09:07:55.120821967 +0000
+++ yes-vrp/ed-main_loop.c.039t.mergephi1    2021-11-12 09:07:55.186822973 +0000
@@ -93,7 +93,7 @@
   # np_6 = PHI <np_7(10), np_40(6)>
   # n_8 = PHI <n_9(10), n_23(6)>
   n_23 = n_8 + -1;
-  if (n_8 > 0)
+  if (n_8 != 0)
     goto <bb 4>; [INV]
   else
     goto <bb 8>; [INV]

I think our loop is executed with n == -1 on real data and vrp provided incorrect range.
Comment 4 Richard Biener 2021-11-12 09:33:46 UTC
(In reply to Sergei Trofimovich from comment #3)
> Created attachment 51776 [details]
> main_loop_simpler.c
> 
> Attaching seemingly simpler example. I applied most inlines and simplified
> loops as much as I could. It looks like VRP does something funny with:
> 
>   while (n > 0)
>   {
>     while (n-- > 0)
>     {
> 
> Looking at 039t.mergephi1
> 
> with -fno-tree-vrp:
> 
> --- no-vrp/ed-main_loop.c.039t.mergephi1    2021-11-12 09:07:55.120821967
> +0000
> +++ yes-vrp/ed-main_loop.c.039t.mergephi1    2021-11-12 09:07:55.186822973
> +0000
> @@ -93,7 +93,7 @@
>    # np_6 = PHI <np_7(10), np_40(6)>
>    # n_8 = PHI <n_9(10), n_23(6)>
>    n_23 = n_8 + -1;
> -  if (n_8 > 0)
> +  if (n_8 != 0)
>      goto <bb 4>; [INV]
>    else
>      goto <bb 8>; [INV]
> 
> I think our loop is executed with n == -1 on real data and vrp provided
> incorrect range.

But how would the inner loop be reachable with n == -1?
Comment 5 Martin Liška 2021-11-12 09:35:44 UTC
Started with r12-3876-g4a960d548b7d7d94.
Comment 6 Aldy Hernandez 2021-11-12 10:18:40 UTC
Not looking at this yet, but disabling jump threading from all passes (dom included) makes the problem go away:

$ ./xgcc -B./ a.c -w -O2 -fno-thread-jumps && ./a.out
element 1
element 2
element 3

...so *maybe* threading related.

The minimum number of threads that reproduces this is:

 ./xgcc -B./ a.c -w -O2 -fdbg-cnt=registered_jump_thread:1-1:3-3:5-5  && ./a.out
***dbgcnt: lower limit 1 reached for registered_jump_thread.***
***dbgcnt: upper limit 1 reached for registered_jump_thread.***
***dbgcnt: lower limit 3 reached for registered_jump_thread.***
***dbgcnt: upper limit 3 reached for registered_jump_thread.***
***dbgcnt: lower limit 5 reached for registered_jump_thread.***
***dbgcnt: upper limit 5 reached for registered_jump_thread.***
element 1
element 2
$ grep 'Register.*jump' a.c.*
a.c.111t.threadfull1:  [1] Registering jump thread: (6, 7) incoming edge;  (7, 9) nocopy; 
a.c.126t.thread1:  [3] Registering jump thread: (2, 4) incoming edge;  (4, 12) nocopy; 
a.c.191t.thread2:  [5] Registering jump thread: (13, 11) incoming edge;  (11, 10) nocopy;
Comment 7 Aldy Hernandez 2021-11-12 10:19:58 UTC
(In reply to Aldy Hernandez from comment #6)
> Not looking at this yet, but disabling jump threading from all passes (dom
> included) makes the problem go away:
> 
> $ ./xgcc -B./ a.c -w -O2 -fno-thread-jumps && ./a.out
> element 1
> element 2
> element 3

a.c is main_loop_simpler.c
Comment 8 Aldy Hernandez 2021-11-12 15:23:41 UTC
The problem here is this thread:

[5] Registering jump thread: (13, 11) incoming edge;  (11, 10) nocopy; 

BB11 has this:

=========== BB 11 ============
Imports: n_42  
Exports: n_42  
Relational : (n_45 < n_42)
    <bb 11> [local count: 118111614]:
    # np_43 = PHI <np_19(13), &buffer_head(4)>
    # n_42 = PHI <m_31(13), addr_14(D)(4)>
    # m_31 = PHI <0(13), m_16(4)>
    n_45 = n_42 + -1;
    if (n_42 != 0)
      goto <bb 12>; [89.00%]
    else
      goto <bb 10>; [11.00%]

Because of the ordering of the import bitmap, we solve m_31 first to 0.  Then, when we solve n_42, we think we can use the m_31 in the cache, but the ordering is wrong.

PHIs must always be done first.  We went through a similar exercise to get relationals right and somehow missed this.
Comment 9 Aldy Hernandez 2021-11-12 15:24:17 UTC
Created attachment 51780 [details]
patch in testing
Comment 10 Sergei Trofimovich 2021-11-12 16:42:21 UTC
(In reply to Aldy Hernandez from comment #9)
> Created attachment 51780 [details]
> patch in testing

The patch fixed real ed-1.17 test suite as well. Thank you!
Comment 11 CVS Commits 2021-11-12 19:46:38 UTC
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:264f061997c0a5349cdce6d73f0dc167ac7fc8f4

commit r12-5216-g264f061997c0a5349cdce6d73f0dc167ac7fc8f4
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Fri Nov 12 16:08:01 2021 +0100

    path solver: Solve PHI imports first for ranges.
    
    PHIs must be resolved first while solving ranges in a block,
    regardless of where they appear in the import bitmap.  We went through
    a similar exercise for the relational code, but missed these.
    
    Tested on x86-64 & ppc64le Linux.
    
    gcc/ChangeLog:
    
            PR tree-optimization/103202
            * gimple-range-path.cc
            (path_range_query::compute_ranges_in_block): Solve PHI imports first.
Comment 12 Aldy Hernandez 2021-11-12 19:47:28 UTC
fixed