Bug 102736 - [12 Regression] wrong code at -O1 -ftree-vrp by r12-3903
Summary: [12 Regression] wrong code at -O1 -ftree-vrp by r12-3903
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Aldy Hernandez
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2021-10-13 20:54 UTC by Zhendong Su
Modified: 2021-10-14 12:24 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-10-13 00:00:00


Attachments
proposed patch in testing (1.39 KB, application/mbox)
2021-10-14 09:02 UTC, Aldy Hernandez
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zhendong Su 2021-10-13 20:54:17 UTC
This appears to be a recent regression.

[592] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211013 (experimental) [master r12-4363-g52055987fba] (GCC) 
[593] % 
[593] % gcctk -Os small.c; ./a.out
[594] % 
[594] % gcctk -O2 small.c
[595] % ./a.out
Aborted
[596] % 
[596] % cat small.c
int a, b = -1, c;
int d = 1;
char e(char f, int g) { return g ? f : 0; }
char h(char f) { return f < a ? f : f < a; }
unsigned char i(unsigned char f, int g) { return g ? f : f > g; }
void j() {
L:
  c = e(1, i(h(b), d));
  if (b)
    return;
  goto L;
}
int main() {
  j();
  if (c != 1)
    __builtin_abort ();
  return 0;
}
Comment 1 Andrew Pinski 2021-10-13 21:00:21 UTC
Take:

int a, b = -1, c;
int d = 1;
static inline char e(char f, int g) { return g ? f : 0; }
static inline char h(char f) { return f < a ? f : f < a; }
static inline unsigned char i(unsigned char f, int g) { return g ? f : f > g; }
void j() {
L:
  c = e(1, i(h(b), d));
  if (b)
    return;
  goto L;
}
int main() {
  j();
  if (c != 1)
    __builtin_abort ();
  return 0;
}
---- CUT ----
This fails at -O1 -ftree-vrp or -O2 and above.
Comment 2 Andrew Macleod 2021-10-13 22:52:53 UTC
works with  --disable-tree-vrp-thread1


Looking at the  .vrp-thread1 listing, I see a lot of

 Registering value_relation (_4 >= a.4_14) on (3->4)
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb4)
  [1] Registering jump thread: (4, 5) incoming edge;  (5, 7) joiner (7, 8) normal (8, 9) nocopy;
 Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
 Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
  [2] Registering jump thread: (6, 7) incoming edge;  (7, 9) joiner;
 Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
  [3] Registering jump thread: (6, 7) incoming edge;  (7, 8) joiner (8, 9) nocopy;
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
 Registering value_relation (path_oracle) (_4 < a.4_14) (bb3)
 Registering value_relation (path_oracle) (_3 == iftmp.3_15) (bb3)
 Registering value_relation (path_oracle) (_4 < a.4_14) (bb3)
 Registering value_relation (path_oracle) (_3 == iftmp.3_15) (bb3)
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb3)

And then 2 blocks get removed. I'd hazard a guess that we're getting a relation wrong...  but we shall see.
Comment 3 Aldy Hernandez 2021-10-14 08:38:05 UTC
(In reply to Andrew Macleod from comment #2)
> works with  --disable-tree-vrp-thread1
> 
> 
> Looking at the  .vrp-thread1 listing, I see a lot of
> 
>  Registering value_relation (_4 >= a.4_14) on (3->4)
>  Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb4)
>   [1] Registering jump thread: (4, 5) incoming edge;  (5, 7) joiner (7, 8)
> normal (8, 9) nocopy;
>  Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
>  Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
>   [2] Registering jump thread: (6, 7) incoming edge;  (7, 9) joiner;
>  Registering value_relation (path_oracle) (iftmp.6_12 == iftmp.6_13) (bb6)
>   [3] Registering jump thread: (6, 7) incoming edge;  (7, 8) joiner (8, 9)
> nocopy;
>  Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
>  Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
>  Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb5)
>  Registering value_relation (path_oracle) (_4 < a.4_14) (bb3)
>  Registering value_relation (path_oracle) (_3 == iftmp.3_15) (bb3)
>  Registering value_relation (path_oracle) (_4 < a.4_14) (bb3)
>  Registering value_relation (path_oracle) (_3 == iftmp.3_15) (bb3)
>  Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb3)

What you're seeing here is the verbosity out of path_oracle::register_relation for each candidate path as it's being tried.

What I've been doing is avoiding dumping the details of the path solver in action, unless TDF_THREADING, but the above message is coming from the path oracle itself, which is keyed off of TDF_DETAILS.

This is a bit confusing.  Perhaps we should silence these messages unless TDF_THREADING?  What do you think?
Comment 4 Aldy Hernandez 2021-10-14 09:01:54 UTC
Here are some debugging tips for when I'm not around ;-).

When the jump threader is suspected, I find it useful to do some bisecting to find the actual jump thread that is causing the problem.  Note that sometimes it's a combination of threaded paths, but most of the time it's just one.

First I make sure the problem goes away without jump threading:

abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O2 -fno-thread-jumps
abulafia:~/bld/t/gcc$ ./a.out
abulafia:~/bld/t/gcc$ 

And then I start playing -fdbg-cnt games, which ultimately led me to:

abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O2 -fdbg-cnt=registered_jump_thread:4-4 -fdump-tree-all-details-threading
***dbgcnt: lower limit 4 reached for registered_jump_thread.***
***dbgcnt: upper limit 4 reached for registered_jump_thread.***
abulafia:~/bld/t/gcc$ ./a.out
Aborted (core dumped)

The 4th jump threaded path reproduces the problem.

I then look at the dump file with the dbgcnt message (*.vrp-thread1):

***dbgcnt: lower limit 4 reached for registered_jump_thread.***
***dbgcnt: upper limit 4 reached for registered_jump_thread.***
  [4] Registering jump thread: (3, 5) incoming edge;  (5, 7) joiner (7, 8) normal (8, 9) nocopy; 

The block immediately preceding this message is the path solver in action:

*********** path_range_query ******************
 Registering value_relation (path_oracle) (_4 < a.4_14) (bb3)
 Registering value_relation (path_oracle) (_3 == iftmp.3_15) (bb3)
 Registering value_relation (path_oracle) (_5 == iftmp.6_13) (bb3)

path_range_query: compute_ranges for path: BB 3, BB 5, BB 7
range_defined_in_block (BB5) for iftmp.3_15 is char [0, 0]
range_defined_in_block (BB5) for _5 is unsigned char [0, 0]
range_defined_in_block (BB5) for iftmp.3_15 is char [0, 0]
range_defined_in_block (BB7) for iftmp.6_13 is unsigned char [0, 0]

Path is (length=3):

=========== BB 3 ============
Imports: b.1_2  a.4_14  
Exports: b.1_2  _3  _4  a.4_14  
         _3 : b.1_2(I)  
         _4 : b.1_2(I)  _3  
    <bb 3> [local count: 1073741824]:
  L:
    d.0_1 = d;
    b.1_2 = b;
    _3 = (char) b.1_2;
    _4 = (int) _3;
    a.4_14 = a;
    if (_4 >= a.4_14)
      goto <bb 4>; [50.00%]
    else
      goto <bb 5>; [50.00%]

_4 : int [-128, 127]
3->4  (T) _4 : 	int [-128, 127]
3->4  (T) a.4_14 : 	int [-INF, 127]
3->5  (F) _4 : 	int [-128, 127]
3->5  (F) a.4_14 : 	int [-127, +INF]

=========== BB 5 ============
Imports: d.0_1  
Exports: d.0_1  
d.0_1	int VARYING
b.1_2	int VARYING
    <bb 5> [local count: 1073741824]:
    # iftmp.3_15 = PHI <_3(3), 0(4)>
    _5 = (unsigned char) iftmp.3_15;
    if (d.0_1 == 0)
      goto <bb 6>; [50.00%]
    else
      goto <bb 7>; [50.00%]

5->6  (T) d.0_1 : 	int [0, 0]
5->7  (F) d.0_1 : 	int [-INF, -1][1, +INF]
etc
etc
etc

The solver decided that iftmp.3_15 is [0, 0] along the path:

  range_defined_in_block (BB5) for iftmp.3_15 is char [0, 0]

This makes absolutely no sense.  The iftmp.3_15 SSA is _3 on the 3->5 edge, and we have no knowledge of _3 in BB3.  Well, unless we had some global range for _3 from a previous *vrp pass, but that's not the case here.

The problem here is that since we didn't have a range so far for _3, we decided to ask for the ranger's range on entry to the path.  This is incorrect, because _3 is not live on entry.  Assuming it is had us calling range_on_edge on each incoming edge to BB3, which ranger was happy to return UNDEFINED for.  This was an oversight.  We should never call range_on_path_entry for things defined in the path.  The other call to range_on_path_entry had an appropriate gate.  The one when calculating PHIs did not.
Comment 5 Aldy Hernandez 2021-10-14 09:02:28 UTC
Created attachment 51598 [details]
proposed patch in testing
Comment 6 GCC Commits 2021-10-14 12:23:09 UTC
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:a311163fd81babd6116e2856f4551c3ca13d8914

commit r12-4395-ga311163fd81babd6116e2856f4551c3ca13d8914
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Oct 14 10:28:39 2021 +0200

    Do not call range_on_path_entry for SSAs defined within the path
    
    In the path solver, when requesting the range of an SSA for which we
    know nothing, we ask the ranger for the range incoming to the path.
    We do this by asking for all the incoming ranges to the path entry
    block and unioning them.
    
    The problem here is that we're asking for a range on path entry for an
    SSA which *is* defined in the path, but for which we know nothing
    about:
    
            some_global.1_2 = some_global;
            _3 = (char) some_global.1_2;
    
    This request is causing us to ask for range_on_edge of _3 on the
    incoming edges to the path.  This is a bit of nonsensical request
    because _3 isn't live on entry to the path, so ranger correctly
    returns UNDEFINED.  The proper thing is to avoid asking this in the
    first place.
    
    I have added a relevant assert, since it doesn't make sense to call
    range_on_path_entry for SSAs defined within the path.
    
    Tested on x86-64 Linux.
    
            PR tree-optimization/102736
    
    gcc/ChangeLog:
    
            PR tree-optimization/102736
            * gimple-range-path.cc (path_range_query::range_on_path_entry):
            Assert that the requested range is defined outside the path.
            (path_range_query::ssa_range_in_phi): Do not call
            range_on_path_entry for SSA names that are defined within the
            path.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/pr102736.c: New test.
Comment 7 Aldy Hernandez 2021-10-14 12:24:04 UTC
fixed