Bug 102511 - [12 Regression] GCC produces incorrect code for -O3: first element of the array is skipped after r12-3903
Summary: [12 Regression] GCC produces incorrect code for -O3: first element of the arr...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P1 blocker
Target Milestone: 12.0
Assignee: Aldy Hernandez
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2021-09-28 03:11 UTC by Vsevolod Livinskii
Modified: 2021-11-01 23:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-09-28 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2021-09-28 03:11:31 UTC
Reproducer:
//func.cpp
extern char arr_15[];
void test(signed char a, unsigned short b, unsigned long long c,
          unsigned short f) {
  for (int d = b - 8; d < b; d += 2)
    for (short e = 0; e < (unsigned short)((f ? 122 : 0) ^ (a ? c : 0)) - 64055;
         e += 3)
      arr_15[d] = 42;
}

//driver.cpp 
#include <stdio.h>

unsigned char arr_15 [8];

void test(signed char a, unsigned short b, unsigned long long int c, unsigned short f);

int main() {
    test(37, 8, 12325048486467861044ULL, 45936);
    for (size_t i_0 = 0; i_0 < 8; ++i_0)
            printf("%d ", arr_15 [i_0]);
    printf("\n");
}

Error:
>$ g++ -O3 driver.cpp func.cpp && ./a.out 
0 0 42 0 42 0 42 0 
>$ g++ -O2 driver.cpp func.cpp && ./a.out 
42 0 42 0 42 0 42 0 

GCC version:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/testing/gcc/bin/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /testing/gcc/gcc_src/configure --enable-multilib --prefix=/testing/gcc/bin --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210927 (51018dd1395c72b3681ae5f84eceb94320472922) (GCC)
Comment 1 Andrew Pinski 2021-09-28 04:24:19 UTC
Does not work:
r12-3906-g51018dd1
Works:
r12-3890-gfe2771b2

Note there is one undefined behavior in this code, but it does not matter really.
The types of arr_15 don't match but after fixing it should fail still.
Comment 2 Andrew Pinski 2021-09-28 04:33:52 UTC
Here is a single file version which shows the problem in the latest trunk:
#include <stdio.h>
extern  char arr_15[];
__attribute__((noipa))
void test(signed char a, unsigned short b, unsigned long long c,
          unsigned short f) {
  for (int d = b - 8; d < b; d += 2)
    for (short e = 0; e < (unsigned short)((f ? 122 : 0) ^ (a ? c : 0)) - 64055;
         e += 3)
      arr_15[d] = 42;
}

 char arr_15 [8];

void test(signed char a, unsigned short b, unsigned long long int c, unsigned short f);

int main() {
    test(37, 8, 12325048486467861044ULL, 45936);
    for (size_t i_0 = 0; i_0 < 8; ++i_0)
            printf("%d ", arr_15 [i_0]);
    printf("\n");
}
------------------- CUT ------------
Confirmed; I will change it to one which aborts in a few minutes.
Comment 3 Andrew Pinski 2021-09-28 04:37:04 UTC
Here is one which aborts and shows the issue:
char arr_15 [8];
__attribute__((noipa))
void test(signed char a, unsigned short b, unsigned long long c,
          unsigned short f) {
  for (int d = b - 8; d < b; d += 2)
    for (short e = 0; e < (unsigned short)((f ? 122 : 0) ^ (a ? c : 0)) - 64055;
         e += 3)
      arr_15[d] = 42;
}
int main() {
    test(37, 8, 12325048486467861044ULL, 45936);
    for (int i = 0; i < 8; ++i)
      {
        if (arr_15[i] != ((i&1) ? 0 : 42))
          __builtin_abort();
      }
  return 0;
}
Comment 4 Andrew Pinski 2021-09-28 05:23:34 UTC
So the difference between -O2 and -O3 that matters here is -ftree-vectorize but it is not the vectorizer that is making the difference but rather ch_vect.
But I can't figure out what is going wrong.

Also adding -fwrapv fixes it too.
Comment 5 Andrew Pinski 2021-09-28 05:37:30 UTC
Bisected down to r12-3903-g0288527f47cec66.
Comment 6 Aldy Hernandez 2021-09-28 07:36:26 UTC
[Bumping this up to a P1.]

The problem here is that we're incorrectly threading a path in the vrp-thread2 pass.  As there is only one registered thread in the dump file (-fdump-tree-vrp-thread2-details), it's easy to find the culprit:

  Registering jump thread: (2, 6) incoming edge;  (6, 5) normal;

With a little bisecting, we can find the exact path as its being registered:

$ ./xgcc -B./ -O3 a.c -fdbg-cnt=registered_jump_thread:10,10 && ./a.out
***dbgcnt: lower limit 1 reached for registered_jump_thread.***
***dbgcnt: upper limit 10 reached for registered_jump_thread.***
Aborted (core dumped)

That is, the 10th attempt at registering a path.

If we turn on debugging in the solver (DEBUG_SOLVER in gimple-range-path.cc), we can see the solver as it precomputes the SSAs along the path.  Basically, we find the dbg counter and look at the dump prior to that:

***dbgcnt: upper limit 10 reached for registered_jump_thread.***
  Registering jump thread: (2, 6) incoming edge;  (6, 5) normal; 

Immediately before this, we see the solver output:

*********** path_range_query ******************
 Registering value_relation (path_oracle) (ivtmp.21_14 == ivtmp.21_32) (bb2)

path_range_query: compute_ranges for path: BB 2, BB 6
range_defined_in_block (BB2) for iftmp.1_11 is short unsigned int [0, 0][122, 122]
range_defined_in_block (BB2) for _8 is long long unsigned int [0, 0]
range_defined_in_block (BB2) for iftmp.2_12 is short unsigned int [0, 0]
range_defined_in_block (BB2) for _5 is short unsigned int [0, 0][122, 122]
range_defined_in_block (BB2) for _6 is int [0, 0][122, 122]
 Registering value_relation (path_oracle) (_7 < _6) (bb2)
range_defined_in_block (BB2) for _7 is int [-64055, -64055][-63933, -63933]
range_defined_in_block (BB2) for iftmp.1_11 is short unsigned int [0, 0][122, 122]

Path is (length=2):

=========== BB 2 ============
b_15(D)	short unsigned int VARYING
c_20(D)	long long unsigned int VARYING
Equivalence set : [_36]
Relational : (_7 < _6)
Relational : (d_16 < _1)
    <bb 2> [local count: 29527901]:
    _1 = (int) b_15(D);
    d_16 = _1 + -8;
    iftmp.1_11 = f_18(D) != 0 ? 122 : 0;
    _8 = a_19(D) != 0 ? c_20(D) : 0;
    iftmp.2_12 = (short unsigned int) _8;
    _5 = iftmp.1_11 ^ iftmp.2_12;
    _6 = (int) _5;
    _7 = _6 + -64055;
    ivtmp.21_14 = (sizetype) d_16;
    _36 = (sizetype) b_15(D);
    goto <bb 6>; [100.00%]

_1 : int [0, 65535]
_6 : int [0, 65535]
_7 : int [-64055, 1480]
iftmp.1_11 : short unsigned int [0, 0][122, 122]
ivtmp.21_14 : sizetype [0, 65527][18446744073709551608, +INF]
d_16 : int [-8, 65527]
_36 : sizetype [0, 65535]

=========== BB 6 ============
Imports: _7  iftmp.1_11  c_20(D)  
Exports: _5  _6  _7  _8  iftmp.1_11  iftmp.2_12  c_20(D)  
_7	int [-64055, 1480]
_36	sizetype [0, 65535]
    <bb 6> [local count: 118111600]:
    # ivtmp.21_32 = PHI <ivtmp.21_24(9), ivtmp.21_14(2)>
    if (_7 > 0)
      goto <bb 8>; [89.00%]
    else
      goto <bb 5>; [11.00%]

6->8  (T) _7 : 	int [1, 1480]
6->5  (F) _7 : 	int [-64055, 0]

The problematic thread is out of BB6, so the threader thinks it knows enough about _7 to solve _7 > 0.  Chasing back the definition of _7 we end up in _8, which the solver incorrectly thinks is 0:

range_defined_in_block (BB2) for _8 is long long unsigned int [0, 0]

If we assume _8 is 0, shit rolls downhill from here.

Describing the process to get here makes it abundantly clear that we need to improve the process of debugging this.  We need a way to turn on the solver debugging from the command line (--param??), and not by some magic #define.  Also, some counter that matches the path being registered with the equivalent solver dump would be useful.  This way someone could easily find the problematic thread in the solver dump.  I'll look into this.

Anywhoo...

The issue here is that range_on_path_entry is incorrectly returning UNDEFINED if it can't find any incoming edges (excluding the entry block).  In this case BB2's only predecessor is the entry block, so we return UNDEFINED by mistake.  UNDEFINED means unreachable, so things go very badly, very quickly.

Here is a proposed patch I will test:

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 71e04e4deba..9da67d2a35b 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -136,14 +136,23 @@ path_range_query::range_on_path_entry (irange &r, tree name)
 {
   int_range_max tmp;
   basic_block entry = entry_bb ();
+  bool changed = false;
+
   r.set_undefined ();
   for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
     {
       edge e = EDGE_PRED (entry, i);
       if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
 	  && m_ranger.range_on_edge (tmp, e, name))
-	r.union_ (tmp);
+	{
+	  r.union_ (tmp);
+	  changed = true;
+	}
     }
+
+  // Make sure we don't return UNDEFINED by mistake.
+  if (!changed)
+    r.set_varying (TREE_TYPE (name));
 }
 
 // Return the range of NAME at the end of the path being analyzed.
Comment 7 Andrew Pinski 2021-09-28 07:44:52 UTC
(In reply to Aldy Hernandez from comment #6)
> Describing the process to get here makes it abundantly clear that we need to
> improve the process of debugging this.  We need a way to turn on the solver
> debugging from the command line (--param??), and not by some magic #define. 
> Also, some counter that matches the path being registered with the
> equivalent solver dump would be useful.  This way someone could easily find
> the problematic thread in the solver dump.  I'll look into this.

Maybe look into dbgcnt.def and their usage. It should help with the counter situtation :).
Comment 8 GCC Commits 2021-09-28 09:12:20 UTC
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:fb8b72ebb5b0bf40f7dfef9154c42320ce46f2a7

commit r12-3916-gfb8b72ebb5b0bf40f7dfef9154c42320ce46f2a7
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Tue Sep 28 09:38:50 2021 +0200

    Return VARYING in range_on_path_entry if nothing found.
    
    The problem here is that the solver's code solving unknown SSAs on entry
    to a path was returning UNDEFINED if there were no incoming edges to the
    start of the path that were not the function entry block.  This caused a
    cascade of pain down stream.
    
    Tested on x86-64 Linux.
    
            PR tree-optimization/102511
    
    gcc/ChangeLog:
    
            * gimple-range-path.cc (path_range_query::range_on_path_entry):
            Return VARYING when nothing found.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/pr102511.c: New test.
            * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Adjust.
Comment 9 Aldy Hernandez 2021-09-28 09:19:48 UTC
fixed
Comment 10 Aldy Hernandez 2021-09-28 14:51:04 UTC
(In reply to Andrew Pinski from comment #7)
> (In reply to Aldy Hernandez from comment #6)
> > Describing the process to get here makes it abundantly clear that we need to
> > improve the process of debugging this.  We need a way to turn on the solver
> > debugging from the command line (--param??), and not by some magic #define. 
> > Also, some counter that matches the path being registered with the
> > equivalent solver dump would be useful.  This way someone could easily find
> > the problematic thread in the solver dump.  I'll look into this.
> 
> Maybe look into dbgcnt.def and their usage. It should help with the counter
> situtation :).

Thanks Andrew.  I've used your idea in my committed patch here:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580369.html

It's not perfect, but it should help in diagnosing these issues in the future.  I hope to improve the dumps in the next month, to help any stage3 bugfixing.