Bug 103037 - [11 Regression] Wrong code with -O2 since r11-6100-gd41b097350d3c5d0
Summary: [11 Regression] Wrong code with -O2 since r11-6100-gd41b097350d3c5d0
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P2 normal
Target Milestone: 12.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on: 104700
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2021-11-02 07:17 UTC by Vsevolod Livinskii
Modified: 2024-07-19 07:42 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.3.0, 12.0
Known to fail: 11.1.0, 11.3.0, 11.5.0
Last reconfirmed: 2022-01-29 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2021-11-02 07:17:23 UTC
Link to the Compiler Explorer: https://godbolt.org/z/eaGzsPnax

Reproducer:
#include <stdio.h>

short var_3 = 2;
unsigned char var_9 = 23;
unsigned short var_11;
unsigned short arr_4 [23];

void test() __attribute__((noipa));

const unsigned short &min(unsigned short &d, const unsigned short &e) {
  return e < d ? e : d;
}

void test() {
  for (int a = 0; a < var_9; a += 3)
      var_11 = min(arr_4[a], 1) / (arr_4[a] ? arr_4[a] : var_3);
}


int main() {
    for (size_t i_0 = 0; i_0 < 23; ++i_0)
        arr_4 [i_0] = 2;
    test();
    printf("%hu\n", var_11);
    if (var_11 != 0)
        __builtin_abort();

}

Error:
>$ g++ driver.cpp -O0 && ./a.out
0
>$ g++ driver.cpp -O2 && ./a.out
1

GCC version:
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/testing/gcc/bin_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /testing/gcc/gcc_src_master/configure --enable-multilib --prefix=/testing/gcc/bin_master --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211101 (679652a77da60078392a834ed4b6b910127dbf24) (GCC)
Comment 1 Martin Liška 2021-11-02 07:36:29 UTC
Confirmed, started with r11-6100-gd41b097350d3c5d0.

A bit reduced:

short var_3 = 2;
unsigned char var_9 = 23;
unsigned short var_11;
unsigned short arr_4 [23];

void test() __attribute__((noipa));

const unsigned short &min(unsigned short &d, const unsigned short &e) {
  return e < d ? e : d;
}

void test() {
  for (int a = 0; a < var_9; a += 3)
      var_11 = min(arr_4[a], 1) / (arr_4[a] ? arr_4[a] : var_3);
}


int main() {
    for (unsigned long i_0 = 0; i_0 < 23; ++i_0)
        arr_4 [i_0] = 2;
    test();
    __builtin_printf("%hu\n", var_11);
    if (var_11 != 0)
        __builtin_abort();
    return 0;
}
Comment 2 Andrew Pinski 2021-11-02 07:52:32 UTC
(In reply to Martin Liška from comment #1)
> Confirmed, started with r11-6100-gd41b097350d3c5d0.

Hmm, it is definitely PRE causing the problem:

[changed] ANTIC_IN[7] := { _20 (0014), a_25 (0016), iftmp.0_10 (0008), _29 (0018), {trunc_div_expr,_29,iftmp.0_10} (0004), {nop_expr,_5} (0005), {plus_expr,a_25,3} (0012) }
S[7] := {  }
Applying pattern match.pd:340, gimple-match.c:16554
ANTIC_OUT[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) }
[changed] ANTIC_IN[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) }
S[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) }
Created SSA_NAME representative pretmp_16 for expression:{trunc_div_expr,_28,_3} (0020)
ANTIC_OUT[5] := { _20 (0014), a_25 (0016), _28 (0017), iftmp.0_15 (0002), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) }
[changed] ANTIC_IN[5] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), _28 (0017), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) }
S[5] := { _20 (0014), a_25 (0016), _28 (0017), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) }
Applying pattern match.pd:350, gimple-match.c:8615
ANTIC_OUT[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) }
[changed] ANTIC_IN[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) }
S[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) }
Applying pattern match.pd:350, gimple-match.c:8615
Comment 3 Richard Biener 2021-11-02 08:43:55 UTC
I will have a look.  There might be latent availability issues with PRE simplification still.
Comment 4 Jakub Jelinek 2021-11-02 08:51:44 UTC
The match.pd:358 triggers twice during PRE, but certainly before PRE there is no division with boolean range second operand, the only one in there is
  _5 = _25 / iftmp.1_10;
where iftmp.1_10 has
  # RANGE [-32768, 65535]
  # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)>
Comment 5 Jakub Jelinek 2021-11-02 09:08:18 UTC
We have:
  var_3.2_4 = var_3;
  iftmp.1_14 = (int) var_3.2_4;
  var_11_lsm.14_8 = _7(D);

  <bb 3> [local count: 955630225]:
  # RANGE [0, 24] NONZERO 31
  # a_20 = PHI <a_18(10), 0(9)>
  _19 = MEM <short unsigned int[23]> [(short unsigned int &)&arr_4][a_20];
  if (_19 > 1)
    goto <bb 5>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 477815112]:
  # RANGE [0, 1] NONZERO 1
  _3 = (int) _19;
  if (_19 != 0)
    goto <bb 5>; [20.00%]
  else
    goto <bb 6>; [80.00%]
  
  <bb 5> [local count: 477815112]:
  # RANGE [0, 1] NONZERO 1
  # _24 = PHI <_3(4), 1(3)>
  # RANGE [1, 65535] NONZERO 65535
  iftmp.1_15 = (int) _19;
  goto <bb 7>; [100.00%]

  <bb 6> [local count: 477815112]:
  
  <bb 7> [local count: 955630225]:
  # RANGE [-32768, 65535]
  # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)>
  # RANGE [0, 1] NONZERO 1
  # _25 = PHI <_24(5), 0(6)>
  # RANGE [-1, 1]
  _5 = _25 / iftmp.1_10;
and seems PRE is trying top simplify 1 / _3 and _3 / _3.  _3 has correctly range of [0, 1] - in that bb _19 can't be anything but 0 or 1 given the condition, but
iftmp.1_15 = (int) _19; isn't equivalent to that, because it is reachable from other bbs, even when it is also (int) _19.
So, does PRE need to temporarily reset_flow_sensitive_info in such cases and restore if it didn't succeed?
Comment 6 rguenther@suse.de 2021-11-02 14:18:26 UTC
On Tue, 2 Nov 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103037
> 
> --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> We have:
>   var_3.2_4 = var_3;
>   iftmp.1_14 = (int) var_3.2_4;
>   var_11_lsm.14_8 = _7(D);
> 
>   <bb 3> [local count: 955630225]:
>   # RANGE [0, 24] NONZERO 31
>   # a_20 = PHI <a_18(10), 0(9)>
>   _19 = MEM <short unsigned int[23]> [(short unsigned int &)&arr_4][a_20];
>   if (_19 > 1)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
> 
>   <bb 4> [local count: 477815112]:
>   # RANGE [0, 1] NONZERO 1
>   _3 = (int) _19;
>   if (_19 != 0)
>     goto <bb 5>; [20.00%]
>   else
>     goto <bb 6>; [80.00%]
> 
>   <bb 5> [local count: 477815112]:
>   # RANGE [0, 1] NONZERO 1
>   # _24 = PHI <_3(4), 1(3)>
>   # RANGE [1, 65535] NONZERO 65535
>   iftmp.1_15 = (int) _19;
>   goto <bb 7>; [100.00%]
> 
>   <bb 6> [local count: 477815112]:
> 
>   <bb 7> [local count: 955630225]:
>   # RANGE [-32768, 65535]
>   # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)>
>   # RANGE [0, 1] NONZERO 1
>   # _25 = PHI <_24(5), 0(6)>
>   # RANGE [-1, 1]
>   _5 = _25 / iftmp.1_10;
> and seems PRE is trying top simplify 1 / _3 and _3 / _3.  _3 has correctly
> range of [0, 1] - in that bb _19 can't be anything but 0 or 1 given the
> condition, but
> iftmp.1_15 = (int) _19; isn't equivalent to that, because it is reachable from
> other bbs, even when it is also (int) _19.
> So, does PRE need to temporarily reset_flow_sensitive_info in such cases and
> restore if it didn't succeed?

No, we have means to avoid this situation but somehow it doesn't work.
Give me some time to look into this.
Comment 7 Richard Biener 2021-11-03 10:08:32 UTC
So we value number iftmp.0_15 to _3 and things start to go downhill when we
PHI translate _25 / iftmp.0_10 5 -> 7 as _24 / _3 since there's too much
sharing of the PRE IL used for PHI translation and the VN tables used for
value-numbering.

I did try to untangle this mess a few times already.  I will see what I can do during stage3.
Comment 8 Richard Biener 2022-02-23 15:25:36 UTC
More reduced testcase - the min() obfuscation is to avoid recognizing a MIN_EXPR before jump threading gets the chance to disrupt it.  A GIMPLE unit testcase for
PRE is difficult since we are not supporting SSA annotations (ranges) at the moment.

static inline const unsigned short *
min(unsigned short *d, const unsigned short *e)
{
  return *e < *d ? e : d;
}

unsigned short __attribute__((noipa))
test(unsigned short arr, unsigned short val)
{
  unsigned short tem = 1;
  unsigned short tem2 = *min(&arr, &tem);
  return tem2 / (arr ? arr : val);
}

int
main()
{
  if (test (2, 2) != 0)
    __builtin_abort();
  return 0;
}

The IL for the reduced testcase before PRE is

  <bb 2> [local count: 1073741824]:
  if (arr_6(D) > 1)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870912]:
  # RANGE [0, 1] NONZERO 1
  _1 = (int) arr_6(D);                 // (A)
  if (arr_6(D) != 0)
    goto <bb 4>; [20.00%]
  else
    goto <bb 5>; [80.00%]

  <bb 4> [local count: 536870913]:
  # RANGE [0, 1] NONZERO 1
  # _17 = PHI <_1(3), 1(2)>
  # RANGE [1, 65535] NONZERO 65535
  iftmp.0_9 = (int) arr_6(D);          // (B) 
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 536870913]:
  # RANGE [0, 65535] NONZERO 65535
  iftmp.0_8 = (int) val_7(D);

  <bb 6> [local count: 1073741824]:
  # RANGE [0, 65535] NONZERO 65535
  # iftmp.0_3 = PHI <iftmp.0_9(4), iftmp.0_8(5)>
  # RANGE [0, 1] NONZERO 1
  # _18 = PHI <_17(4), 0(5)>
  # RANGE [0, 1] NONZERO 1
  _2 = _18 / iftmp.0_3;
  # RANGE [0, 1] NONZERO 1
  _10 = (short unsigned int) _2;
  return _10;

and the problematic value-numbering is

iftmp.0_9 = _1 (0001)

at (A) and (B).  By design we may only use _available_ expressions during simplification but _1 is not available on the 4 -> 6 edge.  That's working
OK for the translation from 4 -> 6 but then we record a valueized
expression into the ANTIC set of 4 which when further translated and
simplified has the wrong representative inside.
Comment 9 GCC Commits 2022-02-25 12:21:01 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e25dce501334053239dcc433e4c46ecbddbcb13e

commit r12-7389-ge25dce501334053239dcc433e4c46ecbddbcb13e
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Feb 24 13:04:29 2022 +0100

    tree-optimization/103037 - PRE simplifying valueized expressions
    
    This fixes a long-standing issue in PRE where we track valueized
    expressions in our expression sets that we use for PHI translation,
    code insertion but also feed into match-and-simplify via
    vn_nary_simplify.  But that's not what is expected from vn_nary_simplify
    or match-and-simplify which assume we are simplifying with operands
    available at the point of the expression so they can use contextual
    information on the SSA names like ranges.  While the VN side was
    updated to ensure this with the rewrite to RPO VN, thereby removing
    all workarounds that nullified such contextual info on all SSA names,
    the PRE side still suffers from this.
    
    The following patch tries to apply minimal surgery at this point
    and makes PRE track un-valueized expressions in the expression sets
    but only for the NARY kind (both NAME and CONSTANT do not suffer
    from this issue), leaving the REFERENCE kind alone.  The REFERENCE
    kind is important when trying to remove the workarounds still in
    place in compute_avail for code hoisting, but that's a separate issue
    and we have a working workaround in place.
    
    Doing this comes at the cost of duplicating the VN IL on the PRE side
    for NARY and eventually some extra overhead for translated expressions
    that is difficult to assess.
    
    2022-02-25  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/103037
            * tree-ssa-sccvn.h (alloc_vn_nary_op_noinit): Declare.
            (vn_nary_length_from_stmt): Likewise.
            (init_vn_nary_op_from_stmt): Likewise.
            (vn_nary_op_compute_hash): Likewise.
            * tree-ssa-sccvn.cc (alloc_vn_nary_op_noinit): Export.
            (vn_nary_length_from_stmt): Likewise.
            (init_vn_nary_op_from_stmt): Likewise.
            (vn_nary_op_compute_hash): Likewise.
            * tree-ssa-pre.cc (pre_expr_obstack): New obstack.
            (get_or_alloc_expr_for_nary): Pass in the value-id to use,
            (re-)compute the hash value and if the expression is not
            found allocate it from pre_expr_obstack.
            (phi_translate_1): Do not insert the NARY found in the
            VN tables but build a PRE expression from the valueized
            NARY with the value-id we eventually found.
            (find_or_generate_expression): Assert we have an entry
            for constant values.
            (compute_avail): Insert not valueized expressions into
            EXP_GEN using the value-id from the VN tables.
            (init_pre): Allocate pre_expr_obstack.
            (fini_pre): Free pre_expr_obstack.
    
            * gcc.dg/torture/pr103037.c: New testcase.
Comment 10 Richard Biener 2022-02-25 12:21:37 UTC
Fixed on trunk sofar.  Let's watch for fallout.
Comment 11 Richard Biener 2022-04-21 07:50:41 UTC
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
Comment 12 Jakub Jelinek 2023-05-29 10:05:58 UTC
GCC 11.4 is being released, retargeting bugs to GCC 11.5.
Comment 13 Richard Biener 2024-07-19 07:42:16 UTC
Fixed in GCC 12.