Link to the Compiler Explorer: https://godbolt.org/z/eaGzsPnax Reproducer: #include <stdio.h> short var_3 = 2; unsigned char var_9 = 23; unsigned short var_11; unsigned short arr_4 [23]; void test() __attribute__((noipa)); const unsigned short &min(unsigned short &d, const unsigned short &e) { return e < d ? e : d; } void test() { for (int a = 0; a < var_9; a += 3) var_11 = min(arr_4[a], 1) / (arr_4[a] ? arr_4[a] : var_3); } int main() { for (size_t i_0 = 0; i_0 < 23; ++i_0) arr_4 [i_0] = 2; test(); printf("%hu\n", var_11); if (var_11 != 0) __builtin_abort(); } Error: >$ g++ driver.cpp -O0 && ./a.out 0 >$ g++ driver.cpp -O2 && ./a.out 1 GCC version: Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/testing/gcc/bin_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /testing/gcc/gcc_src_master/configure --enable-multilib --prefix=/testing/gcc/bin_master --disable-bootstrap Thread model: posix Supported LTO compression algorithms: zlib gcc version 12.0.0 20211101 (679652a77da60078392a834ed4b6b910127dbf24) (GCC)
Confirmed, started with r11-6100-gd41b097350d3c5d0. A bit reduced: short var_3 = 2; unsigned char var_9 = 23; unsigned short var_11; unsigned short arr_4 [23]; void test() __attribute__((noipa)); const unsigned short &min(unsigned short &d, const unsigned short &e) { return e < d ? e : d; } void test() { for (int a = 0; a < var_9; a += 3) var_11 = min(arr_4[a], 1) / (arr_4[a] ? arr_4[a] : var_3); } int main() { for (unsigned long i_0 = 0; i_0 < 23; ++i_0) arr_4 [i_0] = 2; test(); __builtin_printf("%hu\n", var_11); if (var_11 != 0) __builtin_abort(); return 0; }
(In reply to Martin Liška from comment #1) > Confirmed, started with r11-6100-gd41b097350d3c5d0. Hmm, it is definitely PRE causing the problem: [changed] ANTIC_IN[7] := { _20 (0014), a_25 (0016), iftmp.0_10 (0008), _29 (0018), {trunc_div_expr,_29,iftmp.0_10} (0004), {nop_expr,_5} (0005), {plus_expr,a_25,3} (0012) } S[7] := { } Applying pattern match.pd:340, gimple-match.c:16554 ANTIC_OUT[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) } [changed] ANTIC_IN[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) } S[6] := { _20 (0014), iftmp.0_14 (0011), a_25 (0016), {plus_expr,a_25,3} (0012) } Created SSA_NAME representative pretmp_16 for expression:{trunc_div_expr,_28,_3} (0020) ANTIC_OUT[5] := { _20 (0014), a_25 (0016), _28 (0017), iftmp.0_15 (0002), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) } [changed] ANTIC_IN[5] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), _28 (0017), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) } S[5] := { _20 (0014), a_25 (0016), _28 (0017), {plus_expr,a_25,3} (0012), {trunc_div_expr,_28,_3} (0020), {nop_expr,pretmp_16} (0021) } Applying pattern match.pd:350, gimple-match.c:8615 ANTIC_OUT[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) } [changed] ANTIC_IN[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) } S[9] := { _20 (0014), a_25 (0016), _19 (0013), {nop_expr,_19} (0002), {plus_expr,a_25,3} (0012) } Applying pattern match.pd:350, gimple-match.c:8615
I will have a look. There might be latent availability issues with PRE simplification still.
The match.pd:358 triggers twice during PRE, but certainly before PRE there is no division with boolean range second operand, the only one in there is _5 = _25 / iftmp.1_10; where iftmp.1_10 has # RANGE [-32768, 65535] # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)>
We have: var_3.2_4 = var_3; iftmp.1_14 = (int) var_3.2_4; var_11_lsm.14_8 = _7(D); <bb 3> [local count: 955630225]: # RANGE [0, 24] NONZERO 31 # a_20 = PHI <a_18(10), 0(9)> _19 = MEM <short unsigned int[23]> [(short unsigned int &)&arr_4][a_20]; if (_19 > 1) goto <bb 5>; [50.00%] else goto <bb 4>; [50.00%] <bb 4> [local count: 477815112]: # RANGE [0, 1] NONZERO 1 _3 = (int) _19; if (_19 != 0) goto <bb 5>; [20.00%] else goto <bb 6>; [80.00%] <bb 5> [local count: 477815112]: # RANGE [0, 1] NONZERO 1 # _24 = PHI <_3(4), 1(3)> # RANGE [1, 65535] NONZERO 65535 iftmp.1_15 = (int) _19; goto <bb 7>; [100.00%] <bb 6> [local count: 477815112]: <bb 7> [local count: 955630225]: # RANGE [-32768, 65535] # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)> # RANGE [0, 1] NONZERO 1 # _25 = PHI <_24(5), 0(6)> # RANGE [-1, 1] _5 = _25 / iftmp.1_10; and seems PRE is trying top simplify 1 / _3 and _3 / _3. _3 has correctly range of [0, 1] - in that bb _19 can't be anything but 0 or 1 given the condition, but iftmp.1_15 = (int) _19; isn't equivalent to that, because it is reachable from other bbs, even when it is also (int) _19. So, does PRE need to temporarily reset_flow_sensitive_info in such cases and restore if it didn't succeed?
On Tue, 2 Nov 2021, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103037 > > --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > We have: > var_3.2_4 = var_3; > iftmp.1_14 = (int) var_3.2_4; > var_11_lsm.14_8 = _7(D); > > <bb 3> [local count: 955630225]: > # RANGE [0, 24] NONZERO 31 > # a_20 = PHI <a_18(10), 0(9)> > _19 = MEM <short unsigned int[23]> [(short unsigned int &)&arr_4][a_20]; > if (_19 > 1) > goto <bb 5>; [50.00%] > else > goto <bb 4>; [50.00%] > > <bb 4> [local count: 477815112]: > # RANGE [0, 1] NONZERO 1 > _3 = (int) _19; > if (_19 != 0) > goto <bb 5>; [20.00%] > else > goto <bb 6>; [80.00%] > > <bb 5> [local count: 477815112]: > # RANGE [0, 1] NONZERO 1 > # _24 = PHI <_3(4), 1(3)> > # RANGE [1, 65535] NONZERO 65535 > iftmp.1_15 = (int) _19; > goto <bb 7>; [100.00%] > > <bb 6> [local count: 477815112]: > > <bb 7> [local count: 955630225]: > # RANGE [-32768, 65535] > # iftmp.1_10 = PHI <iftmp.1_15(5), iftmp.1_14(6)> > # RANGE [0, 1] NONZERO 1 > # _25 = PHI <_24(5), 0(6)> > # RANGE [-1, 1] > _5 = _25 / iftmp.1_10; > and seems PRE is trying top simplify 1 / _3 and _3 / _3. _3 has correctly > range of [0, 1] - in that bb _19 can't be anything but 0 or 1 given the > condition, but > iftmp.1_15 = (int) _19; isn't equivalent to that, because it is reachable from > other bbs, even when it is also (int) _19. > So, does PRE need to temporarily reset_flow_sensitive_info in such cases and > restore if it didn't succeed? No, we have means to avoid this situation but somehow it doesn't work. Give me some time to look into this.
So we value number iftmp.0_15 to _3 and things start to go downhill when we PHI translate _25 / iftmp.0_10 5 -> 7 as _24 / _3 since there's too much sharing of the PRE IL used for PHI translation and the VN tables used for value-numbering. I did try to untangle this mess a few times already. I will see what I can do during stage3.
More reduced testcase - the min() obfuscation is to avoid recognizing a MIN_EXPR before jump threading gets the chance to disrupt it. A GIMPLE unit testcase for PRE is difficult since we are not supporting SSA annotations (ranges) at the moment. static inline const unsigned short * min(unsigned short *d, const unsigned short *e) { return *e < *d ? e : d; } unsigned short __attribute__((noipa)) test(unsigned short arr, unsigned short val) { unsigned short tem = 1; unsigned short tem2 = *min(&arr, &tem); return tem2 / (arr ? arr : val); } int main() { if (test (2, 2) != 0) __builtin_abort(); return 0; } The IL for the reduced testcase before PRE is <bb 2> [local count: 1073741824]: if (arr_6(D) > 1) goto <bb 4>; [50.00%] else goto <bb 3>; [50.00%] <bb 3> [local count: 536870912]: # RANGE [0, 1] NONZERO 1 _1 = (int) arr_6(D); // (A) if (arr_6(D) != 0) goto <bb 4>; [20.00%] else goto <bb 5>; [80.00%] <bb 4> [local count: 536870913]: # RANGE [0, 1] NONZERO 1 # _17 = PHI <_1(3), 1(2)> # RANGE [1, 65535] NONZERO 65535 iftmp.0_9 = (int) arr_6(D); // (B) goto <bb 6>; [100.00%] <bb 5> [local count: 536870913]: # RANGE [0, 65535] NONZERO 65535 iftmp.0_8 = (int) val_7(D); <bb 6> [local count: 1073741824]: # RANGE [0, 65535] NONZERO 65535 # iftmp.0_3 = PHI <iftmp.0_9(4), iftmp.0_8(5)> # RANGE [0, 1] NONZERO 1 # _18 = PHI <_17(4), 0(5)> # RANGE [0, 1] NONZERO 1 _2 = _18 / iftmp.0_3; # RANGE [0, 1] NONZERO 1 _10 = (short unsigned int) _2; return _10; and the problematic value-numbering is iftmp.0_9 = _1 (0001) at (A) and (B). By design we may only use _available_ expressions during simplification but _1 is not available on the 4 -> 6 edge. That's working OK for the translation from 4 -> 6 but then we record a valueized expression into the ANTIC set of 4 which when further translated and simplified has the wrong representative inside.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:e25dce501334053239dcc433e4c46ecbddbcb13e commit r12-7389-ge25dce501334053239dcc433e4c46ecbddbcb13e Author: Richard Biener <rguenther@suse.de> Date: Thu Feb 24 13:04:29 2022 +0100 tree-optimization/103037 - PRE simplifying valueized expressions This fixes a long-standing issue in PRE where we track valueized expressions in our expression sets that we use for PHI translation, code insertion but also feed into match-and-simplify via vn_nary_simplify. But that's not what is expected from vn_nary_simplify or match-and-simplify which assume we are simplifying with operands available at the point of the expression so they can use contextual information on the SSA names like ranges. While the VN side was updated to ensure this with the rewrite to RPO VN, thereby removing all workarounds that nullified such contextual info on all SSA names, the PRE side still suffers from this. The following patch tries to apply minimal surgery at this point and makes PRE track un-valueized expressions in the expression sets but only for the NARY kind (both NAME and CONSTANT do not suffer from this issue), leaving the REFERENCE kind alone. The REFERENCE kind is important when trying to remove the workarounds still in place in compute_avail for code hoisting, but that's a separate issue and we have a working workaround in place. Doing this comes at the cost of duplicating the VN IL on the PRE side for NARY and eventually some extra overhead for translated expressions that is difficult to assess. 2022-02-25 Richard Biener <rguenther@suse.de> PR tree-optimization/103037 * tree-ssa-sccvn.h (alloc_vn_nary_op_noinit): Declare. (vn_nary_length_from_stmt): Likewise. (init_vn_nary_op_from_stmt): Likewise. (vn_nary_op_compute_hash): Likewise. * tree-ssa-sccvn.cc (alloc_vn_nary_op_noinit): Export. (vn_nary_length_from_stmt): Likewise. (init_vn_nary_op_from_stmt): Likewise. (vn_nary_op_compute_hash): Likewise. * tree-ssa-pre.cc (pre_expr_obstack): New obstack. (get_or_alloc_expr_for_nary): Pass in the value-id to use, (re-)compute the hash value and if the expression is not found allocate it from pre_expr_obstack. (phi_translate_1): Do not insert the NARY found in the VN tables but build a PRE expression from the valueized NARY with the value-id we eventually found. (find_or_generate_expression): Assert we have an entry for constant values. (compute_avail): Insert not valueized expressions into EXP_GEN using the value-id from the VN tables. (init_pre): Allocate pre_expr_obstack. (fini_pre): Free pre_expr_obstack. * gcc.dg/torture/pr103037.c: New testcase.
Fixed on trunk sofar. Let's watch for fallout.
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
GCC 11.4 is being released, retargeting bugs to GCC 11.5.
Fixed in GCC 12.