Created attachment 56982 [details]
-freport-bug output

Testcase:

char a, c;
short b;
void d() {
  c = a - 1;
  b = c;
  unsigned short *e = &b;
  while (++*e > 256)
    ;
}

Command:

> /scratch/tc-testing/tc-jan-1-trunk/build-rv64gcv/bin/riscv64-unknown-linux-gnu-gcc -march=rv64gcv -O2 red.c -freport-bug

during GIMPLE pass: vect
red.c: In function 'd':
red.c:3:6: internal compiler error: tree check: expected integer_cst, have cond_expr in get_len, at tree.h:6481
    3 | void d() {
      |      ^
0xab8132 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
        ../../../gcc/gcc/tree.cc:8952
0xc97cb2 tree_check(tree_node const*, char const*, int, char const*, tree_code)
        ../../../gcc/gcc/tree.h:3900
0xc97cb2 wi::extended_tree<131072>::get_len() const
        ../../../gcc/gcc/tree.h:6481
0xc97cb2 wi::int_traits<generic_wide_int<wi::extended_tree<131072> > >::decompose(long*, unsigned int, generic_wide_int<wi::extended_tree<131072> > const&)
        ../../../gcc/gcc/wide-int.h:1050
0xc97cb2 wide_int_ref_storage<true, false>::wide_int_ref_storage<generic_wide_int<wi::extended_tree<131072> > >(generic_wide_int<wi::extended_tree<131072> > const&, unsigned int)
        ../../../gcc/gcc/wide-int.h:1099
0xc97cb2 generic_wide_int<wide_int_ref_storage<true, false> >::generic_wide_int<generic_wide_int<wi::extended_tree<131072> > >(generic_wide_int<wi::extended_tree<131072> > const&, unsigned int)
        ../../../gcc/gcc/wide-int.h:855
0xc97cb2 wi::binary_traits<generic_wide_int<wi::extended_tree<131072> >, int, wi::int_traits<generic_wide_int<wi::extended_tree<131072> > >::precision_type, wi::int_traits<int>::precision_type>::result_type wi::add<generic_wide_int<wi::extended_tree<131072> >, int>(generic_wide_int<wi::extended_tree<131072> > const&, int const&)
        ../../../gcc/gcc/wide-int.h:2871
0x156f0e7 wi::binary_traits<generic_wide_int<wi::extended_tree<131072> >, int, wi::int_traits<generic_wide_int<wi::extended_tree<131072> > >::precision_type, wi::int_traits<int>::precision_type>::operator_result operator+<generic_wide_int<wi::extended_tree<131072> >, int>(generic_wide_int<wi::extended_tree<131072> > const&, int const&)
        ../../../gcc/gcc/wide-int.h:3857
0x156f0e7 vect_analyze_loop_costing
        ../../../gcc/gcc/tree-vect-loop.cc:2270
0x157ff01 vect_analyze_loop_2
        ../../../gcc/gcc/tree-vect-loop.cc:3166
0x1581a20 vect_analyze_loop_1
        ../../../gcc/gcc/tree-vect-loop.cc:3461
0x15821b9 vect_analyze_loop(loop*, vec_info_shared*)
        ../../../gcc/gcc/tree-vect-loop.cc:3619
0x15c84d4 try_vectorize_loop_1
        ../../../gcc/gcc/tree-vectorizer.cc:1066
0x15c84d4 try_vectorize_loop
        ../../../gcc/gcc/tree-vectorizer.cc:1182
0x15c8dfc execute
        ../../../gcc/gcc/tree-vectorizer.cc:1298
Please submit a full bug report, with preprocessed source.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Preprocessed source stored into /scratch/tmp/ccdgSv2l.out file, please attach this to your bugreport.

Godbolt: https://godbolt.org/z/x851M9q7n

The ICE goes away if you change the 256 to 255.

-freport-bug output attached.
This is not a RISC-V issue, it's a middle-end issue.
Please change the title and CC Richard.

https://godbolt.org/z/1bj9xaYTa

ARM SVE has the same issue.
On 1/2/24 15:38, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113210
>
> --- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> This is not RISC-V issues, it's middle-end issue.
> Plz change the title and CC Richard.
>
> https://godbolt.org/z/1bj9xaYTa
>
> ARM SVE has the same issue.

Got it - Bugzilla seems to be down for me.  Once it's back up I'll update the PR.

Thanks,
Patrick
Confirmed on aarch64-linux-gnu with `-O3 -march=armv9-a+sve`.
is executed at most
  (short unsigned int) (a.0_1 + 255) + 1 <= 256 ? 0 : ~(short unsigned int) (a.0_1 + 255)
(bounded by 65279) + 1 times in loop 1.

Though that is the same niter as in GCC 13 ...

The code which seems to be causing this was introduced at r14-2281-g0682a32c026f1e.
Better/slightly more reduced testcase:
```
unsigned char a, c;
unsigned short b;
void d() {
  c = a - 1;
  b = c;
  while (++b > 256)
    ;
}
```
LOOP_VINFO_NITERS is INTEGER_CST 1 here, but LOOP_VINFO_NITERSM1 is that complex expression and we assume that if LOOP_VINFO_NITERS_KNOWN_P then NITERSM1 is also known, which is not the case here.
Or maybe just a bug in the PLUS_EXPR folding?  The code sets NITERSM1 to

  (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0

and then fold_build2s PLUS_EXPR of that and 1, and somehow it folds to 1; that doesn't sound right to me.
Now, when folding the + 1 addition just with the second operand, i.e.

  ~(short unsigned int) (a.0_1 + 255)

it correctly folds into

  -(short unsigned int) (a.0_1 + 255)

and obviously the third operand to 1.  There is also the

  /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
     X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
  (simplify
   (cond (gt (plus @0 integer_onep) @1) (negate @0) integer_onep@2)
   (if (TYPE_UNSIGNED (type))
    (cond (ge @0 @1) (negate @0) @2)))

match.pd rule, but that I'd think should just fold the whole thing to:

  (short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1

Though, a.0_1 is unsigned char, so (short unsigned int) (a.0_1 + 255) + 1 > 256 is actually never true.  So I guess the folding is correct.
Created attachment 56993 [details]
gcc14-pr113210.patch

So, I'd go with this patch (so far untested).
(In reply to Jakub Jelinek from comment #8)
> Created attachment 56993 [details]
> gcc14-pr113210.patch
>
> So, I'd go with this patch (so far untested).

I think we want to keep the invariant that both are INTEGER_CST when one is.

If we can fold the add to 1, why can't we fold the original to 0?

That is, another fix might be to adjust NITERSM1 to NITERS - 1 when NITERS went constant ... (btw, I want to get rid of _NITERS and only keep _NITERSM1, given _NITERS cannot be represented when the number of latch iterations is TYPE_MAX_VALUE)
(In reply to Richard Biener from comment #9)
> I think we want to keep the invariant that both are INTEGER_CST when one is.
>
> If we can fold the add to 1 why can't we fold the original to 0?

Because fold generally isn't recursive, so either whatever created the expression (niters analysis) would need to try to fold all subexpressions, which it clearly doesn't do, or we rely on luck that something changed and we fold something because of that, which is exactly what triggers it during the + 1 folding.
(In reply to Richard Biener from comment #9)
> That is, another fix might be to adjust NITERSM1 to NITERS - 1 when
> NITERS went constant ... (btw, I want to get rid of _NITERS and only

Or we could only use fold_build2 for the PLUS_EXPR 1 computation if NITERSM1 is INTEGER_CST, otherwise use build2...
(In reply to Jakub Jelinek from comment #11)
> (In reply to Richard Biener from comment #9)
> > That is, another fix might be to adjust NITERSM1 to NITERS - 1 when
> > NITERS went constant ... (btw, I want to get rid of _NITERS and only
>
> Or we could only use fold_build2 for the PLUS_EXPR 1 computation if NITERSM1
> is INTEGER_CST, otherwise use build2...

I think we should see where the original expression is built but not folded.
(In reply to Richard Biener from comment #12)
> (In reply to Jakub Jelinek from comment #11)
> > (In reply to Richard Biener from comment #9)
> > > That is, another fix might be to adjust NITERSM1 to NITERS - 1 when
> > > NITERS went constant ... (btw, I want to get rid of _NITERS and only
> >
> > Or we could only use fold_build2 for the PLUS_EXPR 1 computation if NITERSM1
> > is INTEGER_CST, otherwise use build2...
>
> I think we should see where the original expression is built but not folded.

Hmm, probably in estimate_numbers_of_iterations,

  if (TREE_CODE (niter_desc.may_be_zero) != INTEGER_CST)
    niter = build3 (COND_EXPR, type, niter_desc.may_be_zero,
                    build_int_cst (type, 0),
                    niter);

I vaguely remember code trying to pattern match the COND_EXPR created by this (though it should instead use number_of_iterations_exit).  It should be safe to replace the above with fold_build3.
(In reply to Richard Biener from comment #13)
> Hmm, probably in estimate_numbers_of_iterations,
>
>   if (TREE_CODE (niter_desc.may_be_zero) != INTEGER_CST)
>     niter = build3 (COND_EXPR, type, niter_desc.may_be_zero,
>                     build_int_cst (type, 0),
>                     niter);
>
> I vaguely remember code trying to pattern match the COND_EXPR created
> by this (though it should instead use number_of_iterations_exit).  It
> should be safe to replace the above with fold_build3.

No, it's not that.  The COND_EXPR is folded, but we see

  (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0

and that isn't caught.
#0  0x00000000014d3464 in fold_build3_loc (loc=0, code=COND_EXPR, type=<integer_type 0x7ffff704b540 short unsigned int>, op0=<gt_expr 0x7ffff7207f50>, op1=<bit_not_expr 0x7ffff6e13540>, op2=<integer_cst 0x7ffff7047378>)
    at /space/rguenther/src/gcc/gcc/fold-const.cc:14174
#1  0x00000000014d0513 in fold_ternary_loc (loc=0, code=COND_EXPR, type=<integer_type 0x7ffff704b540 short unsigned int>, op0=<le_expr 0x7ffff7207f00>, op1=<integer_cst 0x7ffff7047378>, op2=<bit_not_expr 0x7ffff6e13540>)
    at /space/rguenther/src/gcc/gcc/fold-const.cc:13238
#2  0x00000000014d3436 in fold_build3_loc (loc=0, code=COND_EXPR, type=<integer_type 0x7ffff704b540 short unsigned int>, op0=<le_expr 0x7ffff7207f00>, op1=<integer_cst 0x7ffff7047378>, op2=<bit_not_expr 0x7ffff6e13540>)
    at /space/rguenther/src/gcc/gcc/fold-const.cc:14172
#3  0x0000000001d4de50 in vect_get_loop_niters (loop=0x7ffff71e3600, main_exit=0x7ffff7203990, assumptions=0x7fffffffd200, number_of_iterations=0x7fffffffd1f0, number_of_iterationsm1=0x7fffffffd1f8)
    at /space/rguenther/src/gcc/gcc/tree-vect-loop.cc:919

So I suggest either fixing that or, when adding one folds to a constant, making sure M1 is constant as well.
Created attachment 57007 [details]
gcc14-pr113210.patch

So like this instead?
(In reply to Jakub Jelinek from comment #15)
> Created attachment 57007 [details]
> gcc14-pr113210.patch
>
> So like this instead?

Yes.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b1d4e5b51375efd285378173375d634ad6ba8462

commit r14-7030-gb1d4e5b51375efd285378173375d634ad6ba8462
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 9 10:31:51 2024 +0100

    vect: Ensure both NITERSM1 and NITERS are INTEGER_CSTs or neither of them [PR113210]

    On the following testcase e.g. on riscv64 or aarch64 (latter with
    -O3 -march=armv8-a+sve) we ICE, because while NITERS is INTEGER_CST,
    NITERSM1 is a complex expression like
    (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
    where a.0_1 is unsigned char.  The condition is never true, so the
    above is equivalent to just 0, but only when trying to fold the above
    with PLUS_EXPR 1 we manage to simplify it (first
    ~(short unsigned int) (a.0_1 + 255)
    to
    -(short unsigned int) (a.0_1 + 255)
    and then
    (short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
    to
    (short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
    and only at this point we fold the condition to be false.

    But the vectorizer seems to assume that if NITERS is known (i.e.
    suitable INTEGER_CST) then NITERSM1 also is, so the following hack
    ensures that if NITERS folds into INTEGER_CST NITERSM1 will be one as
    well.

    2024-01-09  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/113210
            * tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
            value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
            INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
            minus 1.

            * gcc.c-torture/compile/pr113210.c: New test.
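Since the actual patch lives in attachment 57007, here is a C-like pseudocode sketch of the committed approach as I read the commit message above; the variable names (`niters`, `nitersm1`, `one`) are illustrative, not the real locals in tree-vect-loop.cc:

```
/* In vect_get_loop_niters: NITERS is computed by folding NITERSM1 + 1.  */
niters = fold_build2 (PLUS_EXPR, type, nitersm1, one);

/* If the addition folded to a constant while NITERSM1 itself is still a
   complex expression (the COND_EXPR in this PR), recompute NITERSM1 as
   NITERS - 1 so that both are INTEGER_CSTs or neither is.  */
if (TREE_CODE (niters) == INTEGER_CST
    && TREE_CODE (nitersm1) != INTEGER_CST)
  nitersm1 = fold_build2 (MINUS_EXPR, type, niters, one);
```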
Fixed.