Link to the Compiler Explorer: https://godbolt.org/z/78ab77Env Reproducer: #include <stdio.h> unsigned long long int var_4 = 235; unsigned long long int var_5 = 74; signed char var_12 = -99; unsigned long long int var_349; unsigned char var_645; void test(); const unsigned long long &min(const unsigned long long &a, const unsigned long long &b) { return b < a ? b : a; } void test() __attribute__((noipa)); void test() { for (short c = var_12; c; c += 5) ; for (int e = 0; e < 12; e += 1) { var_349 = var_4 ? 235 : 74; var_645 = min((unsigned long long)true, var_5 ? var_12 : var_4); } } int main() { test(); printf("%d\n", (int)var_645); //if (var_645 != 1) // __builtin_abort(); } Error: >$g++ -O2 small.cpp && ./a.out 1 >$g++ -O3 small.cpp && ./a.out 255 GCC version: Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/testing/gcc/bin_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /testing/gcc/gcc_src_master/configure --enable-multilib --prefix=/testing/gcc/bin_master --disable-bootstrap Thread model: posix Supported LTO compression algorithms: zlib gcc version 12.0.0 20211011 (30cce6f65a77b8eaa22f3efff7f1ba54858106f9) (GCC)
Started with r12-1075-g28484d00c45b7bf094a22a4fddf9ffdc7482c7e1
Confirmed. ... mask__43.21_62 = vect_cst__60 != vect_cst__61; _43 = var_12.0_1 != 0; _3 = (long long unsigned int) _43; vect_patt_34.22_63 = VIEW_CONVERT_EXPR<vector(4) unsigned char>(mask__43.21_62); _26 = (unsigned char) _3; Somehow the vectorizer missed the conversion but has the conversion there. /app/example.cpp:19:23: note: add new stmt: mask__43.21_62 = vect_cst__60 != vect_cst__61; /app/example.cpp:19:23: note: ------>vectorizing statement: _3 = (long long unsigned int) _43; /app/example.cpp:19:23: note: ------>vectorizing statement: patt_34 = (unsigned char) _43; /app/example.cpp:19:23: note: transform statement. /app/example.cpp:19:23: note: vect_is_simple_use: operand var_12.0_1 != 0, type of def: internal /app/example.cpp:19:23: note: vect_is_simple_use: vectype vector(4) <signed-boolean:8> /app/example.cpp:19:23: note: transform assignment. /app/example.cpp:19:23: note: vect_get_vec_defs_for_operand: _43 /app/example.cpp:19:23: note: vect_is_simple_use: operand var_12.0_1 != 0, type of def: internal /app/example.cpp:19:23: note: def_stmt = _43 = var_12.0_1 != 0; /app/example.cpp:19:23: note: add new stmt: vect_patt_34.22_63 = VIEW_CONVERT_EXPR<vector(4) unsigned char>(mask__43.21_62); /app/example.cpp:19:23: note: extracting lane for live stmt _26 = (unsigned char) _3;
(In reply to Jakub Jelinek from comment #1) > Started with r12-1075-g28484d00c45b7bf094a22a4fddf9ffdc7482c7e1 I just think that exposed the latent bug in the vectorizer as far as I can tell.
I will have a look.
The interesting thing is that the difference starts with pattern recog (for the epilogue loop) where the main loop recognizes a bool pattern but the epilogue one does not. That's because 'long long unsigned int' doesn't have a vector type with V4QImode. But then vect_recog_cast_forwprop_pattern makes this cast irrelevant but we do not re-visit the bool pattern as we only (re-)process the pattern def sequence stmts but not the stmt itself: /* If this statement has already been replaced with pattern statements, leave the original statement alone, since the first match wins. Instead try to match against the definition statements that feed the main pattern statement. */ if (STMT_VINFO_IN_PATTERN_P (stmt_info)) { gimple_stmt_iterator gsi; for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)); !gsi_end_p (gsi); gsi_next (&gsi)) vect_pattern_recog_1 (vinfo, recog_func, vinfo->lookup_stmt (gsi_stmt (gsi))); return; } and we're also not expecting to do this: void vect_mark_pattern_stmts (vec_info *vinfo, stmt_vec_info orig_stmt_info, gimple *pattern_stmt, tree pattern_vectype) { ... /* We shouldn't be replacing the main pattern statement. */ gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info)->stmt != orig_pattern_stmt); but bool pattern recog is required for correctness, we are not going to fail vectorization later. Richard - you added most of the re-processing of patterns (and also the forwprop pattern that triggers the failure situation). Can you share insights and maybe fix this? It's also really a latent issue. One could use __int128_t to trigger it for V8QI vectorization, but that's likely it - the trigger is the FAIL of the bool pattern recog (which should probably a fatal vectorization FAIL in case the stmt is ending up relevant). What "works" (for the testcase) is to not terminate vect_recog_bool_pattern when we have no vector type for the LHS but allow NULL to be propagated through the pattern machinery (nothing will in the end use that type for this testcase). As said, since bool patterns are required for correctness we cannot really early-out here but need to communicate failure upthread. I'm going to test a patch with this route.
Created attachment 51621 [details] patch I am testing
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:eb032893675afea4b01cc6ad06a3e0dcfe9b51cd commit r12-4472-geb032893675afea4b01cc6ad06a3e0dcfe9b51cd Author: Richard Biener <rguenther@suse.de> Date: Mon Oct 18 10:31:19 2021 +0200 tree-optimization/102788 - avoid spurious bool pattern fails Bool pattern recog is required for correctness since vectorized compares otherwise produce -1 for true so any context where bool is used as value and not as condition or mask needs to be replaced with CMP ? 1 : 0. When we fail to find a vector type for the result of such use we may not simply elide such transform since a new bool result can emerge when for example the cast_forwprop pattern is applied. So the following avoids failing of the bool pattern recog process and instead not assign a vector type for the stmt. 2021-10-18 Richard Biener <rguenther@suse.de> PR tree-optimization/102788 * tree-vect-patterns.c (vect_init_pattern_stmt): Allow a NULL vectype. (vect_pattern_recog_1): Likewise. (vect_recog_bool_pattern): Continue matching the pattern even if we do not have a vector type for a conversion result. * g++.dg/vect/pr102788.cc: New testcase.
Fixed on trunk but the issue is latent since the introduction of the forwprop pattern. I'm considering backporting to GCC 11.
Fixed.
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:34ed721929700b85f19f14fc56fb598a658b2bbc commit r11-9227-g34ed721929700b85f19f14fc56fb598a658b2bbc Author: Richard Biener <rguenther@suse.de> Date: Mon Oct 18 10:31:19 2021 +0200 tree-optimization/102788 - avoid spurious bool pattern fails Bool pattern recog is required for correctness since vectorized compares otherwise produce -1 for true so any context where bool is used as value and not as condition or mask needs to be replaced with CMP ? 1 : 0. When we fail to find a vector type for the result of such use we may not simply elide such transform since a new bool result can emerge when for example the cast_forwprop pattern is applied. So the following avoids failing of the bool pattern recog process and instead not assign a vector type for the stmt. 2021-10-18 Richard Biener <rguenther@suse.de> PR tree-optimization/102788 * tree-vect-patterns.c (vect_init_pattern_stmt): Allow a NULL vectype. (vect_pattern_recog_1): Likewise. (vect_recog_bool_pattern): Continue matching the pattern even if we do not have a vector type for a conversion result. * g++.dg/vect/pr102788.cc: New testcase. (cherry picked from commit eb032893675afea4b01cc6ad06a3e0dcfe9b51cd)