Reproducer:
int a, b, c, f;
void g(bool h, int d[][5]) {
  for (short i = f; i; i += 1) {
    a = h && d[0][i];
    for (int j = 0; j < 4; j += c)
      b = 0;
  }
}

Error:
>$ g++ -O3 -march=skylake-avx512 -c func.cpp
during GIMPLE pass: vect
func.cpp: In function 'void g(bool, int (*)[5])':
func.cpp:2:6: internal compiler error: in vect_build_gather_load_calls, at tree-vect-stmts.c:2835
    2 | void g(bool h, int d[][5]) {
      |      ^
0x906a36 vect_build_gather_load_calls
        /testing/gcc/gcc_src/gcc/tree-vect-stmts.c:2835
0x906a36 vectorizable_load
        /testing/gcc/gcc_src/gcc/tree-vect-stmts.c:8785
0x1500240 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
        /testing/gcc/gcc_src/gcc/tree-vect-stmts.c:11060
0x1503e6a vect_transform_loop_stmt
        /testing/gcc/gcc_src/gcc/tree-vect-loop.c:9362
0x151fd67 vect_transform_loop(_loop_vec_info*, gimple*)
        /testing/gcc/gcc_src/gcc/tree-vect-loop.c:9798
0x1553a8f try_vectorize_loop_1
        /testing/gcc/gcc_src/gcc/tree-vectorizer.c:1109
0x1554591 vectorize_loops()
        /testing/gcc/gcc_src/gcc/tree-vectorizer.c:1248

GCC version:
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/testing/gcc/bin/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /testing/gcc/gcc_src/configure --enable-multilib --prefix=/testing/gcc/bin --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211002 (d7705b0ada9e9852b580ca25a45570c82152f287) (GCC)

Confirmed.

          if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
            {
              poly_uint64 sub1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op));
              poly_uint64 sub2 = TYPE_VECTOR_SUBPARTS (masktype);
              gcc_assert (known_eq (sub1, sub2));

Confirmed, started with r11-3070-g783dc66f9ccb0019.
I can take a look.
The to be vectorized IL is

  <bb 4> [local count: 118111600]:
  # i_21 = PHI <i_18(12), i_14(27)>
  _2 = (int) i_21;
  _32 = &(*d_16(D))[_2];
  _3 = .MASK_LOAD (_32, 32B, h_15(D));
  _5 = _3 != 0;
  _23 = _5 & h_15(D);
  prephitmp_36 = _23 ? 1 : 0;
  i.3_6 = (unsigned short) i_21;
  _7 = i.3_6 + 1;
  i_18 = (short int) _7;
  if (i_18 != 0)
    goto <bb 12>; [89.00%]
  else
    goto <bb 10>; [11.00%]

  <bb 12> [local count: 105119324]:
  goto <bb 4>; [100.00%]

and the issue is that the mask is an invariant but we're just using
vect_get_vec_defs_for_operand and that does

      if (dt == vect_constant_def || dt == vect_external_def)
        {
          tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
          tree vector_type;

          if (vectype)
            vector_type = vectype;
          else if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
                   && VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
            vector_type = truth_type_for (stmt_vectype);

but of course stmt_vectype is not a boolean type but V8SI.

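For readers not familiar with that code path, the type selection described above can be modelled in a few lines. This is only a toy sketch, not GCC code: `VecKind`, `VecType` and `pick_invariant_type` are hypothetical names, and the final fallback is simplified to "a data vector with the statement's lane count". It shows why an invariant boolean operand ends up with a non-mask type when the statement's vector type is a data vector such as V8SI, unless the caller passes the wanted mask type explicitly.

```cpp
#include <cassert>

// Toy stand-ins for the vectorizer's notion of a vector type.
// These are illustrative only and do not exist in GCC.
enum class VecKind { Data, Mask };

struct VecType {
  VecKind kind;
  unsigned lanes;
};

// Models the choice made for an invariant (constant or external) operand:
//  1. an explicitly requested vector type wins;
//  2. otherwise a boolean scalar gets a mask type only when the
//     statement's own vector type is already a mask type;
//  3. otherwise fall back to a data vector (simplified assumption:
//     same lane count as the statement's vector type).
VecType pick_invariant_type(bool scalar_is_bool, VecType stmt_vectype,
                            const VecType *requested = nullptr) {
  if (requested)
    return *requested;
  if (scalar_is_bool && stmt_vectype.kind == VecKind::Mask)
    return VecType{VecKind::Mask, stmt_vectype.lanes};
  return VecType{VecKind::Data, stmt_vectype.lanes};
}
```

With a V8SI-like statement type `VecType{VecKind::Data, 8}` and a boolean invariant, calling `pick_invariant_type` without a requested type yields a `Data` vector, which is the mismatch the assert trips over. Passing a `Mask` type as the third argument yields the mask vector the gather expects.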
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9f12a45ef147e563f099c24c293830727e8204cc

commit r12-4350-g9f12a45ef147e563f099c24c293830727e8204cc
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Oct 12 13:42:08 2021 +0200

    tree-optimization/102572 - fix gathers with invariant mask

    This fixes the vector def gathering for invariant masks which
    failed to pass in the desired vector type resulting in a non-mask
    type to be generate.

    2021-10-12  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/102572
            * tree-vect-stmts.c (vect_build_gather_load_calls): When
            gathering the vectorized defs for the mask pass in the
            desired mask vector type so invariants will be handled
            correctly.

            * g++.dg/vect/pr102572.cc: New testcase.

Fixed on trunk so far.
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:092e98d94080ca253dc4ef6957d6efaeccb88df6

commit r11-9225-g092e98d94080ca253dc4ef6957d6efaeccb88df6
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Oct 12 13:42:08 2021 +0200

    tree-optimization/102572 - fix gathers with invariant mask

    This fixes the vector def gathering for invariant masks which
    failed to pass in the desired vector type resulting in a non-mask
    type to be generate.

    2021-10-12  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/102572
            * tree-vect-stmts.c (vect_build_gather_load_calls): When
            gathering the vectorized defs for the mask pass in the
            desired mask vector type so invariants will be handled
            correctly.

            * g++.dg/vect/pr102572.cc: New testcase.

    (cherry picked from commit 9f12a45ef147e563f099c24c293830727e8204cc)

Fixed.