Blender from SPEC2017 ICEs when compiled with -Ofast -flto -mcpu=neoverse-v1:

```
during GIMPLE pass: vect
blender/source/blender/editors/object/object_bake_api.c: In function 'write_internal_bake_pixels':
blender/source/blender/editors/object/object_bake_api.c:173:13: internal compiler error: in vect_analyze_slp, at tree-vect-slp.cc:4765
  173 | static bool write_internal_bake_pixels(
      |             ^
0x1c0a0f7 internal_error(char const*, ...)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:492
0x7bb0c7 fancy_abort(char const*, int, char const*)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic.cc:1658
0xfaf1bb vect_analyze_slp(vec_info*, unsigned int)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:4765
0xf83a6b vect_analyze_loop_2
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:2862
0xf85123 vect_analyze_loop_1
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:3409
0xf85857 vect_analyze_loop(loop*, vec_info_shared*)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:3567
0xfc3cef try_vectorize_loop_1
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1068
0xfc3cef try_vectorize_loop
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1184
0xfc4223 execute
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1300
```

Creating a reducer and bisecting.
---

```
int a;
float *b, *c;
void d() {
  char *e;
  for (; a; a++, b += 4, c += 4)
    if (*e++) {
      float *f = c;
      f[0] = b[0];
      f[1] = b[1];
      f[2] = b[2];
      f[3] = b[3];
    }
}
```

Compiled with -mcpu=neoverse-v1 -Ofast, this reproduces the ICE.
(In reply to Tamar Christina from comment #1)
> compiled with -mcpu=neoverse-v1 -Ofast reproduces the ICE

Here is a better testcase without undefined behavior (the one above dereferences the uninitialized pointer `e`):

```
int a;
float *b, *c;
void d(char * __restrict e) {
  for (; a; a++, b += 4, c += 4)
    if (*e++) {
      float *f = c;
      f[0] = b[0];
      f[1] = b[1];
      f[2] = b[2];
      f[3] = b[3];
    }
}
```

Just `-O3 -march=armv9-a` is needed.
I will have a look.
OK, so we fail single-lane SLP discovery where we succeeded with multi-lane. This is because the loads appear permuted during discovery and we have a masked load feeding a masked store. Since we do not handle permuting masked operations, single-lane discovery fails. For now I'll avoid the situation and leave the actual fix for later.
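To make the failure mode concrete, here is a hand-written scalar sketch of what if-conversion conceptually does to the testcase loop: the branch on `*e` becomes a per-iteration mask that guards all four loads from `b` and all four stores to `c` (in GIMPLE these show up as `.MASK_LOAD`/`.MASK_STORE` internal calls). The function names `masked_copy4` and `lane_load_index` and the explicit `n` parameter are hypothetical and only illustrate the access pattern; they are not GCC internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the testcase loop after if-conversion:
   the former branch on e[i] becomes a mask m that guards every lane of
   the group of four loads from b and stores to c in iteration i.  */
void masked_copy4(float *c, const float *b, const char *e, int n)
{
  for (int i = 0; i < n; i++, b += 4, c += 4) {
    int m = e[i] != 0;            /* mask derived from the former if (*e++) */
    for (int lane = 0; lane < 4; lane++)
      if (m)                      /* stands in for .MASK_LOAD feeding .MASK_STORE */
        c[lane] = b[lane];
  }
}

/* Index into b read by a single-lane SLP node for `lane` in iteration
   `iter`: one element out of every group of four.  Relative to the
   interleaved group this is a load permutation, which is not supported
   for masked loads, so single-lane discovery fails.  */
size_t lane_load_index(int iter, int lane)
{
  assert(iter >= 0 && lane >= 0 && lane < 4);
  return (size_t)iter * 4 + (size_t)lane;
}
```

Multi-lane discovery treats `f[0..3] = b[0..3]` as one contiguous group, so no permutation is needed there; only the per-lane view introduces one.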
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ac6cd62a351a8f1f3637a2552c74eb5eb51cfdda

commit r15-3411-gac6cd62a351a8f1f3637a2552c74eb5eb51cfdda
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Sep 3 09:23:20 2024 +0200

    tree-optimization/116575 - avoid ICE with SLP mask_load_lane

    The following avoids performing re-discovery with single lanes in
    the attempt to for the use of mask_load_lane as rediscovery will
    fail since a single lane of a mask load will appear permuted which
    isn't supported.

            PR tree-optimization/116575
            * tree-vect-slp.cc (vect_analyze_slp): Properly compute
            the mask argument for vect_load/store_lanes_supported.
            When the load is masked for now avoid rediscovery.

            * gcc.dg/vect/pr116575.c: New testcase.
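The commit concerns the masked load/store-lanes path (`vect_load/store_lanes_supported` with a mask). As a rough illustration of what a masked load-lanes / store-lanes pair computes for the testcase's group of four floats, here is a scalar model. The names `mask_load_lanes4`, `mask_store_lanes4`, `VF` and `GROUP` are invented for this sketch and are not GCC or ACLE APIs; on SVE the corresponding hardware operations would be predicated structure accesses such as LD4W/ST4W, and inactive elements are zeroed here to mirror SVE's zeroing predication.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4       /* hypothetical number of elements per vector */
#define GROUP 4    /* f[0..3] = b[0..3] forms an interleaved group of four */

/* Scalar model of a masked load-lanes: de-interleave VF groups of GROUP
   floats into GROUP vectors.  Element j of vector k receives
   src[j * GROUP + k] when mask[j] is set; inactive elements are zeroed.  */
void mask_load_lanes4(float v[GROUP][VF], const float *src,
                      const unsigned char mask[VF])
{
  assert(src != NULL && mask != NULL);
  for (int j = 0; j < VF; j++)
    for (int k = 0; k < GROUP; k++)
      v[k][j] = mask[j] ? src[j * GROUP + k] : 0.0f;
}

/* Matching masked store-lanes: re-interleave the GROUP vectors and write
   only the groups whose mask element is set; other memory is untouched.  */
void mask_store_lanes4(float *dst, float v[GROUP][VF],
                       const unsigned char mask[VF])
{
  assert(dst != NULL && mask != NULL);
  for (int j = 0; j < VF; j++)
    if (mask[j])
      for (int k = 0; k < GROUP; k++)
        dst[j * GROUP + k] = v[k][j];
}
```

Calling `mask_load_lanes4` and then `mask_store_lanes4` with the same mask performs VF iterations of the testcase's conditional copy at once. The whole group of four is handled as one unit, which is why re-discovering it one lane at a time (and thus needing a permuted masked load) is avoided by the fix.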
I've morphed this into the appropriate missed-optimization bug.