Created attachment 54927
reduced testcase

Compiler output:
$ aarch64-unknown-linux-gnu-gcc -O -mcpu=a64fx testcase.c
during RTL pass: expand
testcase.c: In function 'foo':
testcase.c:9:3: internal compiler error: in paradoxical_subreg_p, at rtl.h:3205
    9 |   bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) / v));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x7e83b3 paradoxical_subreg_p(machine_mode, machine_mode)
        /repo/gcc-trunk/gcc/rtl.h:3205
0x7f1871 paradoxical_subreg_p(machine_mode, machine_mode)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7459
0x7f1871 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7533
0x1193a21 simplify_context::simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7748
0x1af64c4 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/rtl.h:3542
0x1af64c4 gen_udivv2di3(rtx_def*, rtx_def*, rtx_def*)
        /repo/gcc-trunk/gcc/config/aarch64/aarch64-simd.md:2910
0x104e9da expand_binop_directly
        /repo/gcc-trunk/gcc/optabs.cc:1442
0x104c481 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/optabs.cc:1529
0x104f063 sign_expand_binop(machine_mode, optab_tag, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/optabs.cc:2317
0xd92811 expand_divmod(int, tree_code, machine_mode, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/expmed.cc:5268
0xd9f3e9 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /repo/gcc-trunk/gcc/expr.cc:9863
0xda6218 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:10800
0xda0317 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:8999
0xda0317 expand_normal(tree_node*)
        /repo/gcc-trunk/gcc/expr.h:316
0xda0317 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /repo/gcc-trunk/gcc/expr.cc:10453
0xda6218 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:10800
0xc4d218 expand_normal(tree_node*)
        /repo/gcc-trunk/gcc/expr.h:316
0xc4d218 precompute_register_parameters
        /repo/gcc-trunk/gcc/calls.cc:988
0xc53ca0 expand_call(tree_node*, rtx_def*, int)
        /repo/gcc-trunk/gcc/calls.cc:3416
0xda4c51 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:11867
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/mnt/main-repo/repo/gcc-trunk/binary-trunk-r14-268-20230426091040-ge02f68df385-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r14-268-20230426091040-ge02f68df385-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20230426 (experimental) (GCC)
Are you sure this is not also a regression in GCC 13.1.0?
The most obvious candidate for the revision that caused this is r13-6620-gf23dc726875c26f2c3.
(In reply to Andrew Pinski from comment #1)
> Are you sure this is not a regression also in GCC 13.1.0.
> The most obvious revision which caused this is r13-6620-gf23dc726875c26f2c3 .

I'd expect it's g:c69db3ef7f7d82a50f46038aa5457b7c8cc2d643 but haven't looked deeper yet.
Oh, I think simplify_gen_subreg should not be used here. gen_lowpart should be used instead, especially when it comes to big endian.
Confirmed. The operand that's blowing it up is:

(subreg:V2DI (reg/v:OI 97 [ w ]) 16)

at

rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], <MODE>mode, 0);

simplify_gen_subreg, lowpart_subreg, copy_to_mode_reg and force_reg all ICE :(
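For readers wondering where the byte offset 16 in that subreg comes from, here is a minimal sketch. It assumes V is two 64-bit lanes and W is four (inferred from the V2DI and OI modes above, not taken from the attached testcase), and that indices into the second operand of __builtin_shufflevector start at the lane count of the first operand. The helper name is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/* Assumed lane counts, inferred from the RTL modes in this report:
   V is 2 x 64-bit (V2DI), W is 4 x 64-bit (OI, 32 bytes).  */
enum { V_LANES = 2, W_LANES = 4 };

/* Hypothetical helper: return the byte offset inside the second
   shufflevector operand selected by INDEX, or -1 if INDEX selects
   from the first operand or is out of range.  Indices 0 .. V_LANES-1
   name the first operand; the second operand starts at V_LANES.  */
static long
second_operand_byte_offset (int index)
{
  if (index < V_LANES || index >= V_LANES + W_LANES)
    return -1;
  return (long) (index - V_LANES) * (long) sizeof (uint64_t);
}
```

Under those assumptions, indices 4 and 5 map to bytes 16 and 24 of w, i.e. the upper 128 bits of the OI register, which would match the non-lowpart (subreg:V2DI (reg:OI) 16) that the expander is handed.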
The multiplication case also ICEs:

void foom (V v, W w)
{
  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v));
}

as mulv2di3 was implemented with a similar trick for TARGET_SVE.
I'll take this, once I figure out how to wire up the Neon modes through SVE...
Ugh. I guess we've got no option but to force the original subreg into a fresh register, but that's going to pessimise cases where arithmetic is done on tuple types. Perhaps we should just expose the SVE operation as a native V2DI one. Handling predicated ops would be a bit more challenging though.
(In reply to rsandifo@gcc.gnu.org from comment #6)
> Ugh. I guess we've got no option but to force the original
> subreg into a fresh register, but that's going to pessimise
> cases where arithmetic is done on tuple types.
>
> Perhaps we should just expose the SVE operation as a native
> V2DI one. Handling predicated ops would be a bit more challenging
> though.

I did try a copy_to_mode_reg to a fresh V2DI register for non-REG_P arguments and that did progress, but (surprisingly?) still ICEd during fwprop:

during RTL pass: fwprop1
mulice.c: In function 'foom':
mulice.c:17:1: internal compiler error: in paradoxical_subreg_p, at rtl.h:3205
   17 | }
      | ^
0xe903b9 paradoxical_subreg_p(machine_mode, machine_mode)
        $SRC/gcc/rtl.h:3205
0xe903b9 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        $SRC/gcc/simplify-rtx.cc:7533
0xe1b5f7 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1176
0xe1b3d8 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1118
0xe1b7b7 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1254
0xe1babf insn_propagation::apply_to_pattern_1(rtx_def**)
        $SRC/gcc/recog.cc:1361
0xe1bae4 insn_propagation::apply_to_pattern(rtx_def**)
        $SRC/gcc/recog.cc:1383
0x1c22e5b try_fwprop_subst_pattern
        $SRC/gcc/fwprop.cc:454
0x1c22e5b try_fwprop_subst
        $SRC/gcc/fwprop.cc:627
0x1c239a9 forward_propagate_and_simplify
        $SRC/gcc/fwprop.cc:823
0x1c239a9 forward_propagate_into
        $SRC/gcc/fwprop.cc:886
0x1c23bc1 fwprop_insn
        $SRC/gcc/fwprop.cc:943
0x1c23d98 fwprop
        $SRC/gcc/fwprop.cc:995
0x1c240e1 execute
        $SRC/gcc/fwprop.cc:1033
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
fwprop ended up creating:

(mult:VNx2DI (subreg:VNx2DI (reg/v:V2DI 95 [ v ]) 0)
    (subreg:VNx2DI (subreg:V2DI (reg/v:OI 97 [ w ]) 16) 0))

and something blew up anyway, so it seems the RTL passes *really* don't like these kinds of subregs ;)

I'll look into expressing these ops as native V2DI patterns. I guess for the unpredicated SVE2 mul that's easy, but for the predicated forms perhaps we can have them consume a predicate register, generated at expand time, similar to the aarch64-sve.md expanders.
Not super-pretty, but maybe it'll be enough.
*** Bug 113229 has been marked as a duplicate of this bug. ***
The following testcases now fail due to this ICE when compiled with -march=armv9-a+sve2 (something which I have been testing recently too):
gcc.dg/torture/pr70083.c
gcc.dg/pr69896.c
gcc.target/aarch64/pr70120-1.c
(In reply to Andrew Pinski from comment #9)
> The following testcases now fail due to this ICE when compiled with
> -march=armv9-a+sve2 (something which I have been testing recently too):
> gcc.dg/torture/pr70083.c
> gcc.dg/pr69896.c
> gcc.target/aarch64/pr70120-1.c

Note gcc.dg/pr69896.c has a different path (not via gen_*divv*) to the ICE though:
```
/home/apinski/src/upstream-full-cross/gcc/gcc/testsuite/gcc.dg/pr69896.c:22:1: internal compiler error: in paradoxical_subreg_p, at rtl.h:3213
0x80c65b paradoxical_subreg_p(machine_mode, machine_mode)
        ../../gcc/rtl.h:3213
0x80cfc8 paradoxical_subreg_p(machine_mode, machine_mode)
        ../../gcc/poly-int.h:2179
0x80cfc8 simplify_const_vector_subreg
        ../../gcc/simplify-rtx.cc:7423
0x80cfc8 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        ../../gcc/simplify-rtx.cc:7595
0xfae1c9 insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1176
0xfadcab insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1117
0xfade93 insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1254
0xfae63f insn_propagation::apply_to_pattern(rtx_def**)
        ../../gcc/recog.cc:1396
0x1cfdb66 try_fwprop_subst_pattern
        ../../gcc/fwprop.cc:440
0x1cfdb66 try_fwprop_subst
        ../../gcc/fwprop.cc:613
0x1cfe500 forward_propagate_and_simplify
        ../../gcc/fwprop.cc:809
0x1cfe500 forward_propagate_into
        ../../gcc/fwprop.cc:872
0x1cfe89d forward_propagate_into
        ../../gcc/fwprop.cc:821
0x1cfe89d fwprop_insn
        ../../gcc/fwprop.cc:929
0x1cfe9c1 fwprop
        ../../gcc/fwprop.cc:981
```
Have a patch for the division case and will finish the multiplication and submit when I'm back. Sorry for the delay.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:dfa17fd3b1a50cab51803e8a63c5c7b7db173523

commit r14-8394-gdfa17fd3b1a50cab51803e8a63c5c7b7db173523
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Wed Jan 24 15:58:34 2024 +0000

    AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]

    As suggested in the ticket this replaces the expansion by converting the
    Advanced SIMD types to SVE types by simply printing out an SVE register for
    these instructions.

    This fixes the subreg issues since there are no subregs involved anymore.

    gcc/ChangeLog:

            PR target/109636
            * config/aarch64/aarch64-simd.md (<su_optab>div<mode>3,
            mulv2di3): Remove.
            * config/aarch64/iterators.md (VQDIV): Remove.
            (SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI,
            SVE_I_SIMD_DI): New.
            (VPRED, sve_lane_con): Add V4SI and V2DI.
            * config/aarch64/aarch64-sve.md (<optab><mode>3,
            @aarch64_pred_<optab><mode>): Support Advanced SIMD types.
            (mul<mode>3): New, split from <optab><mode>3.
            (@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New.
            * config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>,
            *aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to
            SVE_FULL_HSDI_SIMD_DI.

    gcc/testsuite/ChangeLog:

            PR target/109636
            * gcc.target/aarch64/sve/pr109636_1.c: New test.
            * gcc.target/aarch64/sve/pr109636_2.c: New test.
            * gcc.target/aarch64/sve2/pr109636_1.c: New test.
Fixed, thanks for the report.