Created attachment 48199
Test case

On AArch64 with optimizations enabled (-O1 is enough), right-shifting an unsigned 64-bit value in an OpenMP SIMD loop triggers an internal compiler error. This happens with at least GCC 9 and 10, and I've tried it both cross-compiling to AArch64 and natively (on a Raspberry Pi running Fedora 31 with gcc 9.3.1). I'm attaching a test case.

Here is the full output from compiling it with `aarch64-linux-gnu-gcc-10 -v -fopenmp-simd -O2 -c -o test.o srl.c`:

Using built-in specs.
COLLECT_GCC=aarch64-linux-gnu-gcc-10
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10-20200324-1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --without-target-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --includedir=/usr/aarch64-linux-gnu/include
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (Debian 10-20200324-1)
COLLECT_GCC_OPTIONS='-v' '-fopenmp-simd' '-O2' '-c' '-o' 'test.o' '-mlittle-endian' '-mabi=lp64'
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/cc1 -quiet -v -imultiarch aarch64-linux-gnu srl.c -quiet -dumpbase srl.c -mlittle-endian -mabi=lp64 -auxbase-strip test.o -O2 -version -fopenmp-simd -fasynchronous-unwind-tables -o /tmp/ccGROOBh.s
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (aarch64-linux-gnu)
	compiled by GNU C version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc-cross/aarch64-linux-gnu/10/include-fixed"
ignoring nonexistent directory "/usr/include/aarch64-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/include
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include
 /usr/include
End of search list.
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (aarch64-linux-gnu)
	compiled by GNU C version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: b59507ef9cd435e859f115f5f55f1a57
during RTL pass: expand
srl.c: In function ‘l’:
srl.c:14:15: internal compiler error: in expand_shift_1, at expmed.c:2654
   14 |       aj.e[i] = ak.e[i] >> k;
      |       ~~~~~~~~^~~~~~~~~~~~~~
0x613d01 expand_shift_1
	../../src/gcc/expmed.c:2654
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*, rtx_def*, int)
	../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:10049
0x864dc1 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:8353
0x864dc1 expand_normal
	../../src/gcc/expr.h:288
0x864dc1 store_field
	../../src/gcc/expr.c:7097
0x86178e expand_assignment(tree_node*, tree_node*, bool)
	../../src/gcc/expr.c:5369
0x75c908 expand_gimple_stmt_1
	../../src/gcc/cfgexpand.c:3749
0x75c908 expand_gimple_stmt
	../../src/gcc/cfgexpand.c:3847
0x7627ea expand_gimple_basic_block
	../../src/gcc/cfgexpand.c:5887
0x7627ea execute
	../../src/gcc/cfgexpand.c:6542
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
Self-contained source:

#define a(b) __attribute__((__vector_size__(b)))
#define c(aa, ab, d) memcpy(aa, ab, d)
typedef __SIZE_TYPE__ size_t;
#define memcpy __builtin_memcpy
typedef unsigned long long uint64_t;
typedef struct {
  uint64_t e a(16);
} f;
f ae, af;
int g;
int l() {
  f aj, ak = af, al = ae;
  int k = al.e[0];
  _Pragma("omp simd")
  for (size_t i = 0; i < sizeof(aj) / sizeof(aj.e[0]); i++)
    aj.e[i] = ak.e[i] >> k;
  f j = aj;
  c(&g, &j, g);
  return g;
}
Reduced testcase, -O1 and higher:

typedef unsigned long V __attribute__((__vector_size__(16)));
V foo (V x, unsigned long y)
{
  return x >> y;
}
Thanks for looking into this. Left shift instead of right shift also seems to be a problem. The backtrace is a bit different, but I figure it's probably the same issue; if not, I can open a new report.

I actually have something similar in my code, with a note that it failed on GCC ≤ 7 (<https://github.com/nemequ/simde/blob/9efa34cddce5a5281f6909d48b11d5639ec0b519/simde/x86/sse2.h#L4409>). My guess is that GCC 7 fails regardless of optimization level, while GCC 8+ only fails with optimization enabled, but I don't have convenient access to GCC 7 on AArch64, so I'm not certain.

Here is the output for the left shift:

during RTL pass: expand
foo.c: In function ‘foo’:
foo.c:4:12: internal compiler error: in copy_to_mode_reg, at explow.c:632
    4 |   return x << y;
      |          ~~^~~~
0x613b07 copy_to_mode_reg(machine_mode, rtx_def*)
	../../src/gcc/explow.c:632
0xe19ea3 aarch64_expand_vector_init(rtx_def*, rtx_def*)
	../../src/gcc/config/aarch64/aarch64.c:17670
0x10ed6fc ???
	../../src/gcc/config/aarch64/aarch64-simd.md:6140
0xa62722 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
	../../src/gcc/recog.h:317
0xa62722 expand_vector_broadcast(machine_mode, rtx_def*)
	../../src/gcc/optabs.c:438
0xa641b0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
	../../src/gcc/optabs.c:1300
0x83d69f expand_shift_1
	../../src/gcc/expmed.c:2624
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*, rtx_def*, int)
	../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:10049
0x75cd2a expand_expr
	../../src/gcc/expr.h:282
0x75cd2a expand_return
	../../src/gcc/cfgexpand.c:3611
0x75cd2a expand_gimple_stmt_1
	../../src/gcc/cfgexpand.c:3720
0x75cd2a expand_gimple_stmt
	../../src/gcc/cfgexpand.c:3847
0x7627ea expand_gimple_basic_block
	../../src/gcc/cfgexpand.c:5887
0x7627ea execute
	../../src/gcc/cfgexpand.c:6542
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
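The reduced right-shift testcase and the left-shift variant can be turned into a small regression check. This is a sketch, not part of the report: it assumes a GCC-compatible compiler with the `vector_size` extension, uses `uint64_t` lanes (the same width as `unsigned long` on LP64 targets such as aarch64-linux-gnu), and the names `lshr_v`, `shl_v` and `check_shifts` are hypothetical. Built at -O1 or higher by an affected aarch64 compiler, the two shift functions would presumably hit the ICEs reported here; with a fixed compiler each lane should simply match the corresponding scalar shift.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes, like the reduced testcase's V on LP64 targets. */
typedef uint64_t V __attribute__((__vector_size__(16)));

/* noinline keeps the shift amount a runtime value, so the compiler must
   expand a vector-by-scalar shift instead of constant-folding it. */
__attribute__((noinline)) V lshr_v(V x, uint64_t y) { return x >> y; }
__attribute__((noinline)) V shl_v(V x, uint64_t y) { return x << y; }

/* Compare every lane with the equivalent scalar shift.
   The shift amount y must be below 64. */
static void check_shifts(uint64_t a, uint64_t b, unsigned y)
{
    V x = {a, b};
    V r = lshr_v(x, y);
    V l = shl_v(x, y);
    assert(r[0] == (a >> y) && r[1] == (b >> y));
    assert(l[0] == (a << y) && l[1] == (b << y));
}
```

For example, `check_shifts(16, 256, 4)` expects lanes {1, 16} from the right shift and {256, 4096} from the left shift.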
Created attachment 48207
gcc10-pr94488.patch

It seems this bug goes all the way back to the introduction of the aarch64 port. The patterns use the general_operand predicate for the shift amount, but actually only handle it when it is a CONST_INT, a REG or a MEM and nothing else, while in this case it is a SUBREG of a REG. There is no reason why they can't handle any general_operand.
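The expander decision logic described in this comment, before and after the patch, can be illustrated with a toy C model. It is only a sketch under stated assumptions: the enum values and the `expand_before`/`expand_after` functions are hypothetical stand-ins for RTL operand kinds and the `{ashl,ashr,lshr}<mode>3` expanders, not GCC source, and `force_reg` is modeled simply as turning the operand into a REG.

```c
#include <assert.h>

/* Hypothetical stand-ins for operand kinds accepted by general_operand
   for the shift amount (toy model, not GCC's RTL). */
enum operand_kind { OP_CONST_INT, OP_REG, OP_MEM, OP_SUBREG };

/* What the modeled expander ends up doing. */
enum expand_result { EMIT_IMM_SHIFT, EMIT_REG_SHIFT, EXPAND_FAIL };

/* Pre-fix behaviour: CONST_INT and MEM are special-cased, REG is handled,
   and anything else (such as a lowpart SUBREG of a REG) FAILs. */
static enum expand_result expand_before(enum operand_kind k, long amount,
                                        int bit_width)
{
    if (k == OP_CONST_INT) {
        if (amount >= 0 && amount < bit_width)
            return EMIT_IMM_SHIFT;  /* in-bound immediate shift */
        k = OP_REG;                 /* out of range: force_reg */
    } else if (k == OP_MEM) {
        k = OP_REG;                 /* force_reg */
    }
    if (k == OP_REG)
        return EMIT_REG_SHIFT;      /* shift by a register amount */
    return EXPAND_FAIL;
}

/* Post-fix behaviour: anything that is not an in-bound CONST_INT is forced
   into a register, which force_reg can do for any general_operand. */
static enum expand_result expand_after(enum operand_kind k, long amount,
                                       int bit_width)
{
    if (k == OP_CONST_INT && amount >= 0 && amount < bit_width)
        return EMIT_IMM_SHIFT;
    return EMIT_REG_SHIFT;
}
```

In this model the SUBREG case is the only one that reaches `EXPAND_FAIL` before the patch; afterwards every amount that is not an in-bound immediate takes the register path.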
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7a6588fe65432c0f1a8b5fdefba81700ebf88711

commit r10-7584-g7a6588fe65432c0f1a8b5fdefba81700ebf88711
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Apr 7 10:01:16 2020 +0200

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of the aarch64 port. The reason is that the {ashl,ashr,lshr}<mode>3 expanders completely unnecessarily FAIL if operands[2] is something other than a CONST_INT or REG or MEM and the middle-end code can't cope with the pattern giving up in these cases. All the expanders use general_operand predicate for the shift amount operand, but then have just a special case for CONST_INT (if in-bound, emit an immediate shift, otherwise force into REG), or MEM (force into REG), or REG (that is the case it handles). In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid general_operand. I don't see any reason what is magic about MEMs that it should be forced into REG and others like SUBREGs that it shouldn't, there isn't even a reason to check for !REG_P because force_reg will do nothing if the operand is already a REG, and otherwise can handle general_operand just fine.

2020-04-07 Jakub Jelinek <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3, ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT. Assume it is a REG after that instead of testing it and doing FAIL otherwise. Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2daa92ac4b51387e55e88ee48bdc2fab7ba25981

commit r10-7602-g2daa92ac4b51387e55e88ee48bdc2fab7ba25981
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Apr 7 19:04:31 2020 +0200

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of the aarch64 port. The reason is that the {ashl,ashr,lshr}<mode>3 expanders completely unnecessarily FAIL if operands[2] is something other than a CONST_INT or REG or MEM and the middle-end code can't cope with the pattern giving up in these cases. All the expanders use general_operand predicate for the shift amount operand, but then have just a special case for CONST_INT (if in-bound, emit an immediate shift, otherwise force into REG), or MEM (force into REG), or REG (that is the case it handles). In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid general_operand. I don't see any reason what is magic about MEMs that it should be forced into REG and others like SUBREGs that it shouldn't, there isn't even a reason to check for !REG_P because force_reg will do nothing if the operand is already a REG, and otherwise can handle general_operand just fine.

2020-04-07 Jakub Jelinek <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3, ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT. Assume it is a REG after that instead of testing it and doing FAIL otherwise. Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7f3ac38b3c765d49a46f65f1e5e9a812fb1da49c

commit r9-8480-g7f3ac38b3c765d49a46f65f1e5e9a812fb1da49c
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Apr 7 10:01:16 2020 +0200

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of the aarch64 port. The reason is that the {ashl,ashr,lshr}<mode>3 expanders completely unnecessarily FAIL if operands[2] is something other than a CONST_INT or REG or MEM and the middle-end code can't cope with the pattern giving up in these cases. All the expanders use general_operand predicate for the shift amount operand, but then have just a special case for CONST_INT (if in-bound, emit an immediate shift, otherwise force into REG), or MEM (force into REG), or REG (that is the case it handles). In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid general_operand. I don't see any reason what is magic about MEMs that it should be forced into REG and others like SUBREGs that it shouldn't, there isn't even a reason to check for !REG_P because force_reg will do nothing if the operand is already a REG, and otherwise can handle general_operand just fine.

2020-04-07 Jakub Jelinek <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3, ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT. Assume it is a REG after that instead of testing it and doing FAIL otherwise. Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.
Fixed for 8.5 too in r8-10482-g84d649d3c71e80269ebd9764652131c51ff4a895 .