Bug 94488 - [AArch64] ICE on right shift of V2DImode by DImode shift
Summary: [AArch64] ICE on right shift of V2DImode by DImode shift
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 10.0
Importance: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2020-04-04 17:53 UTC by Evan Nemerson
Modified: 2020-09-17 17:28 UTC
CC List: 2 users

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-04-04 00:00:00


Attachments
Test case (260 bytes, text/plain)
2020-04-04 17:53 UTC, Evan Nemerson
gcc10-pr94488.patch (986 bytes, patch)
2020-04-06 06:40 UTC, Jakub Jelinek

Description Evan Nemerson 2020-04-04 17:53:29 UTC
Created attachment 48199
Test case

On AArch64 with optimizations enabled (-O1 is enough), attempting to right-shift an unsigned 64-bit value in an OpenMP SIMD loop generates an internal compiler error.

This happens on at least GCC 9 and 10, and I've tried it cross-compiling to AArch64 and natively (on a Raspberry Pi running Fedora 31 with gcc 9.3.1).

I'm attaching a test case.  Here is the full output from attempting to compile it with `aarch64-linux-gnu-gcc-10 -v -fopenmp-simd -O2 -c -o test.o srl.c`:

Using built-in specs.
COLLECT_GCC=aarch64-linux-gnu-gcc-10
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10-20200324-1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --without-target-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --includedir=/usr/aarch64-linux-gnu/include
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (Debian 10-20200324-1) 
COLLECT_GCC_OPTIONS='-v' '-fopenmp-simd' '-O2' '-c' '-o' 'test.o' '-mlittle-endian' '-mabi=lp64'
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/cc1 -quiet -v -imultiarch aarch64-linux-gnu srl.c -quiet -dumpbase srl.c -mlittle-endian -mabi=lp64 -auxbase-strip test.o -O2 -version -fopenmp-simd -fasynchronous-unwind-tables -o /tmp/ccGROOBh.s
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (aarch64-linux-gnu)
	compiled by GNU C version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc-cross/aarch64-linux-gnu/10/include-fixed"
ignoring nonexistent directory "/usr/include/aarch64-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/include
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include
 /usr/include
End of search list.
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (aarch64-linux-gnu)
	compiled by GNU C version 10.0.1 20200324 (experimental) [master revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: b59507ef9cd435e859f115f5f55f1a57
during RTL pass: expand
srl.c: In function ‘l’:
srl.c:14:15: internal compiler error: in expand_shift_1, at expmed.c:2654
   14 |       aj.e[i] = ak.e[i] >> k;
      |       ~~~~~~~~^~~~~~~~~~~~~~
0x613d01 expand_shift_1
	../../src/gcc/expmed.c:2654
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*, rtx_def*, int)
	../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:10049
0x864dc1 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:8353
0x864dc1 expand_normal
	../../src/gcc/expr.h:288
0x864dc1 store_field
	../../src/gcc/expr.c:7097
0x86178e expand_assignment(tree_node*, tree_node*, bool)
	../../src/gcc/expr.c:5369
0x75c908 expand_gimple_stmt_1
	../../src/gcc/cfgexpand.c:3749
0x75c908 expand_gimple_stmt
	../../src/gcc/cfgexpand.c:3847
0x7627ea expand_gimple_basic_block
	../../src/gcc/cfgexpand.c:5887
0x7627ea execute
	../../src/gcc/cfgexpand.c:6542
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
Comment 1 Andrew Pinski 2020-04-04 18:18:52 UTC
Self contained source:
#define a(b) __attribute__((__vector_size__(b)))
#define c(aa, ab, d) memcpy(aa, ab, d)
typedef __SIZE_TYPE__ size_t;
#define memcpy __builtin_memcpy
typedef unsigned long long uint64_t;
typedef struct {
  uint64_t e a(16);
} f;
f ae, af;
int g;
int l() {
  f aj, ak = af, al = ae;
  int k = al.e[0];
  _Pragma("omp simd") for (size_t i = 0; i < sizeof(aj) / sizeof(aj.e[0]); i++)
      aj.e[i] = ak.e[i] >> k;
  f j = aj;
  c(&g, &j, g);
  return g;
}
Comment 2 Jakub Jelinek 2020-04-04 19:27:50 UTC
Reduced testcase, -O1 and higher:

typedef unsigned long V __attribute__((__vector_size__(16)));

V
foo (V x, unsigned long y)
{
  return x >> y;
}
Comment 3 Evan Nemerson 2020-04-06 02:11:24 UTC
Thanks for looking into this.

Left shift instead of right also seems to be a problem.  The backtrace is a bit different, but I figure it's probably the same issue; if not I can open up a new report.

I actually have something similar in my code with a note that it failed on GCC ≤ 7 (<https://github.com/nemequ/simde/blob/9efa34cddce5a5281f6909d48b11d5639ec0b519/simde/x86/sse2.h#L4409>).  My guess is that GCC 7 fails regardless of optimization level while GCC 8+ fails only with optimization enabled, but I don't have convenient access to GCC 7 on AArch64, so I'm not certain.

Here is the output from left shift:

during RTL pass: expand
foo.c: In function ‘foo’:
foo.c:4:12: internal compiler error: in copy_to_mode_reg, at explow.c:632
    4 |   return x << y;
      |          ~~^~~~
0x613b07 copy_to_mode_reg(machine_mode, rtx_def*)
	../../src/gcc/explow.c:632
0xe19ea3 aarch64_expand_vector_init(rtx_def*, rtx_def*)
	../../src/gcc/config/aarch64/aarch64.c:17670
0x10ed6fc ???
	../../src/gcc/config/aarch64/aarch64-simd.md:6140
0xa62722 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
	../../src/gcc/recog.h:317
0xa62722 expand_vector_broadcast(machine_mode, rtx_def*)
	../../src/gcc/optabs.c:438
0xa641b0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
	../../src/gcc/optabs.c:1300
0x83d69f expand_shift_1
	../../src/gcc/expmed.c:2624
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*, rtx_def*, int)
	../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../src/gcc/expr.c:10049
0x75cd2a expand_expr
	../../src/gcc/expr.h:282
0x75cd2a expand_return
	../../src/gcc/cfgexpand.c:3611
0x75cd2a expand_gimple_stmt_1
	../../src/gcc/cfgexpand.c:3720
0x75cd2a expand_gimple_stmt
	../../src/gcc/cfgexpand.c:3847
0x7627ea expand_gimple_basic_block
	../../src/gcc/cfgexpand.c:5887
0x7627ea execute
	../../src/gcc/cfgexpand.c:6542
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
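
For reference, the left-shift failure above can be put next to the right-shift case from comment 2 in one small file. This is only a sketch: the function names `shl` and `shr` are made up here for illustration, and the trace above only shows the `return x << y;` line. On a compiler with the bug, either function would presumably be enough to trigger the ICE at -O1 or higher on aarch64; on a fixed compiler both should compile and shift each lane by the scalar amount.

```c
/* 16-byte vector of unsigned long, as in the reduced test case. */
typedef unsigned long V __attribute__((__vector_size__(16)));

/* Left shift of every lane by a scalar amount (hypothetical name). */
V shl(V x, unsigned long y)
{
  return x << y;
}

/* Right shift of every lane by a scalar amount (hypothetical name). */
V shr(V x, unsigned long y)
{
  return x >> y;
}
```

Compiling this with something like `gcc -O1 -c` for an aarch64 target is what exercises the vector shift expanders.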
Comment 4 Jakub Jelinek 2020-04-06 06:40:11 UTC
Created attachment 48207
gcc10-pr94488.patch

This bug seems to go all the way back to the introduction of the aarch64 port.
The patterns use the general_operand predicate for the shift amount, but actually only handle it when it is a CONST_INT, REG, or MEM, while
in this case it is a SUBREG of a REG.  There is no reason why they can't handle any general_operand.
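
The difference between the old and the patched behaviour can be illustrated with a toy model in plain C. This is not GCC source: the enums, the `in_bound` flag, and the function names `old_expander` and `new_expander` are all invented for this sketch, and the real expanders live in aarch64-simd.md and operate on RTL. The model only mirrors the decision described above: what kind of shift-amount operand comes in, and whether the expander emits an immediate shift, emits a register shift, or gives up.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of rtx a general_operand can be. */
enum operand_kind { OP_CONST_INT, OP_REG, OP_MEM, OP_SUBREG };

/* Hypothetical outcomes of expanding a vector shift. */
enum expand_result { EXPAND_FAIL, EXPAND_IMM_SHIFT, EXPAND_REG_SHIFT };

/* Old behaviour: only CONST_INT, MEM and REG are handled. */
static enum expand_result old_expander(enum operand_kind kind, bool in_bound)
{
  if (kind == OP_CONST_INT) {
    if (in_bound)
      return EXPAND_IMM_SHIFT;   /* in-bound constant: immediate shift */
    kind = OP_REG;               /* out-of-bound constant: forced into a REG */
  } else if (kind == OP_MEM) {
    kind = OP_REG;               /* MEM: forced into a REG */
  }
  if (kind == OP_REG)
    return EXPAND_REG_SHIFT;
  return EXPAND_FAIL;            /* e.g. SUBREG: the expander gives up */
}

/* Patched behaviour: anything that is not an in-bound CONST_INT is
   forced into a REG, so no operand kind leads to a failure. */
static enum expand_result new_expander(enum operand_kind kind, bool in_bound)
{
  if (kind == OP_CONST_INT && in_bound)
    return EXPAND_IMM_SHIFT;
  return EXPAND_REG_SHIFT;       /* forcing a REG into a REG changes nothing */
}
```

In this model `old_expander(OP_SUBREG, false)` yields `EXPAND_FAIL`, which corresponds to the case the middle end could not cope with, while `new_expander` yields `EXPAND_REG_SHIFT` for the same input.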
Comment 5 GCC Commits 2020-04-07 08:02:41 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7a6588fe65432c0f1a8b5fdefba81700ebf88711

commit r10-7584-g7a6588fe65432c0f1a8b5fdefba81700ebf88711
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Apr 7 10:01:16 2020 +0200

    aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]
    
    The following testcase ICEs on aarch64 apparently since the introduction of
    the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
    completely unnecessarily FAIL; if operands[2] is something other than
    a CONST_INT or REG or MEM and the middle-end code can't cope with the
    pattern giving up in these cases.  All the expanders use general_operand
    predicate for the shift amount operand, but then have just a special case
    for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
    REG), or MEM (force into REG), or REG (that is the case it handles).
    In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
    general_operand.
    I don't see any reason what is magic about MEMs that it should be forced
    into REG and others like SUBREGs that it shouldn't, there isn't even a
    reason to check for !REG_P because force_reg will do nothing if the operand
    is already a REG, and otherwise can handle general_operand just fine.
    
    2020-04-07  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/94488
            * config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
            ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
            Assume it is a REG after that instead of testing it and doing FAIL
            otherwise.  Formatting fix.
    
            * gcc.c-torture/compile/pr94488.c: New test.
Comment 6 GCC Commits 2020-04-07 17:04:57 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2daa92ac4b51387e55e88ee48bdc2fab7ba25981

commit r10-7602-g2daa92ac4b51387e55e88ee48bdc2fab7ba25981
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Apr 7 19:04:31 2020 +0200

    aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]
    
    The following testcase ICEs on aarch64 apparently since the introduction of
    the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
    completely unnecessarily FAIL; if operands[2] is something other than
    a CONST_INT or REG or MEM and the middle-end code can't cope with the
    pattern giving up in these cases.  All the expanders use general_operand
    predicate for the shift amount operand, but then have just a special case
    for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
    REG), or MEM (force into REG), or REG (that is the case it handles).
    In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
    general_operand.
    I don't see any reason what is magic about MEMs that it should be forced
    into REG and others like SUBREGs that it shouldn't, there isn't even a
    reason to check for !REG_P because force_reg will do nothing if the operand
    is already a REG, and otherwise can handle general_operand just fine.
    
    2020-04-07  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/94488
            * config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
            ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
            Assume it is a REG after that instead of testing it and doing FAIL
            otherwise.  Formatting fix.
    
            * gcc.c-torture/compile/pr94488.c: New test.
Comment 7 GCC Commits 2020-04-07 19:04:59 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7f3ac38b3c765d49a46f65f1e5e9a812fb1da49c

commit r9-8480-g7f3ac38b3c765d49a46f65f1e5e9a812fb1da49c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Apr 7 10:01:16 2020 +0200

    aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]
    
    The following testcase ICEs on aarch64 apparently since the introduction of
    the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
    completely unnecessarily FAIL; if operands[2] is something other than
    a CONST_INT or REG or MEM and the middle-end code can't cope with the
    pattern giving up in these cases.  All the expanders use general_operand
    predicate for the shift amount operand, but then have just a special case
    for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
    REG), or MEM (force into REG), or REG (that is the case it handles).
    In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
    general_operand.
    I don't see any reason what is magic about MEMs that it should be forced
    into REG and others like SUBREGs that it shouldn't, there isn't even a
    reason to check for !REG_P because force_reg will do nothing if the operand
    is already a REG, and otherwise can handle general_operand just fine.
    
    2020-04-07  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/94488
            * config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
            ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
            Assume it is a REG after that instead of testing it and doing FAIL
            otherwise.  Formatting fix.
    
            * gcc.c-torture/compile/pr94488.c: New test.
Comment 8 Jakub Jelinek 2020-09-17 17:28:52 UTC
Fixed for 8.5 too in r8-10482-g84d649d3c71e80269ebd9764652131c51ff4a895.