Created attachment 35920 [details] Source file to repro error Using GCC from a recent trunk checkout (rev 225476), I am seeing the following error on a simple input file. I'm using an ARM AArch64 cross-compiler running on a x86_64 host, though I'm unsure if that makes a difference for this case. Both -O3 and -funroll-loops are required to hit the error. % aarch64-linux-gnu-gcc -c -o place_test.o -O3 -mcpu=cortex-a57 -march=armv8-a -funroll-loops place_test.c place_test.c: In function 'place_and_route': place_test.c:23:1: internal compiler error: in record_operand_use, at regrename.c:215 } ^ 0xabc68d record_operand_use /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:215 0xabcf74 scan_rtx_reg /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:1130 0xabe9ec build_def_use /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:1710 0xabe9ec regrename_analyze(bitmap_head*) /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:726 0xe5dd12 func_fma_steering::execute_fma_steering() /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/config/aarch64/cortex-a57-fma-steering.c:1034 0xe5e85d pass_fma_steering::execute(function*) /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/config/aarch64/cortex-a57-fma-steering.c:1071 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. The function is simple (full file attached) after being pruned as much as possible. void place_and_route (int num_regions) { int i, j; float capacity; for (i=0;i<num_regions;i++) { for (j=0;j<num_regions;j++) { place_region_y[i][j].capacity = capacity; place_region_y[i][j].inv_capacity = 1. / capacity; place_region_y[i][j].occupancy = 0.; place_region_y[i][j].cost = 0.; } } } Setting 'capacity' to a known non-zero value also makes the ICE go away. Explicitly setting it to zero results in the same error and eliminates the obvious -Wunused-value warning. Output of gcc -v (paths trimmed) Using built-in specs. COLLECT_GCC=/gcc-aarch64-225476/bin/aarch64-linux-gnu-gcc COLLECT_LTO_WRAPPER=/gcc-aarch64-225476/libexec/gcc/aarch64-linux-gnu/6.0.0/lto-wrapper Target: aarch64-linux-gnu Configured with: /gcc-trunk/configure CC=/gcc-x86/4.8.1/bin/gcc CXX=/gcc-x86/4.8.1/r0/bin/g++ LDFLAGS=-Wl,-rpath,/gcc-aarch64-225476/lib,-rpath,/gcc-x86/4.8.1/x86_64/lib64 'CFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3' 'CXXFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3' --prefix=/gcc-aarch64-225476 --target=aarch64-linux-gnu --with-sysroot=/gcc-aarch64-225476/sysroot --with-gmp=/gcc-aarch64-225476 --with-mpc=/gcc-aarch64-225476 --with-mpfr=/gcc-aarch64-225476 --with-isl=/gcc-aarch64-225476 --enable-__cxa_atexit --with-gnu-as --with-gnu-ld --enable-shared --disable-libssp --disable-libmudflap --enable-languages=c,c++,fortran --disable-libsanitizer --disable-nls Thread model: posix gcc version 6.0.0 20150706 (experimental) (GCC)
Created attachment 36341 [details] Reduced testcase without Undefined Behaviour Confirmed with the testcase updated to avoid Undefined Behaviour. It isn't immediately clear what is going wrong here, the ICE occurs before any of the cortex-a57-fma-steering analysis kicks in, which implies an issue in the generic register rename analysis code. The assert trips on this line: gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS); When handling this rtx: (insn 403 78 417 4 (set (mem:XI (post_inc:DI (reg:DI 17 x17 [orig:205 ivtmp.18 ] [205])) [4 MEM[(float *)vectp_pretmp.12_54]+0 S64 A32]) (unspec:XI [ (reg:XI 32 v0 [orig:165 vect_array.14 ] [165]) (unspec:V4SF [ (const_int 0 [0]) ] UNSPEC_VSTRUCTDUMMY) ] UNSPEC_ST4)) error.c:20 2372 {aarch64_simd_st4v4sf} (expr_list:REG_INC (reg:DI 17 x17 [orig:205 ivtmp.18 ] [205]) (nil))) In this case n_chains is either 2 or 3 and MAX_REGS_PER_ADDRESS for aarch64 is set to 2, so we trip the assert. I don't know regrename at all, so I'm not sure how we get in to this situation.
The operand_rr_info struct has arrays sized by MAX_REGS_PER_ADDRESS. regrename is calling record_operand_use for a write to an operand. It is also calling record_operand_use when a use (read or write) overlaps an earlier write. Hence a mem operand using an address with N regs can overlap N previous writes explaining the use of MAX_REGS_PER_ADDRESS. However, we can also have an operand that is N-regs wide that can overlap N previous 1-reg wide writes. Thus, we need to be able to handle the number of regs in the widest possible mode also. If we can have multiword regs used inside an address, things get even more complicated, but that isn't an issue at the moment. So instead of using just MAX_REGS_PER_ADDRESS, we need here something like #define OPERAND_RR_INFO_ARRAY_SIZE \ MAX (MAX_REGS_PER_ADDRESS, \ (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) / MIN_UNITS_PER_WORD) Unfortunately, this increases the array size from 2 to 16 for aarch64, as the widest mode is CXImode which takes 16 registers. This change may be wasting too much memory to be useful. This only affects the ports that have their own target dependent regrename pass though, which is only c6x and aarch64 at the moment. I suspect that we only need an array size of 4 for arm, as the widest instructions operate on 4 registers at a time, but I don't see an obvious way to get that info without adding another macro. I don't see any obvious uses of the chains member in operand_rr_info. The c6x and aarch64 ports are only using the heads member. So we could partially offset the memory size increase by dropping the chains member. Or maybe we can modify mode creation to avoid getting the CXI mode, as we don't need it. It is created automatically because we define XI as an 8-wide integer mode. Without CXImode the widest modes are only 8 registers wide. Combined with the removal of the chains member we are only doubling the size of this structure.
Created attachment 36705 [details] untested patch that may be wasting memory
Author: bernds Date: Tue Nov 17 21:20:20 2015 New Revision: 230499 URL: https://gcc.gnu.org/viewcvs?rev=230499&root=gcc&view=rev Log: PR target/66785 * regrename.c (record_operand_use): Keep track of failed operands and stop appending if we see any. * regrename.h (struct operand_rr_info): Add a failed field and shrink n_chains to short. Modified: trunk/gcc/ChangeLog trunk/gcc/regrename.c trunk/gcc/regrename.h
Ah, sorry Jim, I already had a patch. Ramana pointed out this bug but I didn't expect anyone else to actually investigate it. The overflow case can only occur in a failure case if the operand can't be renamed, so I've chosen to just record that fact. Someone might want to take a look at the aarch64 backend to see if it needs to take the new "failed" bit into account.
I looked at it and decided it's probably fine.