Bug 66785 - internal compiler error in record_operand_use
Summary: internal compiler error in record_operand_use
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2015-07-07 06:46 UTC by Chris Jones
Modified: 2015-12-16 15:20 UTC (History)
2 users (show)

See Also:
Host:
Target: aarch64*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-09-16 00:00:00


Attachments
Source file to repro error (206 bytes, text/plain)
2015-07-07 06:46 UTC, Chris Jones
Details
Reduced testcase without Undefined Behaviour (205 bytes, text/plain)
2015-09-16 10:35 UTC, James Greenhalgh
Details
untested patch that may be wasting memory (997 bytes, patch)
2015-11-14 02:30 UTC, Jim Wilson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Jones 2015-07-07 06:46:12 UTC
Created attachment 35920 [details]
Source file to repro error

Using GCC from a recent trunk checkout (rev 225476), I am seeing the following error on a simple input file.  I'm using an ARM AArch64 cross-compiler running on a x86_64 host, though I'm unsure if that makes a difference for this case.  Both -O3 and -funroll-loops are required to hit the error.

% aarch64-linux-gnu-gcc -c -o place_test.o -O3 -mcpu=cortex-a57 -march=armv8-a -funroll-loops place_test.c
place_test.c: In function 'place_and_route':
place_test.c:23:1: internal compiler error: in record_operand_use, at regrename.c:215
 }
 ^
0xabc68d record_operand_use
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:215
0xabcf74 scan_rtx_reg
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:1130
0xabe9ec build_def_use
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:1710
0xabe9ec regrename_analyze(bitmap_head*)
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/regrename.c:726
0xe5dd12 func_fma_steering::execute_fma_steering()
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/config/aarch64/cortex-a57-fma-steering.c:1034
0xe5e85d pass_fma_steering::execute(function*)
        /local/chrisj/cd110/dev/cpu/tools/compiler/gcc/gcc-trunk/gcc/config/aarch64/cortex-a57-fma-steering.c:1071
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

The function is simple (full file attached) after being pruned as much as possible.

void place_and_route (int num_regions)
{
    int i, j;
    float capacity;

    for (i=0;i<num_regions;i++) {
        for (j=0;j<num_regions;j++) {
            place_region_y[i][j].capacity = capacity;
            place_region_y[i][j].inv_capacity = 1. / capacity;
            place_region_y[i][j].occupancy = 0.;
            place_region_y[i][j].cost = 0.;
        }
    }
}

Setting 'capacity' to a known non-zero value also makes the ICE go away.  Explicitly setting it to zero results in the same error and eliminates the obvious -Wunused-value warning.

Output of gcc -v (paths trimmed)
Using built-in specs.
COLLECT_GCC=/gcc-aarch64-225476/bin/aarch64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/gcc-aarch64-225476/libexec/gcc/aarch64-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-linux-gnu
Configured with: /gcc-trunk/configure CC=/gcc-x86/4.8.1/bin/gcc CXX=/gcc-x86/4.8.1/r0/bin/g++ LDFLAGS=-Wl,-rpath,/gcc-aarch64-225476/lib,-rpath,/gcc-x86/4.8.1/x86_64/lib64 'CFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3' 'CXXFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3' --prefix=/gcc-aarch64-225476 --target=aarch64-linux-gnu --with-sysroot=/gcc-aarch64-225476/sysroot --with-gmp=/gcc-aarch64-225476 --with-mpc=/gcc-aarch64-225476 --with-mpfr=/gcc-aarch64-225476 --with-isl=/gcc-aarch64-225476 --enable-__cxa_atexit --with-gnu-as --with-gnu-ld --enable-shared --disable-libssp --disable-libmudflap --enable-languages=c,c++,fortran --disable-libsanitizer --disable-nls
Thread model: posix
gcc version 6.0.0 20150706 (experimental) (GCC)
Comment 1 James Greenhalgh 2015-09-16 10:35:02 UTC
Created attachment 36341 [details]
Reduced testcase without Undefined Behaviour

Confirmed with the testcase updated to avoid Undefined Behaviour.

It isn't immediately clear what is going wrong here, the ICE occurs before any of the cortex-a57-fma-steering analysis kicks in, which implies an issue in the generic register rename analysis code. The assert trips on this line:

  gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS);

When handling this rtx:

  (insn 403 78 417 4 (set (mem:XI (post_inc:DI (reg:DI 17 x17 [orig:205 ivtmp.18 ] [205])) [4 MEM[(float *)vectp_pretmp.12_54]+0 S64 A32])
        (unspec:XI [
                (reg:XI 32 v0 [orig:165 vect_array.14 ] [165])
                (unspec:V4SF [
                        (const_int 0 [0])
                    ] UNSPEC_VSTRUCTDUMMY)
            ] UNSPEC_ST4)) error.c:20 2372 {aarch64_simd_st4v4sf}
     (expr_list:REG_INC (reg:DI 17 x17 [orig:205 ivtmp.18 ] [205])
        (nil)))

In this case n_chains is either 2 or 3 and MAX_REGS_PER_ADDRESS for aarch64 is set to 2, so we trip the assert.

I don't know regrename at all, so I'm not sure how we get in to this situation.
Comment 2 Jim Wilson 2015-11-14 02:27:20 UTC
The operand_rr_info struct has arrays sized by MAX_REGS_PER_ADDRESS.

regrename is calling record_operand_use for a write to an operand.  It is also calling record_operand_use when a use (read or write) overlaps an earlier write.  Hence a mem operand using an address with N regs can overlap N previous writes explaining the use of MAX_REGS_PER_ADDRESS. 

However, we can also have an operand that is N-regs wide that can overlap N
previous 1-reg wide writes.  Thus, we need to be able to handle the number of regs in the widest possible mode also.  If we can have multiword regs used inside an address, things get even more complicated, but that isn't an issue at the moment.

So instead of using just MAX_REGS_PER_ADDRESS, we need here something like
#define OPERAND_RR_INFO_ARRAY_SIZE                                        \
  MAX (MAX_REGS_PER_ADDRESS,                                              \
       (MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) / MIN_UNITS_PER_WORD)

Unfortunately, this increases the array size from 2 to 16 for aarch64, as the widest mode is CXImode which takes 16 registers.  This change may be wasting too much memory to be useful.  This only affects the ports that have their own target dependent regrename pass though, which is only c6x and aarch64 at the moment.

I suspect that we only need an array size of 4 for arm, as the widest instructions operate on 4 registers at a time, but I don't see an obvious way to get that info without adding another macro.

I don't see any obvious uses of the chains member in operand_rr_info.  The c6x and aarch64 ports are only using the heads member.  So we could partially offset the memory size increase by dropping the chains member.

Or maybe we can modify mode creation to avoid getting the CXI mode, as we don't need it.  It is created automatically because we define XI as an 8-wide integer mode.  Without CXImode the widest modes are only 8 registers wide.  Combined with the removal of the chains member we are only doubling the size of this structure.
Comment 3 Jim Wilson 2015-11-14 02:30:09 UTC
Created attachment 36705 [details]
untested patch that may be wasting memory
Comment 4 Bernd Schmidt 2015-11-17 21:20:52 UTC
Author: bernds
Date: Tue Nov 17 21:20:20 2015
New Revision: 230499

URL: https://gcc.gnu.org/viewcvs?rev=230499&root=gcc&view=rev
Log:
	PR target/66785
	* regrename.c (record_operand_use): Keep track of failed operands
	and stop appending if we see any.
	* regrename.h (struct operand_rr_info): Add a failed field and shrink
	n_chains to short.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/regrename.c
    trunk/gcc/regrename.h
Comment 5 Bernd Schmidt 2015-11-17 21:22:58 UTC
Ah, sorry Jim, I already had a patch. Ramana pointed out this bug but I didn't expect anyone else to actually investigate it. The overflow case can only occur in a failure case if the operand can't be renamed, so I've chosen to just record that fact.
Someone might want to take a look at the aarch64 backend to see if it needs to take the new "failed" bit into account.
Comment 6 Bernd Schmidt 2015-12-16 15:20:26 UTC
I looked at it and decided it's probably fine.