Bug 69461 - [6 Regression] ICE in lra_set_insn_recog_data, at lra.c:964
Summary: [6 Regression] ICE in lra_set_insn_recog_data, at lra.c:964
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P1 normal
Target Milestone: 6.0
Assignee: Vladimir Makarov
URL:
Keywords:
: 68959 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-01-25 09:33 UTC by Richard Biener
Modified: 2016-02-17 03:36 UTC (History)
9 users (show)

See Also:
Host:
Target: ppc64le-*
Build:
Known to work: 5.3.0
Known to fail: 6.0
Last reconfirmed: 2016-01-25 00:00:00


Attachments
preprocessed source (45.80 KB, application/octet-stream)
2016-01-25 09:34 UTC, Richard Biener
Details
profile data (2.27 KB, application/octet-stream)
2016-01-25 09:35 UTC, Richard Biener
Details
Patch I'm testing to fix the bug (1.57 KB, patch)
2016-01-28 06:28 UTC, Alexandre Oliva
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2016-01-25 09:33:29 UTC
python fails to build with

/usr/lib64/gcc/powerpc64le-suse-linux/6/cc1 -fpreprocessed complexobject.i  -quiet -dumpbase complexobject.c -mtune=power8 -mcpu=power8 -g -O2  -fmessage-length=0 -fmessage-length=0  -fprofile-use  -o complexobject.s
Objects/complexobject.c: In function 'complex_pow':
Objects/complexobject.c:688:1: internal compiler error: in lra_set_insn_recog_data, at lra.c:964

and GCC built with LRA enabled by default (thus add -mlra).

> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/powerpc64le-suse-linux/6/lto-wrapper
Target: powerpc64le-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,go --enable-checking=yes --with-gxx-include-dir=/usr/include/c++/6 --enable-ssp --disable-libssp --disable-libvtv --disable-libmpx --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --program-suffix=-6 --without-system-libunwind --with-cpu=power8 --with-tune=power8 --enable-secureplt --with-long-double-128 --enable-targets=powerpcle-linux --disable-multilib --build=powerpc64le-suse-linux --host=powerpc64le-suse-linux
Thread model: posix
gcc version 6.0.0 20160121 (experimental) [trunk revision 232670] (SUSE Linux)
Comment 1 Richard Biener 2016-01-25 09:34:12 UTC
Created attachment 37453 [details]
preprocessed source
Comment 2 Richard Biener 2016-01-25 09:35:17 UTC
Created attachment 37454 [details]
profile data

Preprocessed source and .gcda data (fails only with profile-feedback, that hopefully changes with reduction).
Comment 3 Richard Biener 2016-01-25 09:37:22 UTC
With a cross from x86_64 I see

#1  0x0000000000b91785 in lra_set_insn_recog_data (insn=0x7ffff66cb380)
    at /space/rguenther/src/svn/trunk3/gcc/lra.c:962
962               gcc_assert (GET_CODE (PATTERN (insn)) == USE
(gdb) l
957           if (nop < 0)
958             {
959               /* It is a special insn like USE or CLOBBER.  We should
960                  recognize any regular insn otherwise LRA can do nothing
961                  with this insn.  */
962               gcc_assert (GET_CODE (PATTERN (insn)) == USE
963                           || GET_CODE (PATTERN (insn)) == CLOBBER
964                           || GET_CODE (PATTERN (insn)) == ASM_INPUT);
965               data->insn_static_data = insn_static_data
966                 = get_static_insn_data (-1, 0, 0, nalt);
(gdb) p debug_rtx (insn)
(insn 605 602 604 28 (parallel [
            (set (reg:TI 421)
                (unspec:TI [
                        (reg:TI 424 [421])
                    ] UNSPEC_P8V_RELOAD_FROM_GPR))
            (clobber (reg:TF 425))
        ]) Objects/complexobject.c:671 -1
     (nil))
$1 = void
Comment 4 Markus Trippelsdorf 2016-01-25 09:46:13 UTC
trippels@gcc2-power8 ~ % cat complexobject.i
typedef struct {
  double real;
  double imag;
} Py_complex;
Py_complex a;
Py_complex fn1();
Py_complex fn2() { return fn1(); }
void fn3() {
  _setjmp();
  a = fn2();
}

trippels@gcc2-power8 ~ % gcc -O3 -mlra complexobject.i
complexobject.i: In function ‘fn3’:
complexobject.i:9:3: warning: implicit declaration of function ‘_setjmp’ [-Wimplicit-function-declaration]
   _setjmp();
   ^~~~~~~

complexobject.i:11:1: internal compiler error: in lra_set_insn_recog_data, at lra.c:964
 }
 ^

0x10676757 lra_set_insn_recog_data(rtx_insn*)
        ../../gcc/gcc/lra.c:962
0x10676fb7 lra_get_insn_recog_data
        ../../gcc/gcc/lra-int.h:486
0x10676fb7 lra_update_insn_regno_info
        ../../gcc/gcc/lra.c:1584
0x106774e7 lra_update_insn_regno_info
        ../../gcc/gcc/lra.c:1644
0x106774e7 lra_push_insn_1
        ../../gcc/gcc/lra.c:1649
0x106774e7 lra_push_insn
        ../../gcc/gcc/lra.c:1657
0x106774e7 push_insns
        ../../gcc/gcc/lra.c:1700
0x10679567 lra_process_new_insns(rtx_insn*, rtx_insn*, rtx_insn*, char const*)
        ../../gcc/gcc/lra.c:1746
0x10692363 check_and_process_move
        ../../gcc/gcc/lra-constraints.c:1190
0x10692363 curr_insn_transform
        ../../gcc/gcc/lra-constraints.c:3445
0x106935ab lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:4412
0x10677d37 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2277
0x1061cc9b do_reload
        ../../gcc/gcc/ira.c:5393
0x1061cc9b execute
        ../../gcc/gcc/ira.c:5564
Comment 5 Alexandre Oliva 2016-01-27 21:00:28 UTC
I'm looking into this one
Comment 6 Alexandre Oliva 2016-01-28 06:28:26 UTC
Created attachment 37498 [details]
Patch I'm testing to fix the bug

LRA wants harder than reload to avoid creating a stack slot to satisfy insn constraints.  As a result, it creates an additional REG:TI pseudo to reload a SUBREG:V2DF of a REG:TI, and then it tries to assign that pseudo to VSX_REGS, which in turn causes another reload because there's no way to load a TImode value into a VSX_REG in *mov<mode>_ppc64, and that requires another, and so on, until the limit on reload insns is exceeded.

The first problem is that we shouldn't be creating a TImode reload for VSX_REGS, since we can't possibly satisfy that: TImode values are not ok for VSX_REGS.  I've adjusted in_class_p to check HARD_REGNO_MODE_OK, and that put an end to infinite stream of reloads.

It was still a very long stream, though.  simplify_operand_subreg attempts to turn SUBREGs of MEMs into MEMs, but it will only proceed with the simplification if the resulting address is at least as valid as the original.  

Alas, instead of the simplification, we end up repeatedly generating reloads copying the initial value to stack slots with growing offsets, until the offset grows enough that the address becomes invalid, at which point the subreg simplification is performed.  That's 2047 excess stores and loads, plus insns that compute the stack address for each of them.

In order to fix that, I amended the test on whether to proceed with the subreg simplification to take into account the availability of regs that can hold a value of the intended mode in the goal class for that operand.

With that, we go from 2047 excess stores and loads to only 1.  I couldn't figure out yet how to get rid of this one extra store and load, and the excess stack slot, but I figured I'd share what I have, that I believe to be a solid fix, and save the investigation on an additional LRA improvement for later.
Comment 7 Markus Trippelsdorf 2016-01-28 08:17:17 UTC
Even with your patch applied, I still get the same ICE when I compile e.g.
tramp3d-v4.cpp with -flto on ppc64le:

trippels@gcc2-power8 ~ % g++ -w -Ofast -flto=16 -mlra tramp3d-v4.cpp
tramp3d-v4.cpp: In function ‘_ZN9DomainMapI8IntervalILi1EEiE10initializeERKS1_.isra.1374’:
tramp3d-v4.cpp:12683:0: internal compiler error: in lra_set_insn_recog_data, at lra.c:964
...
Comment 8 Markus Trippelsdorf 2016-01-28 08:31:40 UTC
Hmm,

trippels@gcc2-power8 ~ % cat tramp3d-v4.ii
class Init {
public:
  ~Init();
} a;

trippels@gcc2-power8 ~ % g++ -flto -mlra tramp3d-v4.ii
tramp3d-v4.ii: In function ‘__static_initialization_and_destruction_0’:
tramp3d-v4.ii:4:4: internal compiler error: in lra_set_insn_recog_data, at lra.c:964
 } a;
    ^
0x10553337 lra_set_insn_recog_data(rtx_insn*)
        ../../gcc/gcc/lra.c:962
0x10553b97 lra_get_insn_recog_data
        ../../gcc/gcc/lra-int.h:486
0x10553b97 lra_update_insn_regno_info
        ../../gcc/gcc/lra.c:1584
0x105540c7 lra_update_insn_regno_info
        ../../gcc/gcc/lra.c:1644
0x105540c7 lra_push_insn_1
        ../../gcc/gcc/lra.c:1649
0x105540c7 lra_push_insn
        ../../gcc/gcc/lra.c:1657
0x105540c7 push_insns
        ../../gcc/gcc/lra.c:1700
0x10556147 lra_process_new_insns(rtx_insn*, rtx_insn*, rtx_insn*, char const*)
        ../../gcc/gcc/lra.c:1746
0x1056dfcb curr_insn_transform
        ../../gcc/gcc/lra-constraints.c:3940
0x1057024b lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:4427
0x1055482b lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2277
0x104f987b do_reload
        ../../gcc/gcc/ira.c:5393
0x104f987b execute
        ../../gcc/gcc/ira.c:5564
Comment 9 Vladimir Makarov 2016-01-30 00:42:34 UTC
(In reply to Alexandre Oliva from comment #6)
> Created attachment 37498 [details]
> Patch I'm testing to fix the bug
> 
> LRA wants harder than reload to avoid creating a stack slot to satisfy insn
> constraints.  As a result, it creates an additional REG:TI pseudo to reload
> a SUBREG:V2DF of a REG:TI, and then it tries to assign that pseudo to
> VSX_REGS, which in turn causes another reload because there's no way to load
> a TImode value into a VSX_REG in *mov<mode>_ppc64, and that requires
> another, and so on, until the limit on reload insns is exceeded.
> 
> The first problem is that we shouldn't be creating a TImode reload for
> VSX_REGS, since we can't possibly satisfy that: TImode values are not ok for
> VSX_REGS.  I've adjusted in_class_p to check HARD_REGNO_MODE_OK, and that
> put an end to infinite stream of reloads.
> 
> It was still a very long stream, though.  simplify_operand_subreg attempts
> to turn SUBREGs of MEMs into MEMs, but it will only proceed with the
> simplification if the resulting address is at least as valid as the
> original.  
> 
> Alas, instead of the simplification, we end up repeatedly generating reloads
> copying the initial value to stack slots with growing offsets, until the
> offset grows enough that the address becomes invalid, at which point the
> subreg simplification is performed.  That's 2047 excess stores and loads,
> plus insns that compute the stack address for each of them.
> 
> In order to fix that, I amended the test on whether to proceed with the
> subreg simplification to take into account the availability of regs that can
> hold a value of the intended mode in the goal class for that operand.
> 
> With that, we go from 2047 excess stores and loads to only 1.  I couldn't
> figure out yet how to get rid of this one extra store and load, and the
> excess stack slot, but I figured I'd share what I have, that I believe to be
> a solid fix, and save the investigation on an additional LRA improvement for
> later.

Alex, thanks for working on this.  The location of the fix is right.  But I have a smaller fix, more general fix.  Instead of looking at the operand, i propose to look that address will be valid when we use just base reg (LRA can do this transformation later).  Original address is valid as TImode permits such address (reg + offset).  For V2DFmode it is invalid as the mode permits only reg[+reg] address, therefore we fail to remove the subreg.  Adding check that just base address is valid, resolves the issue.

I'll test my patch on several machines and commit with your test if everything is ok.
Comment 10 Vladimir Makarov 2016-01-30 00:47:43 UTC
(In reply to Alexandre Oliva from comment #6)
> Created attachment 37498 [details]
> Patch I'm testing to fix the bug
> 
> LRA wants harder than reload to avoid creating a stack slot to satisfy insn
> constraints.  As a result, it creates an additional REG:TI pseudo to reload
> a SUBREG:V2DF of a REG:TI, and then it tries to assign that pseudo to
> VSX_REGS, which in turn causes another reload because there's no way to load
> a TImode value into a VSX_REG in *mov<mode>_ppc64, and that requires
> another, and so on, until the limit on reload insns is exceeded.
> 
> The first problem is that we shouldn't be creating a TImode reload for
> VSX_REGS, since we can't possibly satisfy that: TImode values are not ok for
> VSX_REGS.  I've adjusted in_class_p to check HARD_REGNO_MODE_OK, and that
> put an end to infinite stream of reloads.
> 
> It was still a very long stream, though.  simplify_operand_subreg attempts
> to turn SUBREGs of MEMs into MEMs, but it will only proceed with the
> simplification if the resulting address is at least as valid as the
> original.  
> 
> Alas, instead of the simplification, we end up repeatedly generating reloads
> copying the initial value to stack slots with growing offsets, until the
> offset grows enough that the address becomes invalid, at which point the
> subreg simplification is performed.  That's 2047 excess stores and loads,
> plus insns that compute the stack address for each of them.
> 
> In order to fix that, I amended the test on whether to proceed with the
> subreg simplification to take into account the availability of regs that can
> hold a value of the intended mode in the goal class for that operand.
> 
> With that, we go from 2047 excess stores and loads to only 1.  I couldn't
> figure out yet how to get rid of this one extra store and load, and the
> excess stack slot, but I figured I'd share what I have, that I believe to be
> a solid fix, and save the investigation on an additional LRA improvement for
> later.

Alex, thanks for working on this.  The location of the fix is right.  But I have a smaller fix, more general fix.  Instead of looking at the operand, i propose to look that address will be valid when we use just base reg (LRA can do this transformation later).  Original address is valid as TImode permits such address (reg + offset).  For V2DFmode it is invalid as the mode permits only reg[+reg] address, therefore we fail to remove the subreg.  Adding check that just base address is valid, resolves the issue.

I'll test my patch on several machines and commit with your test if everything is ok.
(In reply to Markus Trippelsdorf from comment #8)
> Hmm,
> 
> trippels@gcc2-power8 ~ % cat tramp3d-v4.ii
> class Init {
> public:
>   ~Init();
> } a;
> 
> trippels@gcc2-power8 ~ % g++ -flto -mlra tramp3d-v4.ii
> tramp3d-v4.ii: In function ‘__static_initialization_and_destruction_0’:
> tramp3d-v4.ii:4:4: internal compiler error: in lra_set_insn_recog_data, at
> lra.c:964
>  } a;
>     ^
> 0x10553337 lra_set_insn_recog_data(rtx_insn*)
>         ../../gcc/gcc/lra.c:962
> 0x10553b97 lra_get_insn_recog_data
>         ../../gcc/gcc/lra-int.h:486
> 0x10553b97 lra_update_insn_regno_info
>         ../../gcc/gcc/lra.c:1584
> 0x105540c7 lra_update_insn_regno_info
>         ../../gcc/gcc/lra.c:1644
> 0x105540c7 lra_push_insn_1
>         ../../gcc/gcc/lra.c:1649
> 0x105540c7 lra_push_insn
>         ../../gcc/gcc/lra.c:1657
> 0x105540c7 push_insns
>         ../../gcc/gcc/lra.c:1700
> 0x10556147 lra_process_new_insns(rtx_insn*, rtx_insn*, rtx_insn*, char
> const*)
>         ../../gcc/gcc/lra.c:1746
> 0x1056dfcb curr_insn_transform
>         ../../gcc/gcc/lra-constraints.c:3940
> 0x1057024b lra_constraints(bool)
>         ../../gcc/gcc/lra-constraints.c:4427
> 0x1055482b lra(_IO_FILE*)
>         ../../gcc/gcc/lra.c:2277
> 0x104f987b do_reload
>         ../../gcc/gcc/ira.c:5393
> 0x104f987b execute
>         ../../gcc/gcc/ira.c:5564

That is a different issue although it looks the same.  PPC port changed a lot last year and nobody to pay attention to LRA because it is not a default local RA.

The issue is in a new *toc_fusionload_di insn which uses unspec address and LRA tries to reload it instead of just ignoring to do this.  I have a fix which needs some testing.  The fix will be in the final patch.  I hope to commit it on weekend or Monday.
Comment 11 Vladimir Makarov 2016-02-01 23:28:27 UTC
I have patches fixing the two issues but when I started to test the patches I found that LRA actually has >800 additional failures on power8 in comparison with reload.  So I am going to look at this and try to fix this.
Comment 12 Alexandre Oliva 2016-02-03 04:37:50 UTC
Vlad, thanks for taking this over.  Let me just point out, just in case you missed, that I believe it is important for any register allocator to test HARD_REGNO_MODE_OK, and that AFAICT we were not doing that.  Please let me know if I'm mistaken.  Thanks,
Comment 13 Vladimir Makarov 2016-02-03 17:59:06 UTC
Author: vmakarov
Date: Wed Feb  3 17:58:34 2016
New Revision: 233107

URL: https://gcc.gnu.org/viewcvs?rev=233107&root=gcc&view=rev
Log:
2016-02-03  Vladimir Makarov  <vmakarov@redhat.com>
	    Alexandre Oliva  <aoliva@redhat.com>

	PR target/69461
	* lra-constraints.c (simplify_operand_subreg): Check additionally
	address validity after potential reloading.
	(process_address_1): Check insns validity.  In case of failure do
	nothing.

2016-02-03  Vladimir Makarov  <vmakarov@redhat.com>
	    Alexandre Oliva  <aoliva@redhat.com>

	PR target/69461
	* gcc.target/powerpc/pr69461.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/powerpc/pr69461.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
Comment 14 Markus Trippelsdorf 2016-02-03 18:15:57 UTC
Fixed. Thanks.
Firefox now builds fine with -mlra (and -flto) on gcc112.
Comment 15 Vladimir Makarov 2016-02-03 18:18:35 UTC
(In reply to Alexandre Oliva from comment #12)
> Vlad, thanks for taking this over.  Let me just point out, just in case you
> missed, that I believe it is important for any register allocator to test
> HARD_REGNO_MODE_OK, and that AFAICT we were not doing that.  Please let me
> know if I'm mistaken.  Thanks,


No, you are not mistaken.  But the problem is that not always all hard registers of a class can have HARD_REGNO_MODE_OK the same value.  As we don't know yet the exact hard reg at some point of LRA (which is different from reload) and can not guarantee the pseudo will get a specific hard register later.

Actually, there are pros and cons for each solution.
Comment 16 Michael Meissner 2016-02-04 00:40:06 UTC
Author: meissner
Date: Thu Feb  4 00:39:34 2016
New Revision: 233120

URL: https://gcc.gnu.org/viewcvs?rev=233120&root=gcc&view=rev
Log:
2016-02-03  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Vladimir Makarov  <vmakarov@redhat.com>

	PR target/69461
	* config/rs6000/rs6000.c (rs6000_legitimate_address_p): Fix thinko
	in validating fused toc addresses.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
Comment 17 Alan Modra 2016-02-17 03:36:52 UTC
*** Bug 68959 has been marked as a duplicate of this bug. ***