Bug 71680 - [7 Regression] ICE: Max. number of generated reload insns per insn is achieved (90) w/ -Os -mlra
Summary: [7 Regression] ICE: Max. number of generated reload insns per insn is achieve...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.0
Importance: P3 normal
Target Milestone: 7.0
Assignee: Alan Modra
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: patch, ra
Depends on:
Blocks:
 
Reported: 2016-06-28 07:48 UTC by Arseny Solokha
Modified: 2016-08-10 23:18 UTC
CC List: 6 users

See Also:
Host:
Target: powerpc-e500v2-linux-gnuspe powerpc64*-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-06-28 00:00:00


Attachments
save SImode regs in SImode (472 bytes, patch)
2016-08-05 09:53 UTC, Alan Modra

Description Arseny Solokha 2016-06-28 07:48:05 UTC
gcc-7.0.0-alpha20160626 snapshot fails to compile the following testcase w/ -Os -mlra:

int om, iz, te;
long long int wm, j6;
short int nd;

void
o1 (void)
{
  short int *m9 = &nd;
  int ty, kc = 2;
  long long int gi = 0;

  while (gi != 0)
    {
      long long int *dk;
      int rl = 0;

 yn:
      if (j6 / kc != 0)
        {
          *m9 = 2;
          while (*m9 != 0)
            {
              m9 = &rl;
              *m9 = 1;
            }
          goto yn;
        }
      m9 = &nd;
      dk = (om != 0) ? &ty : &gi;
      while (rl != 0)
        while (te < 1)
          {
            while (wm != 0)
              {
                ty /= (kc & 1);
                if (((j6 != 0) + ty) != 0)
                  {
                    nd = rl;
                    if (nd != 0)
                      gi = om = (wm != gi);
                  }
                while (j6 != 0)
                  {
                    if (wm != 0 && *dk != 0)
                      *dk = kc = 0;
                    wm |= (rl != 0) ? ty : (ty || 0);
                    ++j6;
                  }
                iz = 2;
              }
            rl /= gi;
            ++te;
          }
    }
}

% powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160626 -w -c -Os -mlra en3q71xb.c 
en3q71xb.c: In function 'o1':
en3q71xb.c:55:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
Comment 1 Anton Blanchard 2016-06-28 10:21:21 UTC
I'm hitting this on powerpc64le with the following test case:

#pragma pack(1)
struct {
        float f0;
} a;

void foo(int);

int main(void)
{
        for (;;)
                foo((int)a.f0);
}
Comment 2 Jiong Wang 2016-06-28 10:26:32 UTC
Does a revert of r237277 fix this issue?
Comment 3 Segher Boessenkool 2016-06-28 10:34:56 UTC
anton: What flags for your test case?  I fail to reproduce it.
Comment 4 Segher Boessenkool 2016-06-28 10:36:43 UTC
Oh never mind, I forgot -mlra, duh.

Confirmed, also on BE, -O2 -mlra fails already.
Comment 5 Arseny Solokha 2016-06-28 10:38:28 UTC
(In reply to Segher Boessenkool from comment #4)
> Oh never mind, I forgot -mlra, duh.
> 
> Confirmed, also on BE, -O2 -mlra fails already.

I can't make it fail on 32-bit BE, though. Segher, is your machine powerpc64?
Comment 6 Segher Boessenkool 2016-06-28 10:43:09 UTC
Yes, but it fails with -m32 too.
Comment 7 Segher Boessenkool 2016-06-28 10:57:13 UTC
We have an insn:

(insn 32 33 34 3 (set (reg:DI 165)
        (unspec:DI [
                (fix:SI (subreg:SF (reg:SI 160 [ a ]) 0))
            ] UNSPEC_FCTIWZ)) 71680.c:11 334 {fctiwz_sf}
     (expr_list:REG_DEAD (reg:SI 160 [ a ])
        (nil)))

160 is allocated memory by IRA.  LRA does:

Changing pseudo 160 in operand 1 of insn 32 on equiv [r162:SI]
      Creating newreg=167, assigning class ALL_REGS to subreg reg r167
   32: r165:DI=unspec[fix(r167:SI#0)] 7
      REG_DEAD r160:SI
    Inserting subreg reload before:
   37: r167:SI=[r162:SI]

and from then on it keeps loading 167 into another (new) SImode reg,
which never magically becomes a float reg ;-)
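
The loop described above can be sketched as a toy model in plain C. This is not GCC code: the enum, the helper names and the structure are all hypothetical, and only the limit of 90 is taken from the ICE message. It shows why creating yet another pseudo of the same mode can never satisfy the constraint.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: every name below is hypothetical.  */
enum mode { SI_MODE, SF_MODE };
enum reg_class { GENERAL_REGS, FLOAT_REGS };

/* The limit quoted in the ICE message.  */
#define MAX_RELOADS_PER_INSN 90

/* Models the rule discussed in this bug: SImode values are not
   allowed in floating-point registers.  */
static bool
mode_ok_for_class (enum mode m, enum reg_class rc)
{
  return !(m == SI_MODE && rc == FLOAT_REGS);
}

/* Keep reloading the operand into a fresh pseudo of the same mode
   until the constraint is satisfied or the cap is hit.  Returns the
   number of reloads generated, or -1 where the compiler would give
   up with the internal error.  */
static int
reload_operand (enum mode m, enum reg_class required)
{
  int reloads = 0;

  assert (required == GENERAL_REGS || required == FLOAT_REGS);
  while (!mode_ok_for_class (m, required))
    {
      if (reloads == MAX_RELOADS_PER_INSN)
        return -1;
      /* A new pseudo is created, but it keeps mode M, so the
         check above fails again on the next iteration.  */
      reloads++;
    }
  return reloads;
}
```

In this model, reload_operand (SI_MODE, FLOAT_REGS) runs into the cap, while an SFmode operand needs no reload at all.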
Comment 8 Alan Modra 2016-07-27 14:14:35 UTC
From -m32 -O2 dumps of Anton's testcase:

(insn 8 9 33 3 (set (reg:SI 160 [ a ])
        (mem/c:SI (reg/f:SI 162) [1 a+0 S4 A8])) /home/alan/src/tmp/pr71680.c:13 464 {*movsi_internal1}
     (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 162) [1 a+0 S4 A8])
        (expr_list:REG_EQUAL (mem/c:SI (symbol_ref:SI ("a") [flags 0x84]  <var_decl 0x7f098caad870 a>) [1 a+0 S4 A8])
            (nil))))

Looking at the insn that loads reg 160 above, you'll notice movsi_internal1, which doesn't have an alternative to allow an fpr as is required by insn 32.  What's more, SImode isn't allowed in fprs (see rs6000_hard_regno_mode_ok) so it doesn't make sense to add such an alternative.

What happens in reload is that reg 160 equiv mem is substituted into insn 32 then reload cleverly reloads the subreg:
Reloads for insn # 32
Reload 0: reload_in (SF) = (mem/c:SF (reg/f:SI 31 31 [162]) [1 a+0 S4 A8])
        FLOAT_REGS, RELOAD_FOR_INPUT (opnum = 1), can't combine
        reload_in_reg: (subreg:SF (reg:SI 160 [ a ]) 0)
        reload_reg_rtx: (reg:SF 44 12)

So we load from mem in SFmode, and insn 8 is deleted.  lra apparently doesn't use the trick of changing to the mode of the subreg.
Comment 9 Alan Modra 2016-07-28 01:48:59 UTC
lra doesn't load in SFmode due to the following condition in lra-constraints.c:simplify_operand_subreg

  /* If we change address for paradoxical subreg of memory, the
     address might violate the necessary alignment or the access might
     be slow.  So take this into consideration.  We should not worry
     about access beyond allocated memory for paradoxical memory
     subregs as we don't substitute such equiv memory (see processing
     equivalences in function lra_constraints) and because for spilled
     pseudos we allocate stack memory enough for the biggest
     corresponding paradoxical subreg.  */
  if (MEM_P (reg)
      && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
	  || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))

MEM_ALIGN here is 8 bits (from #pragma pack, and yes, the mem really is only byte aligned), and rs6000.h does say that this access might be slow if in SFmode.  It's true that an unaligned floating point storage access on power might cause an alignment trap, so leaving aside the issue that mode only maps loosely to register class, I think the rs6000.h definition of SLOW_UNALIGNED_ACCESS is correct and lra is doing the right thing here.  Reload is wrong to use a fp load (if mem align was always accurate, which it isn't).

Hmm, this change doesn't cure the problem, but the condition might be better as

  if (MEM_P (reg)
      && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
	  || SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))
	  || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))

i.e. if both innermode and mode are slow then lra may as well go ahead and use the subreg mode.
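
To compare the two conditions side by side, here is a small standalone model in C. It is only a sketch: the mode enum, the alignment values and the slowness rule (under-aligned floating-point accesses are slow, integer ones are not) are assumptions made for illustration, not the rs6000.h definition, and the MEM_P test is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: names and the slowness rule are assumptions.  */
enum mode { SI_MODE, SF_MODE, DF_MODE };

/* Natural alignment of each mode in bits (assumed values).  */
static unsigned
mode_alignment (enum mode m)
{
  return m == DF_MODE ? 64 : 32;
}

/* Assumed rule: only under-aligned floating-point accesses are slow.  */
static bool
slow_unaligned_access (enum mode m, unsigned align)
{
  bool is_float = (m == SF_MODE || m == DF_MODE);
  return is_float && align < 32;
}

/* The condition as quoted from simplify_operand_subreg.  */
static bool
current_condition (enum mode mode, unsigned mem_align)
{
  return (!slow_unaligned_access (mode, mem_align)
          || mem_align >= mode_alignment (mode));
}

/* The condition with the extra innermode test suggested above.  */
static bool
proposed_condition (enum mode mode, enum mode innermode, unsigned mem_align)
{
  assert (mem_align > 0);
  return (!slow_unaligned_access (mode, mem_align)
          || slow_unaligned_access (innermode, mem_align)
          || mem_align >= mode_alignment (mode));
}
```

Under these assumptions, the case from this bug (mode SF, innermode SI, 8-bit alignment) is rejected by both versions, which matches the remark that the change does not cure the problem. The two versions differ only when the inner mode is itself slow.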
Comment 10 Alan Modra 2016-07-30 02:44:43 UTC
Arseny, I could not reproduce the problem using your testcase, and I tried a dozen or so revisions around 20160626 building powerpc-e500v2-linux-gnuspe cross-compilers on an x86_64-linux host.  Please specify the svn revision number or git commit that you used, and your gcc configure parameters.
Comment 11 Arseny Solokha 2016-08-01 03:21:58 UTC
(In reply to Alan Modra from comment #10)
> Arseny, I could not reproduce the problem using your testcase, and I tried a
> dozen or so revisions around 20160626 building powerpc-e500v2-linux-gnuspe
> cross-compilers on an x86_64-linux host.  Please specify the svn revision
> number or git commit that you used, and your gcc configure parameters.

I test weekly gcc snapshots from ftp://gcc.gnu.org/pub/gcc/snapshots. I'm still able to reproduce it w/ 7.0.0_alpha20160731. svn revisions for 20160626 and 20160731 are r237793 and r238930, respectively, according to [1,2].

% powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160731 -v                     
Using built-in specs.
COLLECT_GCC=powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160731
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/lto-wrapper
Target: powerpc-e500v2-linux-gnuspe
Configured with: /var/tmp/portage/cross-powerpc-e500v2-linux-gnuspe/gcc-7.0.0_alpha20160731/work/gcc-7-20160731/configure --host=x86_64-pc-linux-gnu --target=powerpc-e500v2-linux-gnuspe --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/powerpc-e500v2-linux-gnuspe/gcc-bin/7.0.0-alpha20160731 --includedir=/usr/lib/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/include --datadir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731 --mandir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/man --infodir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/info --with-gxx-include-dir=/usr/lib/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/include/g++-v7 --with-python-dir=/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --disable-nls --enable-checking=yes --enable-libstdcxx-time --enable-poison-system-directories --with-sysroot=/usr/powerpc-e500v2-linux-gnuspe --disable-bootstrap --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec --disable-fixed-point --enable-e500-double --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --disable-vtable-verify --disable-libvtv --disable-libquadmath --enable-lto --with-isl --disable-isl-version-check --enable-libsanitizer --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.0.0-alpha20160731 20160731 (experimental) (GCC)

I can attach RTL dumps if necessary.


[1] https://gcc.gnu.org/ml/gcc/2016-06/msg00155.html
[2] https://gcc.gnu.org/ml/gcc/2016-07/msg00253.html
Comment 12 Alan Modra 2016-08-02 13:48:51 UTC
Bug reproduced on powerpc-e500v2-linux-gnuspe.  The testcase needs -Os -mlra -fstack-protector -fPIC, or you need to configure gcc so that -fPIC and -fstack-protector are on by default.  I haven't yet tested whether the lra patch I posted for the comment #1 testcase also fixes the e500 testcase.
Comment 13 Alan Modra 2016-08-03 01:58:38 UTC
The e500 issue is quite different, and is not fixed by my lra patch.  From the lra dump,

      Creating newreg=436, assigning class NO_REGS to save r436
  536: r192:SI=0x1
      REG_EQUAL 0x1
    Add reg<-save after:
  621: r362:SI#0=r436:DF

  184: NOTE_INSN_BASIC_BLOCK 21
    Add save<-reg after:
  620: r436:DF=r362:SI#0

So r362 is being saved for some reason, then later:

  Reassigning non-reload pseudos
           Assign 66 to r362 (freq=4000)

so r362 is finding its way into ctr.
Comment 14 Alan Modra 2016-08-05 09:53:26 UTC
Created attachment 39056 [details]
save SImode regs in SImode

Arseny, you might like to try this.  I don't have the means at the moment to properly test e500 support (ie. run the gcc testsuite) without building a whole lot of infrastructure.
Comment 15 Arseny Solokha 2016-08-09 07:01:16 UTC
(In reply to Alan Modra from comment #14)
> Arseny, you might like to try this.  I don't have the means at the moment to
> properly test e500 support (ie. run the gcc testsuite) without building a
> whole lot of infrastructure.

This patch fixed the original issue and survived regression testing w/o introducing any new regression.
Comment 16 Alan Modra 2016-08-10 05:44:08 UTC
Author: amodra
Date: Wed Aug 10 05:43:36 2016
New Revision: 239317

URL: https://gcc.gnu.org/viewcvs?rev=239317&root=gcc&view=rev
Log:
[RS6000] e500 part of pr71680

The fallback part of HARD_REGNO_CALLER_SAVE_MODE, choose_hard_reg_mode,
returns DFmode for SImode when TARGET_E500_DOUBLE.  This confuses
lra when attempting to save ctr around a call.

	PR target/71680
	* config/rs6000/rs6000.h (HARD_REGNO_CALLER_SAVE_MODE): Return
	SImode for TARGET_E500_DOUBLE when given SImode.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.h
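
The effect of the change described in the log can be illustrated with a toy model in C. This is a sketch and not the rs6000.h macro: the function names, the two-value mode enum and the simplified fallback are all hypothetical stand-ins for the behaviour the log describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the rs6000.h macro: all names are hypothetical.  */
enum mode { SI_MODE, DF_MODE };

/* Stand-in for the fallback described in the log: with e500 double
   support enabled it answers DFmode even for an SImode value.  */
static enum mode
fallback_save_mode (enum mode m, bool e500_double)
{
  return e500_double ? DF_MODE : m;
}

/* Behaviour before the fix: always defer to the fallback.  */
static enum mode
caller_save_mode_before (enum mode m, bool e500_double)
{
  return fallback_save_mode (m, e500_double);
}

/* Behaviour after the fix: an SImode request is answered with
   SImode, and everything else still goes through the fallback.  */
static enum mode
caller_save_mode_after (enum mode m, bool e500_double)
{
  assert (m == SI_MODE || m == DF_MODE);
  if (e500_double && m == SI_MODE)
    return SI_MODE;
  return fallback_save_mode (m, e500_double);
}
```

In this model the "before" function hands back DFmode for an SImode register such as ctr, which corresponds to the r436:DF save seen in comment 13, while the "after" function keeps the save in SImode.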
Comment 17 Alan Modra 2016-08-10 23:12:43 UTC
Author: amodra
Date: Wed Aug 10 23:12:11 2016
New Revision: 239342

URL: https://gcc.gnu.org/viewcvs?rev=239342&root=gcc&view=rev
Log:
[LRA] Reload of slow mems

pr71680.c -m64 -O1 -mlra, ira output showing two problem insns.
(insn 7 5 26 3 (set (reg:SI 159 [ a ])
        (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])) pr71680.c:13 464 {*movsi_internal1}
     (expr_list:REG_EQUIV (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])
        (nil)))
(insn 26 7 27 3 (set (reg:DI 162)
        (unspec:DI [
                (fix:SI (subreg:SF (reg:SI 159 [ a ]) 0))
            ] UNSPEC_FCTIWZ)) pr71680.c:13 372 {fctiwz_sf}
     (expr_list:REG_DEAD (reg:SI 159 [ a ])
        (nil)))
Insn 26 requires that reg 159 be of class FLOAT_REGS.

first lra action:
deleting insn with uid = 7.
Changing pseudo 159 in operand 1 of insn 26 on equiv [r158:DI]
      Creating newreg=164, assigning class ALL_REGS to subreg reg r164
   26: r162:DI=unspec[fix(r164:SI#0)] 7
      REG_DEAD r159:SI
    Inserting subreg reload before:
   30: r164:SI=[r158:DI]
[snip]
      Change to class FLOAT_REGS for r164

Well, that didn't do much.  lra tried the equiv mem, found that didn't
work, and had to reload, effectively getting back to the two original
insns but with r159 replaced by r164.  simplify_operand_subreg did not do
anything in this case because SLOW_UNALIGNED_ACCESS was true (wrongly
for power8, but that's beside the point).  So now we have, using
abbreviated rtl notation:
r164:SI=[r158:DI]
r162:DI=unspec[fix(r164:SI)]
The problem here is that the first insn isn't valid, due to the rs6000
backend not supporting SImode in fprs, and r164 must be an fpr to make
the second insn valid.

next lra action:
      Creating newreg=165 from oldreg=164, assigning class GENERAL_REGS to r165
   30: r165:SI=[r158:DI]
    Inserting insn reload after:
   31: r164:SI=r165:SI
so now we have
r165:SI=[r158:DI]
r164:SI=r165:SI
r162:DI=unspec[fix(r164:SI)]

This ought to be good on power8, except for one little thing.
r165 is GENERAL_REGS so the first insn is good, a gpr load from mem.
r164 is FLOAT_REGS, making the last insn good, a fctiwz.
The second insn ought to be a sldi, mtvsrd, xscvspdpn combination, but
that is only supported for SFmode.  So lra continues reloading the
second insn, but in vain because it never tries anything other than
SImode and as noted above, SImode is not valid in fprs.

What this patch does is arrange to emit the two reloads needed for the
SLOW_UNALIGNED_ACCESS case at once, moving the subreg to the second
insn in order to switch modes, producing:

r164:SI=[r158:DI]
r165:SF=r164:SI#0
r162:DI=unspec[fix(r165:SF)]

I've also tidied a couple of other things:
1) "old" is unnecessary as it duplicated "operand".
2) Rejecting mem subregs due to SLOW_UNALIGNED_ACCESS only makes sense
if the original mode was not slow.

	PR target/71680
	* lra-constraints.c (simplify_operand_subreg): Allow subreg
	mode for mem when SLOW_UNALIGNED_ACCESS if inner mode is also
	slow.  Emit two reloads for slow mem case, first loading in
	fast innermode, then converting to required mode.
testsuite/
	* gcc.target/powerpc/pr71680.c: New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
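
As a source-level analogy for the three-insn sequence the patch produces (integer load, subreg to switch modes, then the conversion), the following C sketch may help. It is an illustration only, with hypothetical helper names, and it assumes a 32-bit float; it says nothing about the instructions actually emitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Step 1 (r164:SI=[mem]): integer load from a possibly
   byte-aligned address.  */
static uint32_t
load_si_unaligned (const void *p)
{
  uint32_t bits;

  assert (p != NULL);
  memcpy (&bits, p, sizeof bits);
  return bits;
}

/* Step 2 (r165:SF=r164:SI#0): reinterpret the same 32 bits as a
   float, without any numeric conversion.  */
static float
si_bits_as_sf (uint32_t bits)
{
  float f;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Step 3 (the fix:SI inside the unspec): convert the float to an
   integer; a C cast truncates toward zero.  */
static int
fix_sf (const void *p)
{
  return (int) si_bits_as_sf (load_si_unaligned (p));
}
```

Calling fix_sf on the address of a float stored at an odd byte offset, as with the packed struct in the comment #1 testcase, goes through the same three steps in order.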
Comment 18 Alan Modra 2016-08-10 23:18:36 UTC
Fixed, and the missing testcase went in rev 239343.