gcc-7.0.0-alpha20160626 snapshot fails to compile the following testcase w/ -Os -mlra:
int om, iz, te;
long long int wm, j6;
short int nd;
void
o1 (void)
{
  short int *m9 = &nd;
  int ty, kc = 2;
  long long int gi = 0;
  while (gi != 0)
    {
      long long int *dk;
      int rl = 0;
 yn:
      if (j6 / kc != 0)
        {
          *m9 = 2;
          while (*m9 != 0)
            {
              m9 = &rl;
              *m9 = 1;
            }
          goto yn;
        }
      m9 = &nd;
      dk = (om != 0) ? &ty : &gi;
      while (rl != 0)
        while (te < 1)
          {
            while (wm != 0)
              {
                ty /= (kc & 1);
                if (((j6 != 0) + ty) != 0)
                  {
                    nd = rl;
                    if (nd != 0)
                      gi = om = (wm != gi);
                  }
                while (j6 != 0)
                  {
                    if (wm != 0 && *dk != 0)
                      *dk = kc = 0;
                    wm |= (rl != 0) ? ty : (ty || 0);
                    ++j6;
                  }
                iz = 2;
              }
            rl /= gi;
            ++te;
          }
    }
}
% powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160626 -w -c -Os -mlra en3q71xb.c
en3q71xb.c: In function 'o1':
en3q71xb.c:55:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
I'm hitting this on powerpc64le with the following test case:
#pragma pack(1)
struct
{
  float f0;
} a;
void foo(int);
int main(void)
{
  for (;;)
    foo((int)a.f0);
}
Does a revert of r237277 fix this issue?
anton: What flags for your test case? I fail to reproduce it.
Oh never mind, I forgot -mlra, duh. Confirmed, also on BE, -O2 -mlra fails already.
(In reply to Segher Boessenkool from comment #4)
> Oh never mind, I forgot -mlra, duh.
>
> Confirmed, also on BE, -O2 -mlra fails already.
I can't make it fail on 32-bit BE, though. Segher, is your machine powerpc64?
Yes, but it fails with -m32 too.
We have an insn:
(insn 32 33 34 3 (set (reg:DI 165)
        (unspec:DI [
                (fix:SI (subreg:SF (reg:SI 160 [ a ]) 0))
            ] UNSPEC_FCTIWZ)) 71680.c:11 334 {fctiwz_sf}
     (expr_list:REG_DEAD (reg:SI 160 [ a ])
        (nil)))
Pseudo 160 is allocated to memory by IRA. LRA does:
            Changing pseudo 160 in operand 1 of insn 32 on equiv [r162:SI]
      Creating newreg=167, assigning class ALL_REGS to subreg reg r167
   32: r165:DI=unspec[fix(r167:SI#0)] 7
      REG_DEAD r160:SI
    Inserting subreg reload before:
   37: r167:SI=[r162:SI]
and from then on it keeps loading 167 into another (new) SImode reg, which never magically becomes a float reg ;-)
From -m32 -O2 dumps of Anton's testcase:
(insn 8 9 33 3 (set (reg:SI 160 [ a ])
        (mem/c:SI (reg/f:SI 162) [1 a+0 S4 A8])) /home/alan/src/tmp/pr71680.c:13 464 {*movsi_internal1}
     (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 162) [1 a+0 S4 A8])
        (expr_list:REG_EQUAL (mem/c:SI (symbol_ref:SI ("a") [flags 0x84] <var_decl 0x7f098caad870 a>) [1 a+0 S4 A8])
            (nil))))
Looking at the insn that loads reg 160 above, you'll notice movsi_internal1, which doesn't have an alternative to allow an fpr as is required by insn 32. What's more, SImode isn't allowed in fprs (see rs6000_hard_regno_mode_ok) so it doesn't make sense to add such an alternative.
What happens in reload is that reg 160 equiv mem is substituted into insn 32 then reload cleverly reloads the subreg:
Reloads for insn # 32
Reload 0: reload_in (SF) = (mem/c:SF (reg/f:SI 31 31 [162]) [1 a+0 S4 A8])
        FLOAT_REGS, RELOAD_FOR_INPUT (opnum = 1), can't combine
        reload_in_reg: (subreg:SF (reg:SI 160 [ a ]) 0)
        reload_reg_rtx: (reg:SF 44 12)
So we load from mem in SFmode, and insn 8 is deleted. lra apparently doesn't use the trick of changing to the mode of the subreg.
lra doesn't load in SFmode due to the following condition in lra-constraints.c:simplify_operand_subreg
      /* If we change address for paradoxical subreg of memory, the
         address might violate the necessary alignment or the access might
         be slow. So take this into consideration. We should not worry
         about access beyond allocated memory for paradoxical memory
         subregs as we don't substitute such equiv memory (see processing
         equivalences in function lra_constraints) and because for spilled
         pseudos we allocate stack memory enough for the biggest
         corresponding paradoxical subreg. */
      if (MEM_P (reg)
          && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
              || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
MEM_ALIGN here is 8 bits (from #pragma pack, and yes, the mem really is only byte aligned), and rs6000.h does say that this access might be slow if in SFmode. It's true that an unaligned floating point storage access on power might cause an alignment trap, so leaving aside the issue that mode only maps loosely to register class, I think the rs6000.h definition of SLOW_UNALIGNED_ACCESS is correct and lra is doing the right thing here. Reload is wrong to use a fp load (if mem align was always accurate, which it isn't).
Hmm, a change that doesn't cure this problem, but the condition might be better as
      if (MEM_P (reg)
          && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
              || SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))
              || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
i.e. if both innermode and mode are slow then lra may as well go ahead and use the subreg mode.
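To make the two conditions easier to compare, here is a minimal standalone sketch in C. It is a toy model, not GCC code: toy_mode, toy_slow_unaligned_access and toy_mode_alignment are hypothetical stand-ins, and the "slow" rule (a float mode with less than 32-bit alignment) is an assumption chosen only to mirror the situation described in this comment.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes; not the real GCC enum.  */
enum toy_mode { TOY_SImode, TOY_SFmode };

/* Hypothetical model of SLOW_UNALIGNED_ACCESS: assume only the float
   mode is slow, and only when the memory is less than 32-bit aligned.  */
static bool
toy_slow_unaligned_access (enum toy_mode mode, unsigned align_bits)
{
  return mode == TOY_SFmode && align_bits < 32;
}

/* Hypothetical model of GET_MODE_ALIGNMENT: both toy modes are
   4 bytes wide, so assume 32 bits for each.  */
static unsigned
toy_mode_alignment (enum toy_mode mode)
{
  (void) mode;
  return 32;
}

/* Existing condition: may the mem be accessed in the subreg MODE?  */
bool
toy_subreg_mode_ok_current (enum toy_mode mode, unsigned mem_align)
{
  return (!toy_slow_unaligned_access (mode, mem_align)
          || mem_align >= toy_mode_alignment (mode));
}

/* Proposed condition: additionally accept MODE when INNERMODE is
   slow as well, since nothing is lost by switching modes then.  */
bool
toy_subreg_mode_ok_proposed (enum toy_mode innermode, enum toy_mode mode,
                             unsigned mem_align)
{
  return (!toy_slow_unaligned_access (mode, mem_align)
          || toy_slow_unaligned_access (innermode, mem_align)
          || mem_align >= toy_mode_alignment (mode));
}
```

With the values from this testcase (innermode SImode, mode SFmode, alignment 8 bits) both functions return false under these assumptions, which matches the remark that the changed condition does not cure the problem by itself.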
Arseny, I could not reproduce the problem using your testcase, and I tried a dozen or so revisions around 20160626 building powerpc-e500v2-linux-gnuspe cross-compilers on an x86_64-linux host. Please specify the svn revision number or git commit that you used, and your gcc configure parameters.
(In reply to Alan Modra from comment #10)
> Arseny, I could not reproduce the problem using your testcase, and I tried a
> dozen or so revisions around 20160626 building powerpc-e500v2-linux-gnuspe
> cross-compilers on an x86_64-linux host. Please specify the svn revision
> number or git commit that you used, and your gcc configure parameters.
I test weekly gcc snapshots from ftp://gcc.gnu.org/pub/gcc/snapshots. I'm still able to reproduce it w/ 7.0.0_alpha20160731. svn revisions for 20160626 and 20160731 are r237793 and r238930, respectively, according to [1,2].
% powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160731 -v
Using built-in specs.
COLLECT_GCC=powerpc-e500v2-linux-gnuspe-gcc-7.0.0-alpha20160731
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/lto-wrapper
Target: powerpc-e500v2-linux-gnuspe
Configured with: /var/tmp/portage/cross-powerpc-e500v2-linux-gnuspe/gcc-7.0.0_alpha20160731/work/gcc-7-20160731/configure --host=x86_64-pc-linux-gnu --target=powerpc-e500v2-linux-gnuspe --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/powerpc-e500v2-linux-gnuspe/gcc-bin/7.0.0-alpha20160731 --includedir=/usr/lib/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/include --datadir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731 --mandir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/man --infodir=/usr/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/info --with-gxx-include-dir=/usr/lib/gcc/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/include/g++-v7 --with-python-dir=/share/gcc-data/powerpc-e500v2-linux-gnuspe/7.0.0-alpha20160731/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --disable-nls --enable-checking=yes --enable-libstdcxx-time --enable-poison-system-directories --with-sysroot=/usr/powerpc-e500v2-linux-gnuspe --disable-bootstrap --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec --disable-fixed-point --enable-e500-double --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --disable-vtable-verify --disable-libvtv --disable-libquadmath --enable-lto --with-isl --disable-isl-version-check --enable-libsanitizer --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.0.0-alpha20160731 20160731 (experimental) (GCC)
I can attach RTL dumps if necessary.
[1] https://gcc.gnu.org/ml/gcc/2016-06/msg00155.html
[2] https://gcc.gnu.org/ml/gcc/2016-07/msg00253.html
Bug reproduced on powerpc-e500v2-linux-gnuspe. The testcase needs -Os -mlra -fstack-protector -fPIC, or you need to configure gcc so that -fPIC and -fstack-protector are on by default. I haven't yet tested whether the lra patch I posted for comment #1 testcase also fixes the e500 testcase.
The e500 issue is quite different, and is not fixed by my lra patch. From the lra dump,
      Creating newreg=436, assigning class NO_REGS to save r436
  536: r192:SI=0x1
      REG_EQUAL 0x1
    Add reg<-save after:
  621: r362:SI#0=r436:DF
  184: NOTE_INSN_BASIC_BLOCK 21
    Add save<-reg after:
  620: r436:DF=r362:SI#0
So r362 is being saved for some reason, then later:
      Reassigning non-reload pseudos
           Assign 66 to r362 (freq=4000)
so r362 is finding its way into ctr.
Created attachment 39056
save SImode regs in SImode
Arseny, you might like to try this. I don't have the means at the moment to properly test e500 support (i.e. run the gcc testsuite) without building a whole lot of infrastructure.
(In reply to Alan Modra from comment #14)
> Arseny, you might like to try this. I don't have the means at the moment to
> properly test e500 support (ie. run the gcc testsuite) without building a
> whole lot of infrastructure.
This patch fixed the original issue and survived regression testing w/o introducing any new regression.
Author: amodra
Date: Wed Aug 10 05:43:36 2016
New Revision: 239317
URL: https://gcc.gnu.org/viewcvs?rev=239317&root=gcc&view=rev
Log:
[RS6000] e500 part of pr71680
The fallback part of HARD_REGNO_CALLER_SAVE_MODE, choose_hard_reg_mode, returns DFmode for SImode when TARGET_E500_DOUBLE. This confuses lra when attempting to save ctr around a call.
    PR target/71680
    * config/rs6000/rs6000.h (HARD_REGNO_CALLER_SAVE_MODE): Return SImode for TARGET_E500_DOUBLE when given SImode.
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.h
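The effect of this change can be pictured with a small toy model in C. This is only a sketch and not the rs6000.h macro: toy_mode, toy_fallback_mode and the two toy_caller_save_mode_* functions are hypothetical names, and the fallback simply hard-codes the behaviour the log message reports (DFmode handed back when the e500 double support is enabled).

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes; not the real GCC enum.  */
enum toy_mode { TOY_SImode, TOY_DFmode };

/* Hypothetical model of the fallback path.  The log message reports
   that the fallback returns DFmode for an SImode request when
   TARGET_E500_DOUBLE is set; that observation is hard-coded here.  */
static enum toy_mode
toy_fallback_mode (enum toy_mode mode, bool e500_double)
{
  return e500_double ? TOY_DFmode : mode;
}

/* Before the change (in this toy): every request reaches the fallback.  */
enum toy_mode
toy_caller_save_mode_before (enum toy_mode mode, bool e500_double)
{
  return toy_fallback_mode (mode, e500_double);
}

/* After the change: an SImode request is answered with SImode
   before the fallback is consulted.  */
enum toy_mode
toy_caller_save_mode_after (enum toy_mode mode, bool e500_double)
{
  if (e500_double && mode == TOY_SImode)
    return TOY_SImode;
  return toy_fallback_mode (mode, e500_double);
}
```

In this model an SImode register such as ctr is saved in DFmode before the change and in SImode after it, which is the mismatch visible in the "r436:DF=r362:SI#0" save insn quoted earlier in the thread.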
Author: amodra
Date: Wed Aug 10 23:12:11 2016
New Revision: 239342
URL: https://gcc.gnu.org/viewcvs?rev=239342&root=gcc&view=rev
Log:
[LRA] Reload of slow mems
pr71680.c -m64 -O1 -mlra, ira output showing two problem insns.
(insn 7 5 26 3 (set (reg:SI 159 [ a ])
        (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])) pr71680.c:13 464 {*movsi_internal1}
     (expr_list:REG_EQUIV (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])
        (nil)))
(insn 26 7 27 3 (set (reg:DI 162)
        (unspec:DI [
                (fix:SI (subreg:SF (reg:SI 159 [ a ]) 0))
            ] UNSPEC_FCTIWZ)) pr71680.c:13 372 {fctiwz_sf}
     (expr_list:REG_DEAD (reg:SI 159 [ a ])
        (nil)))
Insn 26 requires that reg 159 be of class FLOAT_REGS.
first lra action:
deleting insn with uid = 7.
Changing pseudo 159 in operand 1 of insn 26 on equiv [r158:DI]
      Creating newreg=164, assigning class ALL_REGS to subreg reg r164
   26: r162:DI=unspec[fix(r164:SI#0)] 7
      REG_DEAD r159:SI
    Inserting subreg reload before:
   30: r164:SI=[r158:DI]
[snip]
            Change to class FLOAT_REGS for r164
Well, that didn't do much. lra tried the equiv mem, found that didn't work, and had to reload. Effectively getting back to the two original insns but r159 replaced with r164. simplify_operand_subreg did not do anything in this case because SLOW_UNALIGNED_ACCESS was true (wrongly for power8, but that's beside the point). So now we have, using abbreviated rtl notation:
r164:SI=[r158:DI]
r162:DI=unspec[fix(r164:SI)]
The problem here is that the first insn isn't valid, due to the rs6000 backend not supporting SImode in fprs, and r164 must be an fpr to make the second insn valid.
next lra action:
      Creating newreg=165 from oldreg=164, assigning class GENERAL_REGS to r165
   30: r165:SI=[r158:DI]
    Inserting insn reload after:
   31: r164:SI=r165:SI
so now we have
r165:SI=[r158:DI]
r164:SI=r165:SI
r162:DI=unspec[fix(r164:SI)]
This ought to be good on power8, except for one little thing. r165 is GENERAL_REGS so the first insn is good, a gpr load from mem. r164 is FLOAT_REGS, making the last insn good, a fctiwz. The second insn ought to be a sldi, mtvsrd, xscvspdpn combination, but that is only supported for SFmode. So lra continues reloading the second insn, but in vain, because it never tries anything other than SImode and, as noted above, SImode is not valid in fprs.
What this patch does is arrange to emit the two reloads needed for the SLOW_UNALIGNED_ACCESS case at once, moving the subreg to the second insn in order to switch modes, producing:
r164:SI=[r158:DI]
r165:SF=r164:SI#0
r162:DI=unspec[fix(r165:SF)]
I've also tidied a couple of other things:
1) "old" is unnecessary as it duplicated "operand".
2) Rejecting mem subregs due to SLOW_UNALIGNED_ACCESS only makes sense if the original mode was not slow.
    PR target/71680
    * lra-constraints.c (simplify_operand_subreg): Allow subreg mode for mem when SLOW_UNALIGNED_ACCESS if inner mode is also slow. Emit two reloads for slow mem case, first loading in fast innermode, then converting to required mode.
testsuite/
    * gcc.target/powerpc/pr71680.c: New.
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
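As a rough source-level analogy for the final three-insn sequence (integer load, mode switch via subreg, then the float-to-int conversion), the same steps can be written in portable C. This is only an illustration of the idea, not what the compiler emits; the function names are made up for the example and it assumes a 32-bit float, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
                "sketch assumes a 32-bit float");

/* Step 1 (compare r164:SI=[r158:DI]): fetch the four bytes as an
   integer.  memcpy keeps this valid for a byte-aligned address.  */
static uint32_t
load_bits_unaligned (const unsigned char *p)
{
  uint32_t bits;
  memcpy (&bits, p, sizeof bits);
  return bits;
}

/* Step 2 (compare r165:SF=r164:SI#0): reinterpret the same bits as
   a float without changing them.  */
static float
bits_as_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Step 3 (compare the fix inside the fctiwz insn): truncate the
   float towards zero to get the int result.  */
int
fix_trunc_from_unaligned (const unsigned char *p)
{
  return (int) bits_as_float (load_bits_unaligned (p));
}
```

Calling fix_trunc_from_unaligned on the address of a float stored at an odd byte offset, as with the packed struct in the testcase, never performs a float-mode access to the misaligned memory; only the integer-mode load touches it.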
Fixed, and the missing testcase went in rev 239343