A client test case ICEs in reload when compiled with -mcpu=G5 and -O1 (and above) for apple-ppc-darwin. This happens with mainline and tree-ssa gcc. But it may or may not fail with some compilers (behavior is quite random and ICE happens if there is no hard register available during reload, forcing store into temporary stack). Code which causes this problem is (of course this sample does not ICE): typedef unsigned long clock_t; clock_t clock(); void foo() { clock_t clock_start; clock_start=(((double)clock())/((double)(100))); } Problem is that gcc3.5 generates this RTL pattern for apple-ppc-dawin: (insn 30 29 32 2 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0) (fix:DI (reg:DF 158))) 216 {fix_truncdfdi2} (insn_list 28 (nil)) (expr_list:REG_DEAD (reg:DF 158) (nil))) Reload initially assigns hard register f0 to pseudo register 158, thus the pattern: (insn 30 29 32 2 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0) (fix:DI (reg:DF 32 f0 [158]))) 216 {fix_truncdfdi2} (insn_list 28 (nil)) (expr_list:REG_DEAD (reg:DF 32 f0 [158]) (nil))) When there is no hard register available for pseudo reg 142, reload will allocate memory on stack to do a store/load operation. But to do so, reload generates the following pattern: (subreg:SI (reg:DI 32 f0) 4) This is wrong. There is no movsi of an FPR. gcc then ICEs on: error: unrecognizable insn: (insn 247 30 32 2 (set (mem:SI (plus:SI (reg/f:SI 1 r1) (const_int 492 [0x1ec])) [0 clock_start+0 S4 A8]) (subreg:SI (reg:DI 32 f0) 4)) -1 (nil) (nil))
Of course the code does not even give a sub register: (insn 13 12 15 0 pr15286.c:9 (set (reg:DI 126) (fix:DI (reg:DF 124))) 216 {fix_truncdfdi2} (insn_list 12 (nil)) (expr_list:REG_DEAD (reg:DF 124) (nil))) I cannot think of why the subregister is comming up in the first place. But then again it could be this Apple LOCAL patch which is causing it: - /* APPLE LOCAL assign_stack_local_with_alias is used instead of - assign_stack_temp to get better scheduling, at the cost of some - stack space. */ - rtx mem = assign_stack_local_with_alias (DImode, GET_MODE_SIZE (DImode), - GET_MODE_ALIGNMENT (DImode)); + rtx mem = assign_stack_temp (DImode, GET_MODE_SIZE (DImode), 0); But then again we cannot test with a testcase which ICEs. I cannot think of any reason why there is a sub register there at all. I think you need to figure out where the sub register is coming from and then reduce the testcase around that.
This happens with FSF mainline and tree-ssa. There is no APPLE LOCAL involved. Problem happens in reload phase when it decides that there are no hard register available. I may be able to attach original user code if I get the permission.
I wonder if this is related at all to PR 15289.
Well, I tried lots of things. I have a 'partial' solution for this problem. Bad pattern is produced in routine gen_reload for the following arguments: in: (reg:DI 32 f0) out: (subreg:DI (reg/v:SI 142 [ clock_start ]) 0) resulting in the generation of bad pattern: (insn 247 30 32 2 (set (reg/v:SI 142 [ clock_start ]) (subreg:SI (reg:DI 32 f0) 4)) -1 (nil) (nil)) I fix it by going to memory to move f0 to subreg:DI (reg/v:SI 142 [ clock_start ]) instead: (insn 248 0 249 (set (mem:DI (plus:SI (reg/f:SI 1 r1) (const_int 480 [0x1e0])) [0 S8 A8]) (reg:DI 32 f0)) -1 (nil) (nil)) (insn 249 248 0 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0) (mem:DI (plus:SI (reg/f:SI 1 r1) (const_int 480 [0x1e0])) [0 S8 A8])) -1 (nil) (nil)) But there is still a problem. I get an ICE later on because I increased the stack size for the temporary. I also noticed that a similar code is there in the same routine to do a similar thing. How can I create a temporary such that later assertion on stack size is not violated? If this cannot be done, then how come, there is code further down in the same routine which does exactly that? Here is the diff for what it is worth (ignore that checks in the if-condition. It can be sanitized later on). Index: reload1.c ============================================================ ======= RCS file: /cvs/gcc/gcc/gcc/reload1.c,v retrieving revision 1.346.2.33 diff -c -p -r1.346.2.33 reload1.c *** reload1.c 5 Mar 2004 23:48:16 -0000 1.346.2.33 --- reload1.c 5 May 2004 00:33:41 -0000 --- 7344,7350 ---- Returns first insn emitted. */ + rtx gen_reload (rtx out, rtx in, int opnum, enum reload_type type) { *************** gen_reload (rtx out, rtx in, int opnum, *** 7357,7362 **** --- 7358,7374 ---- > GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))) && (tem = gen_lowpart_common (GET_MODE (SUBREG_REG (in)), out)) != 0) in = SUBREG_REG (in), out = tem; + else if (GET_CODE (out) == SUBREG + && GET_MODE (out) == DImode + && GET_CODE (in) == REG + && GET_MODE (in) == DImode + && FP_REGNO_P (REGNO (in))) + { + /* Get the memory to use and rewrite both registers to its mode. */ + rtx loc = get_secondary_mem (in, GET_MODE (out), opnum, type); + gen_reload (loc, in, opnum, type); + in = loc; + } else if (GET_CODE (out) == SUBREG && (GET_MODE_SIZE (GET_MODE (out)) > GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))))
Created attachment 6260 [details] Test case to reproduce this bug
I think it is caused by the patch mentioned here: <http://gcc.gnu.org/ml/gcc-patches/2002-07/ msg00087.html>, even RTH warns about the approach: <http://gcc.gnu.org/ml/gcc-patches/2002 -07/msg00092.html> which seems to be the problem. Reducing the test case is very randomly done. But confirmed always.
And here is the reduced source: int clock(void); extern int rep_err_count, rep_err_line[10], y[30]; extern const char *rep_err_file[10]; int daxpy (int mg,int a, const int y); int dmatmul_minus (int mg, int tl, const int x, const int M, const int y); int LinearResiduum (int level, int x, int b, int A); int GMRESSolver (int level, int x, int b, int A, int theMG, int *v) { int it,i1; unsigned clock_start; clock_start=clock ()*1.0; for (it=0; it<level; it++) { for (i1=0; i1<=level; i1++) { if (daxpy (theMG,y[i1],v[i1])!= 0) { rep_err_line[1] = 0; rep_err_file[2] = ""; rep_err_count = (rep_err_count+1)%5; return (1); } } if (dmatmul_minus (theMG,level,b,x,A) || LinearResiduum (level,x,b,A)) { rep_err_count = (rep_err_count+1)%5; return (1); } } return clock ()*1.0 - clock_start; } Oh reverting part of the patch I mentioned did not work.
Created attachment 6512 [details] new semi-reduced Since the old reduced source no longer ICE, I figured I would reduce the source some. Attached is the new reduced source which does ICE on the mainline built an hour ago.
Created attachment 7353 [details] A new test case for this bug. Compile this attached file with -O1 -mcpu=G5 with gcc-4.0 build for apple-ppc-darwin: % mygccm5o -c -O1 -mcpu=G5 l.c loader_obj.c: In function 'read_obj': loader_obj.c:392: error: unrecognizable insn: (insn 1054 561 563 49 (set (mem:SI (plus:SI (reg/f:SI 1 r1) (const_int 668 [0x29c])) [0 allocednf+0 S4 A8]) (subreg:SI (reg:DI 32 f0) 4)) -1 (nil) (nil)) loader_obj.c:392: internal compiler error: in extract_insn, at /recog.c:2034 Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://gcc.gnu.org/bugs.html> for instructions.
As I understand it, generation of subregs the hardware does not support should be prevented by the CANNOT_CHANGE_MODE_CLASS mechanism. Reload would reload the full inner reg into a register that allows the desired mode change instead. Could you try to find out why this doesn't work? Does reload fail to call CANNOT_CHANGE_MODE_CLASS somewhere? Or is the rs6000 definition of that macro incorrect?
Subject: Re: ICE cause by reload (subreg:SI (reg:DI)) normally isn't a problem, except when reg:DI is assigned to an FPR. If reg:DI was assigned to an FPR, CLASS probably is NON_SPECIAL_REGS (encompassing both GPRs and FPRs), which would make the result from CANNOT_CHANGE_MODE_CLASS correct. I'm not sure what problems would be created if one redefined CANNOT_CHANGE_MODE_CLASS to return false for that super-class. The problem is that the pseudo was assigned to an FPR, which is valid for the CLASS, but not for the instruction. One way of looking at it is that regclass should not have widened the CLASS to include a sub-class for which CANNOT_CHANGE_MODE_CLASS is true.
On s390, CANNOT_CHANGE_MODE_CLASS does indeed return false for such superclasses: #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \ (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ ? reg_classes_intersect_p (FP_REGS, CLASS) : 0) This appears to be working fine ... As I understand, regclass does indeed avoid widening the classs to one for which CANNOT_CHANGE_MODE_CLASS returns true; it does not check sub- classes.
Created attachment 7397 [details] reduced testcase for 4.0.0 (20041021) Here is a new reduced testcase which I got down to 73 lines.
pr15286.c:72: error: unrecognizable insn: (insn 496 167 381 0 (set (mem:SI (plus:SI (reg/f:SI 1 r1) (const_int 204 [0xcc])) [15 clock_start+0 S4 A8]) (subreg:SI (reg:DI 32 f0) 4)) -1 (nil) (nil)) insn 496 is generated by reload, but it is generated for insn 167 (insn:HI 167 427 381 0 (set (subreg:DI (reg/v:SI 279 [ clock_start ]) 0) (fix:DI (reg:DF 294))) 239 {fix_truncdfdi2} (insn_list:REG_DEP_TRUE 484 (nil)) (expr_list:REG_DEAD (reg:DF 294) (nil))) regclass appears to be choosing a correct register class Register 279 costs: BASE_REGS:2620000 GENERAL_REGS:2620000 LINK_REGS:2620474 CTR_REGS:2620474 LINK_OR_CTR_REGS:2620474 SPECIAL_REGS:2620474 SPEC_OR_GEN_REGS:2620474 NON_FLOAT_REGS:2620474 MEM:2620948 Register 279 pref GENERAL_REGS, else NON_FLOAT_REGS Register 279 used 2 times across 70 insns; set 1 time; user var; crosses 5 calls; 4 bytes; pref GENERAL_REGS, else NON_FLOAT_REGS. But then reload chooses an FPR. It places a float value (reg:DF) in a GPR (r0) and places an integer subreg in a FPR (f0). Reloads for insn # 167 Reload 0: FLOAT_REGS, RELOAD_FOR_OUTPUT (opnum = 0) reload_out_reg: (subreg:DI (reg/v:SI 279 [ clock_start ]) 0) Reload 1: reload_in (DF) = (reg:DF 0 r0 [294]) reload_out (DI) = (subreg:DI (reg/v:SI 279 [ clock_start ]) 0) FLOAT_REGS, RELOAD_OTHER (opnum = 1) reload_in_reg: (reg:DF 0 r0 [294]) reload_out_reg: (subreg:DI (reg/v:SI 279 [ clock_start ]) 0) reload_reg_rtx: (reg:DF 32 f0)
This problem appears to be a confluence of issues between fix_truncdfdi2 and limitations of reload, exacerbated by Apple's LL64 mode. The problematic code is typedef unsigned long clock_t; clock_t clock(void); clock_t clock_start; clock_start = (double)clock(); On an LP64 system, clock_start would be DImode, but in the LL64 environment the variable is SImode while GCC wants to perform computations in DImode. (Note that clock() also returns an SImode clock_t, but the code inserts the unnecessary (double) conversion.) It looks like this problem will occur whenever a DImode value is materialized in an FPR (e.g., PPC64 fctid instruction is enabled) and one only accesses it as an SImode value: DFmode d; SImode x = (SImode) (DImode) d; One solution is to convince the compiler to collapse out the DImode and directly convert the DFmode to SImode using fix_truncdfsi2 instead of fix_truncdfdi2. With the intervening DImode remaining, GCC chooses fix_truncdfdi2, which produces a DImode result in an FPR. The only use of the DImode result is the SImode user variable, so GCC uses the SImode pseudo wrapped in a DImode SUBREG directly as the result. When reload needs to spill the value to get it into a GPR, it dies because it cannot spill the SImode value to its home location. One could view this problem as reload not obeying CANNOT_CHANGE_MODE_CLASS because it should not try to spill just the SImode SUBREG instead of creating a DImode temporary on the stack. It needs to spill the entire result, not just the live portion. This also could be solved by the fix_truncdfdi2 pattern explicitly moving the result to a GPR instead of relying on reload to correctly implement that maneuver. There are more and less complicated solutions, but I am not sure which one is correct.
GCC is using fix_truncdfdi2 because no unsfix_truncdfsi2 pattern is defined and clock_t is an unsigned long. A short-term workaround is to implement an unsfix_truncdfsi2 pattern for PPC64 that uses fctid and explicitly moves the value to a GPR. However, reload should not be tripping over itself when faced with this.
David Edelsohn wrote: >One could view this problem as reload not obeying CANNOT_CHANGE_MODE_CLASS >because it should not try to spill just the SImode SUBREG instead of creating a >DImode temporary on the stack. It needs to spill the entire result, not just >the live portion. Exactly, and that's what reload tries to do, at least initially. Note that push_reload checks CANNOT_CHANGE_MODE_CLASS, and correctly decides it needs to reload in DImode. That's why in the debug_reload output you see DImode as the mode to perform the reload in. Now, I haven't actually debugged this live; but from looking at the source code what I assume is going wrong is the following: Right at the end of reload, in gen_reload, there's this hunk of code: /* If IN is a paradoxical SUBREG, remove it and try to put the opposite SUBREG on OUT. Likewise for a paradoxical SUBREG on OUT. */ [snip] else if (GET_CODE (out) == SUBREG && (GET_MODE_SIZE (GET_MODE (out)) > GET_MODE_SIZE (GET_MODE (SUBREG_REG (out)))) && (tem = gen_lowpart_common (GET_MODE (SUBREG_REG (out)), in)) != 0) out = SUBREG_REG (out), in = tem; This is most likely where the DImode reload is replaced by the SImode one. Now, this should not happen if the resulting SUBREG on the input side gets invalid, as it does in the current situation. Here, gen_lowpart_common calls simplify_gen_subreg which calls simplify_subreg. This last function has code to verify using CANNOT_CHANGE_MODE_CLASS that no invalid mode change is being encoded into a SUBREG. However, in *simplify_gen_subreg*, that decision is subsequently simply ignored: newx = simplify_subreg (outermode, op, innermode, byte); if (newx) return newx; if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode) return NULL_RTX; return gen_rtx_SUBREG (outermode, op, byte); Note how the SUBREG is generated anyway. I'm not quite sure what the point of this code is; but I'd suggest somewhere here is the place that needs to get fixed ...
Does this patch help? Index: gcc/simplify-rtx.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v retrieving revision 1.206 diff -c -p -r1.206 simplify-rtx.c *** gcc/simplify-rtx.c 18 Oct 2004 18:46:06 -0000 1.206 --- gcc/simplify-rtx.c 25 Oct 2004 16:17:42 -0000 *************** simplify_gen_subreg (enum machine_mode o *** 3789,3795 **** if (newx) return newx; ! if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode) return NULL_RTX; return gen_rtx_SUBREG (outermode, op, byte); --- 3789,3796 ---- if (newx) return newx; ! if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode ! || (REG_P (op) && REGNO (op) < FIRST_PSEUDO_REGISTER)) return NULL_RTX; return gen_rtx_SUBREG (outermode, op, byte);
Subject: Re: ICE cause by reload >>>>> uweigand at gcc dot gnu dot org writes: Ulrich> Does this patch help? It changes the error, but I'm not sure if it helps. pr15286.c: In function 'GMRESSolver': pr15286.c:72: error: unrecognizable insn: (insn 512 183 397 0 (set (subreg:DI (mem:SI (plus:SI (reg/f:SI 1 r1) (const_int 204 [0xcc])) [15 clock_start+0 S4 A8]) 0) (reg:DI 32 f0)) -1 (nil) (nil)) pr15286.c:72: internal compiler error: in extract_insn, at recog.c:2034 It is storing to (SUBREG:DI (mem:SI)) instead of (mem:DI).
Well, as I understand it, SUBREG (MEM) is *supposed* to be generated for the case of SUBREG (pseudo) where the pseudo gets a stack slot. The scan_paradoxical_subregs mechanism was supposed to have allocated a large enough stack slot, and cleanup_subreg_operands is then supposed to convert the SUBREG (MEM) to a MEM with its address offset accordingly. However, cleanup_subreg_operands first recognizes the insn, and this is where it presumably fails; due to this weird piece of code in general_operand: #ifdef INSN_SCHEDULING /* On machines that have insn scheduling, we want all memory reference to be explicit, so outlaw paradoxical SUBREGs. */ if (MEM_P (sub) && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub))) return 0; #endif (What the ... has INSN_SCHEDULING to do with what insns we want to recognize?!) Does it work if you add an && !reload_completed to that if?
Subject: Re: ICE cause by reload With both patches, the testcase works. This probably is a correct fix for reload. The code is a little messy and is cleaned up by implementing fixuns_truncdfsi, which also fixes the testcase. David
Correction. While the reload changes fix the crash, it looks like there still is a bug because the resulting code does not access the correct SUBREG. It does not access the LSW.
I applied the last two patch, but it didn;t help: % mygccf -O2 -mcpu=G5 -c loader_obj.i loader_obj.c: In function 'load_obj': loader_obj.c:92: error: unrecognizable insn: (insn 1395 601 1396 50 (set (subreg:DI (mem:SI (plus:SI (reg/f:SI 1 r1) (const_int 716 [0x2cc])) [0 allocednf+0 S4 A8]) 0) (reg:DI 32 f0)) -1 (nil) (nil)) loader_obj.c:92: internal compiler error: in extract_insn, at recog.c:2034 Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. Just to be clear, this is the patch I applied. Index: simplify-rtx.c =============================================================== ==== RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v retrieving revision 1.107.2.31.2.9 diff -c -p -r1.107.2.31.2.9 simplify-rtx.c *** simplify-rtx.c 16 Oct 2004 00:06:42 -0000 1.107.2.31.2.9 --- simplify-rtx.c 25 Oct 2004 18:38:20 -0000 *************** simplify_gen_subreg (enum machine_mode o *** 3800,3806 **** if (newx) return newx; ! if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode) return NULL_RTX; return gen_rtx_SUBREG (outermode, op, byte); --- 3800,3808 ---- if (newx) return newx; ! if ((GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode ! || (REG_P (op) && REGNO (op) < FIRST_PSEUDO_REGISTER)) ! && !reload_completed) return NULL_RTX; return gen_rtx_SUBREG (outermode, op, byte);
Subject: Re: ICE cause by reload > I applied the last two patch, but it didn;t help: The simplify-rtx.c patch is not *two* patches.
You referred to them as 'both patches' in comment #21.
The second patch has to due with "However, cleanup_subreg_operands first recognizes the insn, and this is where it presumably fails; due to this weird piece of code in general_operand:"
David Edelsohn wrote: >Correction. While the reload changes fix the crash, it looks like there still >is a bug because the resulting code does not access the correct SUBREG. It >does not access the LSW. That would be because alter_subreg doesn't handle paradoxical subregs on big-endian machines correctly. Something like the following (untested) should fix that one: Index: gcc/final.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/final.c,v retrieving revision 1.339 diff -c -p -r1.339 final.c *** gcc/final.c 22 Oct 2004 17:05:05 -0000 1.339 --- gcc/final.c 25 Oct 2004 20:35:41 -0000 *************** alter_subreg (rtx *xp) *** 2607,2613 **** /* simplify_subreg does not remove subreg from volatile references. We are required to. */ if (MEM_P (y)) ! *xp = adjust_address (y, GET_MODE (x), SUBREG_BYTE (x)); else { rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y), --- 2607,2628 ---- /* simplify_subreg does not remove subreg from volatile references. We are required to. */ if (MEM_P (y)) ! { ! int offset = SUBREG_BYTE (x); ! ! /* For paradoxical subregs on big-endian machines, SUBREG_BYTE ! contains 0 instead of the proper offset. See simplify_subreg. */ ! if (offset == 0 && GET_MODE_SIZE (y) < GET_MODE_SIZE (x)) ! { ! int difference = GET_MODE_SIZE (y) - GET_MODE_SIZE (x); ! if (WORDS_BIG_ENDIAN) ! offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD; ! if (BYTES_BIG_ENDIAN) ! offset += difference % UNITS_PER_WORD; ! } ! ! *xp = adjust_address (y, GET_MODE (x), offset); ! } else { rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y),
(My last message got messed up somehow, I'm trying again.) David Edelsohn wrote: >Correction. While the reload changes fix the crash, it looks like there still >is a bug because the resulting code does not access the correct SUBREG. It >does not access the LSW. That would be because alter_subreg doesn't handle paradoxical subregs on big-endian machines correctly. Something like the following (untested) should fix that one: Index: gcc/final.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/final.c,v retrieving revision 1.339 diff -c -p -r1.339 final.c *** gcc/final.c 22 Oct 2004 17:05:05 -0000 1.339 --- gcc/final.c 25 Oct 2004 20:35:41 -0000 *************** alter_subreg (rtx *xp) *** 2607,2613 **** /* simplify_subreg does not remove subreg from volatile references. We are required to. */ if (MEM_P (y)) ! *xp = adjust_address (y, GET_MODE (x), SUBREG_BYTE (x)); else { rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y), --- 2607,2628 ---- /* simplify_subreg does not remove subreg from volatile references. We are required to. */ if (MEM_P (y)) ! { ! int offset = SUBREG_BYTE (x); ! ! /* For paradoxical subregs on big-endian machines, SUBREG_BYTE ! contains 0 instead of the proper offset. See simplify_subreg. */ ! if (offset == 0 && GET_MODE_SIZE (y) < GET_MODE_SIZE (x)) ! { ! int difference = GET_MODE_SIZE (y) - GET_MODE_SIZE (x); ! if (WORDS_BIG_ENDIAN) ! offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD; ! if (BYTES_BIG_ENDIAN) ! offset += difference % UNITS_PER_WORD; ! } ! ! *xp = adjust_address (y, GET_MODE (x), offset); ! } else { rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y),
You need to replace GET_MODE_SIZE (x) with GET_MODE_SIZE (GET_MODE (x)), etc. for a clean compile. But as I mentioned in last comment, I still get the ICE with or without this patch (along with the previous patch) in all the test cases that I tried.
well, as pointed out by David you do need in addition the change I described in comment #20. If you prefer a patch: Index: gcc/recog.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/recog.c,v retrieving revision 1.211 diff -c -p -r1.211 recog.c *** gcc/recog.c 28 Sep 2004 07:59:48 -0000 1.211 --- gcc/recog.c 25 Oct 2004 21:02:03 -0000 *************** general_operand (rtx op, enum machine_mo *** 937,943 **** #ifdef INSN_SCHEDULING /* On machines that have insn scheduling, we want all memory reference to be explicit, so outlaw paradoxical SUBREGs. */ ! if (MEM_P (sub) && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub))) return 0; #endif --- 937,943 ---- #ifdef INSN_SCHEDULING /* On machines that have insn scheduling, we want all memory reference to be explicit, so outlaw paradoxical SUBREGs. */ ! if (!reload_completed && MEM_P (sub) && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub))) return 0; #endif
Subject: Re: ICE cause by reload With the earlier *two* patches from Ulrich, the compiler no longer ICE. The latest patch to alter_subreg() (with the missing GET_MODEs) does not fix the subreg offset problem for me. bl L_clock$stub std r3,168(r1) lfd f0,168(r1) fcfid f0,f0 stfd f0,224(r1) fctidz f0,f0 stfd f0,204(r1) <--- ... bl L_clock$stub lwz r0,204(r1) <--- std r3,168(r1) std r0,176(r1) lfd f13,168(r1) lfd f0,176(r1) fcfid f12,f0 fcfid f0,f13 fsub f0,f0,f12 fctiwz f13,f0 stfd f13,184(r1) lwz r3,188(r1) The code continues to access the wrong word.
By mistake, I applied the test for !reload_completed to you earlier patch (which was worng). In any case, after correcting the patch and with your latest patch, all my test cases passed. Now, I need to do a complete bootstrap with -mcpu=G5 on apple-ppc-darwin and will let you know how it goes. Thanks.
Created attachment 7410 [details] Reload fix for paradoxical subregs David, could you try again with this patch? I've now built a powerpc-apple-darwin7.0.0 cross-cc1 with this fix, and I'm getting: bl _clock ; 180 *call_value_nonlocal_sysv/1 [length = 4] rldicl r3,r3,0,32 ; 437 *rs6000.md:379/2 [length = 4] std r3,168(r1) ; 438 *movdi_internal64/3 [length = 4] lfd f0,168(r1) ; 439 *movdi_internal64/9 [length = 4] fcfid f0,f0 ; 440 floatdidf2 [length = 4] stfd f0,224(r1) ; 442 *movdf_hardfloat64/6 [length = 4] fctidz f0,f0 ; 184 fix_truncdfdi2 [length = 4] stfd f0,200(r1) ; 446 *movdi_internal64/10 [length = 4] ^^^ ... bl _clock ; 362 *call_value_nonlocal_sysv/1 [length = 4] rldicl r3,r3,0,32 ; 426 *rs6000.md:379/2 [length = 4] std r3,168(r1) ; 427 *movdi_internal64/3 [length = 4] lfd f0,168(r1) ; 428 *movdi_internal64/9 [length = 4] fcfid f13,f0 ; 429 floatdidf2 [length = 4] lwz r0,204(r1) ; 430 *rs6000.md:379/1 [length = 4] ^^^ std r0,176(r1) ; 431 *movdi_internal64/3 [length = 4] lfd f0,176(r1) ; 432 *movdi_internal64/9 [length = 4]
Subject: Re: ICE cause by reload Try again with what patch? I have all patches applied and I consistently get the output I emailed earlier with a native Mac OS X compiler. David
I tried the last patch and for the following statement built with -O2 -mcpu=G5 (aaple's mixed mode) I get the following instruction sequence. It looks OK to me. But David's case might be different than what I am looking at: clock_start=(((double)clock())/((double)(100))); bl L_clock$stub rldicl r3,r3,0,32 lha r0,232(r29) addis r2,r31,ha16(LC40-"L00000000008$pb") std r3,456(r1) lfd f12,lo16(LC40-"L00000000008$pb")(r2) cmpwi cr7,r0,0 nop lfd f13,456(r1) fcfid f0,f13 fdiv f0,f0,f12 fctidz f0,f0 stfd f0,528(r1) nop nop nop ld r19,528(r1) ble cr7,L147 ... ti+=((((double)clock())/((double)(100)))-clock_start); L186: bl L_clock$stub rldicl r3,r3,0,32 rldicl r2,r19,0,32 std r3,464(r1) std r2,472(r1) addis r2,r31,ha16(LC40-"L00000000008$pb") lfd f0,464(r1) lfd f13,472(r1) lwz r0,816(r30) cmpwi cr7,r0,0 fcfid f12,f0 lfd f0,lo16(LC40-"L00000000008$pb")(r2) fcfid f11,f13 addis r2,r31,ha16(LC39-"L00000000008$pb") fdiv f12,f12,f0 lfd f0,lo16(LC39-"L00000000008$pb")(r2) fsub f12,f12,f11 ...
Subject: Re: ICE cause by reload There is a reduced testcase attached to the Bugzilla PR. Please do not confuse discussion with other examples.
Subject: Re: ICE cause by reload I found a typo in my sourcebase introduced when applying one of the patches. Fixing that typo (matching the complete patch attached to the PR) produces the correct offset. I think the patch is the correct change to get reload to behave correctly.
I tested the patch on apple-ppc-darwin; bootstrapped and dejagnu tested (with and without -mcpu=G5). There were no regressions. This is an important bug for us. We have had 4 separate reporting of this bug. It also happens in SPEC2004. - Thanks.
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-10/msg02342.html>.
Subject: Bug 15286 CVSROOT: /cvs/gcc Module name: gcc Changes by: uweigand@gcc.gnu.org 2004-10-28 12:47:29 Modified files: gcc : ChangeLog final.c recog.c simplify-rtx.c Log message: PR target/15286 * final.c (alter_subreg): Compute correct offset to use with paradoxical SUBREGs of memory operands. * recog.c (general_operand): Allow paradoxical SUBREGs of memory operands after reload. * simplify-rtx.c (simplify_gen_subreg): Fail if simplify_subreg has failed when passed a hard register. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6070&r2=2.6071 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/final.c.diff?cvsroot=gcc&r1=1.340&r2=1.341 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/recog.c.diff?cvsroot=gcc&r1=1.211&r2=1.212 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/simplify-rtx.c.diff?cvsroot=gcc&r1=1.206&r2=1.207
Fixed.