Bug 15286

Summary: ICE cause by reload
Product: gcc Reporter: Fariborz Jahanian <fjahanian>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dje, gcc-bugs, pinskia, uweigand
Priority: P2 Keywords: ice-on-valid-code, patch
Version: 4.0.0   
Target Milestone: 4.0.0   
Host: powerpc-apple-darwin7.0.0 Target: powerpc-apple-darwin7.0.0
Build: powerpc-apple-darwin7.0.0 Known to work:
Known to fail: 3.3.3 3.4.0 4.0.0 Last reconfirmed: 2004-05-11 20:32:44
Attachments: Test case to reproduce this bug
new semi-reduced
A new test case for this bug.
reduced testcase for 4.0.0 (20041021)
Reload fix for paradoxical subregs

Description Fariborz Jahanian 2004-05-04 20:49:51 UTC
A client test case ICEs in reload when compiled with -mcpu=G5 and -O1 (and above)
for apple-ppc-darwin.
This happens with mainline and tree-ssa gcc. But it may or may not fail with some
compilers (behavior is quite random and ICE happens if there is no hard register available
during reload, forcing store into temporary stack).

Code which causes this problem is (of course this sample does not ICE):

typedef unsigned long clock_t;

clock_t clock();

void foo()
{
        clock_t clock_start;

        clock_start=(((double)clock())/((double)(100)));
}

Problem is that gcc3.5 generates this RTL pattern for apple-ppc-dawin:

(insn 30 29 32 2 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0)
        (fix:DI (reg:DF 158))) 216 {fix_truncdfdi2} (insn_list 28 (nil))
    (expr_list:REG_DEAD (reg:DF 158)
        (nil)))

Reload initially assigns hard register f0 to pseudo register 158, thus the pattern:

(insn 30 29 32 2 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0)
        (fix:DI (reg:DF 32 f0 [158]))) 216 {fix_truncdfdi2} (insn_list 28 (nil))
    (expr_list:REG_DEAD (reg:DF 32 f0 [158])
        (nil)))

When there is no hard register available for  pseudo reg 142, reload will allocate
memory on stack to do a store/load operation. But to do so, reload generates the
following pattern:

(subreg:SI (reg:DI 32 f0) 4)

This is wrong. There is no movsi of an FPR. gcc then ICEs on:

error: unrecognizable insn:
(insn 247 30 32 2 (set (mem:SI (plus:SI (reg/f:SI 1 r1)
                (const_int 492 [0x1ec])) [0 clock_start+0 S4 A8])
        (subreg:SI (reg:DI 32 f0) 4)) -1 (nil)
    (nil))
Comment 1 Andrew Pinski 2004-05-04 21:35:25 UTC
Of course the code does not even give a sub register:
(insn 13 12 15 0 pr15286.c:9 (set (reg:DI 126)
        (fix:DI (reg:DF 124))) 216 {fix_truncdfdi2} (insn_list 12 (nil))
    (expr_list:REG_DEAD (reg:DF 124)
        (nil)))
I cannot think of why the subregister is comming up in the first place.
But then again it could be this Apple LOCAL patch which is causing it:
-      /* APPLE LOCAL assign_stack_local_with_alias is used instead of 
-        assign_stack_temp to get better scheduling, at the cost of some 
-        stack space. */
-      rtx mem = assign_stack_local_with_alias (DImode, GET_MODE_SIZE (DImode),
-                                           GET_MODE_ALIGNMENT (DImode));
+      rtx mem = assign_stack_temp (DImode, GET_MODE_SIZE (DImode), 0);

But then again we cannot test with a testcase which ICEs.  I cannot think of any reason why there is a 
sub register there at all.  I think you need to figure out where the sub register is coming from and then 
reduce the testcase around that.
Comment 2 Fariborz Jahanian 2004-05-04 21:44:15 UTC
This happens with FSF mainline and tree-ssa. There is no APPLE LOCAL involved.
Problem happens in reload phase when it decides that there are no hard register available. 
I may be able to attach original user code if I get the permission. 
Comment 3 Andrew Pinski 2004-05-05 15:25:54 UTC
I wonder if this is related at all to PR 15289.
Comment 4 Fariborz Jahanian 2004-05-05 17:06:13 UTC
Well, I tried lots of things. I have a 'partial' solution for this problem. Bad pattern is produced
in routine gen_reload for the following arguments:

in: (reg:DI 32 f0)

out: (subreg:DI (reg/v:SI 142 [ clock_start ]) 0)

resulting in the generation of bad pattern:

(insn 247 30 32 2 (set (reg/v:SI 142 [ clock_start ])
        (subreg:SI (reg:DI 32 f0) 4)) -1 (nil)
    (nil))


I fix it by going to memory to move f0 to subreg:DI (reg/v:SI 142 [ clock_start ]) instead:

(insn 248 0 249 (set (mem:DI (plus:SI (reg/f:SI 1 r1)
                (const_int 480 [0x1e0])) [0 S8 A8])
        (reg:DI 32 f0)) -1 (nil)
    (nil))

(insn 249 248 0 (set (subreg:DI (reg/v:SI 142 [ clock_start ]) 0)
        (mem:DI (plus:SI (reg/f:SI 1 r1)
                (const_int 480 [0x1e0])) [0 S8 A8])) -1 (nil)
    (nil))

But there is still a problem. I get an ICE later on because I increased the stack size for the
temporary. I also noticed that a similar code is there in the same routine to do a similar
thing. How can I create a temporary such that  later assertion on stack size is not
violated? If this cannot be done, then how come, there is code further down in the same
routine which does exactly that?



Here is the diff for what it is worth (ignore that checks in the if-condition. It can be
sanitized later on).

Index: reload1.c
============================================================
=======
RCS file: /cvs/gcc/gcc/gcc/reload1.c,v
retrieving revision 1.346.2.33
diff -c -p -r1.346.2.33 reload1.c
*** reload1.c   5 Mar 2004 23:48:16 -0000       1.346.2.33
--- reload1.c   5 May 2004 00:33:41 -0000
--- 7344,7350 ----

     Returns first insn emitted.  */

+
  rtx
  gen_reload (rtx out, rtx in, int opnum, enum reload_type type)
  {
*************** gen_reload (rtx out, rtx in, int opnum,
*** 7357,7362 ****
--- 7358,7374 ----
          > GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))))
        && (tem = gen_lowpart_common (GET_MODE (SUBREG_REG (in)), out)) != 0)
      in = SUBREG_REG (in), out = tem;
+   else if (GET_CODE (out) == SUBREG
+          && GET_MODE (out) == DImode
+          && GET_CODE (in) == REG
+          && GET_MODE (in) == DImode
+          && FP_REGNO_P (REGNO (in)))
+   {
+       /* Get the memory to use and rewrite both registers to its mode.  */
+       rtx loc = get_secondary_mem (in, GET_MODE (out), opnum, type);
+       gen_reload (loc, in, opnum, type);
+       in = loc;
+   }
    else if (GET_CODE (out) == SUBREG
           && (GET_MODE_SIZE (GET_MODE (out))
               > GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))))
Comment 5 Fariborz Jahanian 2004-05-11 19:24:14 UTC
Created attachment 6260 [details]
Test case to reproduce this bug
Comment 6 Andrew Pinski 2004-05-11 20:32:44 UTC
I think it is caused by the patch mentioned here: <http://gcc.gnu.org/ml/gcc-patches/2002-07/
msg00087.html>, even RTH warns about the approach: <http://gcc.gnu.org/ml/gcc-patches/2002
-07/msg00092.html> which seems to be the problem.  Reducing the test case is very randomly done.  
But confirmed always.
Comment 7 Andrew Pinski 2004-05-12 16:46:10 UTC
And here is the reduced source:
int clock(void);
extern int rep_err_count, rep_err_line[10], y[30];
extern const char *rep_err_file[10];
int daxpy (int mg,int a, const int y);
int dmatmul_minus (int mg,  int tl, const int x,
                   const int M, const int y);
int LinearResiduum (int level, int x, int b, int A);
int GMRESSolver (int level, int x, int b, int A,
                 int theMG, int *v)
{
  int it,i1;
  unsigned clock_start;
  clock_start=clock ()*1.0;
  for (it=0; it<level; it++)
    {
      for (i1=0; i1<=level; i1++)
        {
          if (daxpy (theMG,y[i1],v[i1])!= 0)
            {
              rep_err_line[1] = 0;
              rep_err_file[2] = "";
              rep_err_count = (rep_err_count+1)%5;
              return (1);
            }
        }
      if (dmatmul_minus (theMG,level,b,x,A)
          || LinearResiduum (level,x,b,A))
        {
          rep_err_count = (rep_err_count+1)%5;
          return (1);
        }
    }
  return clock ()*1.0 - clock_start;
}

Oh reverting part of the patch I mentioned did not work.
Comment 8 Andrew Pinski 2004-06-11 02:40:49 UTC
Created attachment 6512 [details]
new semi-reduced

Since the old reduced source no longer ICE, I figured I would reduce the source
some.  Attached is the new reduced source which does ICE on the mainline built
an hour ago.
Comment 9 Fariborz Jahanian 2004-10-15 00:38:08 UTC
Created attachment 7353 [details]
A new test case for this bug.

Compile this attached file with -O1 -mcpu=G5 with gcc-4.0 build for
apple-ppc-darwin:

% mygccm5o -c -O1 -mcpu=G5 l.c
loader_obj.c: In function 'read_obj':
loader_obj.c:392: error: unrecognizable insn:
(insn 1054 561 563 49 (set (mem:SI (plus:SI (reg/f:SI 1 r1)
		(const_int 668 [0x29c])) [0 allocednf+0 S4 A8])
	(subreg:SI (reg:DI 32 f0) 4)) -1 (nil)
    (nil))
loader_obj.c:392: internal compiler error: in extract_insn, at /recog.c:2034
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
Comment 10 Ulrich Weigand 2004-10-15 10:46:39 UTC
As I understand it, generation of subregs the hardware does not support
should be prevented by the CANNOT_CHANGE_MODE_CLASS mechanism.  Reload
would reload the full inner reg into a register that allows the desired
mode change instead.

Could you try to find out why this doesn't work?  Does reload fail to call
CANNOT_CHANGE_MODE_CLASS somewhere?  Or is the rs6000 definition of that
macro incorrect?
Comment 11 David Edelsohn 2004-10-18 01:50:34 UTC
Subject: Re:  ICE cause by reload 

	(subreg:SI (reg:DI)) normally isn't a problem, except when reg:DI
is assigned to an FPR.  If reg:DI was assigned to an FPR, CLASS probably
is NON_SPECIAL_REGS (encompassing both GPRs and FPRs), which would make
the result from CANNOT_CHANGE_MODE_CLASS correct.  I'm not sure what
problems would be created if one redefined CANNOT_CHANGE_MODE_CLASS to
return false for that super-class.

	The problem is that the pseudo was assigned to an FPR, which is
valid for the CLASS, but not for the instruction.  One way of looking at
it is that regclass should not have widened the CLASS to include a
sub-class for which CANNOT_CHANGE_MODE_CLASS is true.
Comment 12 Ulrich Weigand 2004-10-18 12:47:24 UTC
On s390, CANNOT_CHANGE_MODE_CLASS does indeed return false for such
superclasses:

#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)               \
  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)                   \
   ? reg_classes_intersect_p (FP_REGS, CLASS) : 0)

This appears to be working fine ...

As I understand, regclass does indeed avoid widening the classs to one
for which CANNOT_CHANGE_MODE_CLASS returns true; it does not check sub-
classes.
Comment 13 Andrew Pinski 2004-10-22 13:38:01 UTC
Created attachment 7397 [details]
reduced testcase for 4.0.0 (20041021)

Here is a new reduced testcase which I got down to 73 lines.
Comment 14 David Edelsohn 2004-10-24 02:59:50 UTC
pr15286.c:72: error: unrecognizable insn:
(insn 496 167 381 0 (set (mem:SI (plus:SI (reg/f:SI 1 r1)
                (const_int 204 [0xcc])) [15 clock_start+0 S4 A8])
        (subreg:SI (reg:DI 32 f0) 4)) -1 (nil)
    (nil))

insn 496 is generated by reload, but it is generated for insn 167

(insn:HI 167 427 381 0 (set (subreg:DI (reg/v:SI 279 [ clock_start ]) 0)
        (fix:DI (reg:DF 294))) 239 {fix_truncdfdi2} (insn_list:REG_DEP_TRUE 484
(nil))
    (expr_list:REG_DEAD (reg:DF 294)
        (nil)))

regclass appears to be choosing a correct register class

  Register 279 costs: BASE_REGS:2620000 GENERAL_REGS:2620000 LINK_REGS:2620474
CTR_REGS:2620474 LINK_OR_CTR_REGS:2620474 SPECIAL_REGS:2620474
SPEC_OR_GEN_REGS:2620474 NON_FLOAT_REGS:2620474 MEM:2620948
  Register 279 pref GENERAL_REGS, else NON_FLOAT_REGS
Register 279 used 2 times across 70 insns; set 1 time; user var; crosses 5
calls; 4 bytes; pref GENERAL_REGS, else NON_FLOAT_REGS.

But then reload chooses an FPR.  It places a float value (reg:DF) in a GPR (r0)
and places an integer subreg in a FPR (f0).

Reloads for insn # 167
Reload 0: FLOAT_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
        reload_out_reg: (subreg:DI (reg/v:SI 279 [ clock_start ]) 0)
Reload 1: reload_in (DF) = (reg:DF 0 r0 [294])
        reload_out (DI) = (subreg:DI (reg/v:SI 279 [ clock_start ]) 0)
        FLOAT_REGS, RELOAD_OTHER (opnum = 1)
        reload_in_reg: (reg:DF 0 r0 [294])
        reload_out_reg: (subreg:DI (reg/v:SI 279 [ clock_start ]) 0)
        reload_reg_rtx: (reg:DF 32 f0)
Comment 15 David Edelsohn 2004-10-24 22:52:54 UTC
This problem appears to be a confluence of issues between fix_truncdfdi2 and
limitations of reload, exacerbated by Apple's LL64 mode.

The problematic code is

typedef unsigned long clock_t;
clock_t clock(void);
  clock_t clock_start;
  clock_start = (double)clock();

On an LP64 system, clock_start would be DImode, but in the LL64 environment the
variable is SImode while GCC wants to perform computations in DImode.  (Note
that clock() also returns an SImode clock_t, but the code inserts the
unnecessary (double) conversion.)

It looks like this problem will occur whenever a DImode value is materialized in
an FPR (e.g., PPC64 fctid instruction is enabled) and one only accesses it as an
SImode value:

DFmode d;
SImode x = (SImode) (DImode) d;

One solution is to convince the compiler to collapse out the DImode and directly
convert the DFmode to SImode using fix_truncdfsi2 instead of fix_truncdfdi2.

With the intervening DImode remaining, GCC chooses fix_truncdfdi2, which
produces a DImode result in an FPR.  The only use of the DImode result is the
SImode user variable, so GCC uses the SImode pseudo wrapped in a DImode SUBREG
directly as the result.  When reload needs to spill the value to get it into a
GPR, it dies because it cannot spill the SImode value to its home location.

One could view this problem as reload not obeying CANNOT_CHANGE_MODE_CLASS
because it should not try to spill just the SImode SUBREG instead of creating a
DImode temporary on the stack.  It needs to spill the entire result, not just
the live portion.

This also could be solved by the fix_truncdfdi2 pattern explicitly moving the
result to a GPR instead of relying on reload to correctly implement that maneuver.

There are more and less complicated solutions, but I am not sure which one is
correct.
Comment 16 David Edelsohn 2004-10-25 15:46:11 UTC
GCC is using fix_truncdfdi2 because no unsfix_truncdfsi2 pattern is defined and
clock_t is an unsigned long.  A short-term workaround is to implement an
unsfix_truncdfsi2 pattern for PPC64 that uses fctid and explicitly moves the
value to a GPR.

However, reload should not be tripping over itself when faced with this.
Comment 17 Ulrich Weigand 2004-10-25 15:54:52 UTC
David Edelsohn wrote:

>One could view this problem as reload not obeying CANNOT_CHANGE_MODE_CLASS
>because it should not try to spill just the SImode SUBREG instead of creating a
>DImode temporary on the stack.  It needs to spill the entire result, not just
>the live portion.

Exactly, and that's what reload tries to do, at least initially.  Note that
push_reload checks CANNOT_CHANGE_MODE_CLASS, and correctly decides it needs
to reload in DImode.  That's why in the debug_reload output you see DImode
as the mode to perform the reload in.

Now, I haven't actually debugged this live; but from looking at the source
code what I assume is going wrong is the following:

Right at the end of reload, in gen_reload, there's this hunk of code:

  /* If IN is a paradoxical SUBREG, remove it and try to put the
     opposite SUBREG on OUT.  Likewise for a paradoxical SUBREG on OUT.  */
[snip]
  else if (GET_CODE (out) == SUBREG
           && (GET_MODE_SIZE (GET_MODE (out))
               > GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))))
           && (tem = gen_lowpart_common (GET_MODE (SUBREG_REG (out)), in)) != 0)
    out = SUBREG_REG (out), in = tem;

This is most likely where the DImode reload is replaced by the SImode one.

Now, this should not happen if the resulting SUBREG on the input side
gets invalid, as it does in the current situation.  

Here, gen_lowpart_common calls simplify_gen_subreg which calls
simplify_subreg.  This last function has code to verify using
CANNOT_CHANGE_MODE_CLASS that no invalid mode change is being encoded
into a SUBREG.  However, in *simplify_gen_subreg*, that decision is
subsequently simply ignored:

  newx = simplify_subreg (outermode, op, innermode, byte);
  if (newx)
    return newx;

  if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode)
    return NULL_RTX;

  return gen_rtx_SUBREG (outermode, op, byte);

Note how the SUBREG is generated anyway.  I'm not quite sure what the
point of this code is; but I'd suggest somewhere here is the place that
needs to get fixed ...
Comment 18 Ulrich Weigand 2004-10-25 16:20:26 UTC
Does this patch help?

Index: gcc/simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.206
diff -c -p -r1.206 simplify-rtx.c
*** gcc/simplify-rtx.c  18 Oct 2004 18:46:06 -0000      1.206
--- gcc/simplify-rtx.c  25 Oct 2004 16:17:42 -0000
*************** simplify_gen_subreg (enum machine_mode o
*** 3789,3795 ****
    if (newx)
      return newx;

!   if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode)
      return NULL_RTX;

    return gen_rtx_SUBREG (outermode, op, byte);
--- 3789,3796 ----
    if (newx)
      return newx;

!   if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode
!       || (REG_P (op) && REGNO (op) < FIRST_PSEUDO_REGISTER))
      return NULL_RTX;

    return gen_rtx_SUBREG (outermode, op, byte);
Comment 19 David Edelsohn 2004-10-25 16:47:29 UTC
Subject: Re:  ICE cause by reload 

>>>>> uweigand at gcc dot gnu dot org writes:

Ulrich> Does this patch help?

	It changes the error, but I'm not sure if it helps.

pr15286.c: In function 'GMRESSolver':
pr15286.c:72: error: unrecognizable insn:
(insn 512 183 397 0 (set (subreg:DI (mem:SI (plus:SI (reg/f:SI 1 r1)
                    (const_int 204 [0xcc])) [15 clock_start+0 S4 A8]) 0)
        (reg:DI 32 f0)) -1 (nil)
    (nil))
pr15286.c:72: internal compiler error: in extract_insn, at recog.c:2034

It is storing to (SUBREG:DI (mem:SI)) instead of (mem:DI).

Comment 20 Ulrich Weigand 2004-10-25 17:17:00 UTC
Well, as I understand it, SUBREG (MEM) is *supposed* to be generated
for the case of SUBREG (pseudo) where the pseudo gets a stack slot.

The scan_paradoxical_subregs mechanism was supposed to have allocated
a large enough stack slot, and cleanup_subreg_operands is then supposed
to convert the SUBREG (MEM) to a MEM with its address offset accordingly.

However, cleanup_subreg_operands first recognizes the insn, and this is
where it presumably fails; due to this weird piece of code in general_operand:

#ifdef INSN_SCHEDULING
      /* On machines that have insn scheduling, we want all memory
         reference to be explicit, so outlaw paradoxical SUBREGs.  */
      if (MEM_P (sub)
          && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub)))
        return 0;
#endif

(What the ... has INSN_SCHEDULING to do with what insns we want
to recognize?!)

Does it work if you add an && !reload_completed to that if?
Comment 21 David Edelsohn 2004-10-25 18:04:43 UTC
Subject: Re:  ICE cause by reload 

	With both patches, the testcase works.  This probably is a correct
fix for reload.

	The code is a little messy and is cleaned up by implementing
fixuns_truncdfsi, which also fixes the testcase.

David
Comment 22 David Edelsohn 2004-10-25 18:33:58 UTC
Correction.  While the reload changes fix the crash, it looks like there still 
is a bug because the resulting code does not access the correct SUBREG.  It 
does not access the LSW.
Comment 23 Fariborz Jahanian 2004-10-25 18:39:26 UTC
I applied the last two patch, but it didn;t help:

% mygccf -O2 -mcpu=G5 -c loader_obj.i
loader_obj.c: In function 'load_obj':
loader_obj.c:92: error: unrecognizable insn:
(insn 1395 601 1396 50 (set (subreg:DI (mem:SI (plus:SI (reg/f:SI 1 r1)
                    (const_int 716 [0x2cc])) [0 allocednf+0 S4 A8]) 0)
        (reg:DI 32 f0)) -1 (nil)
    (nil))
loader_obj.c:92: internal compiler error: in extract_insn, at recog.c:2034
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.

Just to be clear, this is the patch I applied.

Index: simplify-rtx.c
===============================================================
====
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.107.2.31.2.9
diff -c -p -r1.107.2.31.2.9 simplify-rtx.c
*** simplify-rtx.c      16 Oct 2004 00:06:42 -0000      1.107.2.31.2.9
--- simplify-rtx.c      25 Oct 2004 18:38:20 -0000
*************** simplify_gen_subreg (enum machine_mode o
*** 3800,3806 ****
    if (newx)
      return newx;
  
!   if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode)
      return NULL_RTX;
  
    return gen_rtx_SUBREG (outermode, op, byte);
--- 3800,3808 ----
    if (newx)
      return newx;
  
!   if ((GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode
!        || (REG_P (op) && REGNO (op) < FIRST_PSEUDO_REGISTER))
!          && !reload_completed)
      return NULL_RTX;
  
    return gen_rtx_SUBREG (outermode, op, byte);
Comment 24 David Edelsohn 2004-10-25 19:07:22 UTC
Subject: Re:  ICE cause by reload 

> I applied the last two patch, but it didn;t help:

	The simplify-rtx.c patch is not *two* patches.
Comment 25 Fariborz Jahanian 2004-10-25 19:12:42 UTC
You referred to them as 'both patches' in comment #21.
Comment 26 Andrew Pinski 2004-10-25 19:14:04 UTC
The second patch has to due with "However, cleanup_subreg_operands first recognizes the insn, and 
this is
where it presumably fails; due to this weird piece of code in general_operand:"
Comment 27 Ulrich Weigand 2004-10-25 20:38:13 UTC
David Edelsohn wrote:  >Correction.  While the reload changes fix the crash, it looks like there still  >is a bug because the resulting code does not access the correct SUBREG.  It  >does not access the LSW.  That would be because alter_subreg doesn't handle paradoxical subregs on big-endian machines correctly.  Something like the following (untested) should fix that one:  Index: gcc/final.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/final.c,v retrieving revision 1.339 diff -c -p -r1.339 final.c *** gcc/final.c 22 Oct 2004 17:05:05 -0000      1.339 --- gcc/final.c 25 Oct 2004 20:35:41 -0000 *************** alter_subreg (rtx *xp) *** 2607,2613 ****     /* simplify_subreg does not remove subreg from volatile references.        We are required to.  */     if (MEM_P (y)) !     *xp = adjust_address (y, GET_MODE (x), SUBREG_BYTE (x));     else       {         rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y), --- 2607,2628 ----     /* simplify_subreg does not remove subreg from volatile references.        We are required to.  */     if (MEM_P (y)) !     { !       int offset = SUBREG_BYTE (x); ! !       /* For paradoxical subregs on big-endian machines, SUBREG_BYTE !        contains 0 instead of the proper offset.  See simplify_subreg.  */ !       if (offset == 0 && GET_MODE_SIZE (y) < GET_MODE_SIZE (x)) !         { !           int difference = GET_MODE_SIZE (y) - GET_MODE_SIZE (x); !           if (WORDS_BIG_ENDIAN) !             offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD; !           if (BYTES_BIG_ENDIAN) !             offset += difference % UNITS_PER_WORD; !         } ! !       *xp = adjust_address (y, GET_MODE (x), offset); !     }     else       {         rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y),   
Comment 28 Ulrich Weigand 2004-10-25 20:42:54 UTC
(My last message got messed up somehow, I'm trying again.)

David Edelsohn wrote: 
 
>Correction.  While the reload changes fix the crash, it looks like there still  
>is a bug because the resulting code does not access the correct SUBREG.  It  
>does not access the LSW. 
 
That would be because alter_subreg doesn't handle paradoxical subregs 
on big-endian machines correctly.  Something like the following (untested) 
should fix that one: 
 
Index: gcc/final.c 
=================================================================== 
RCS file: /cvs/gcc/gcc/gcc/final.c,v 
retrieving revision 1.339 
diff -c -p -r1.339 final.c 
*** gcc/final.c 22 Oct 2004 17:05:05 -0000      1.339 
--- gcc/final.c 25 Oct 2004 20:35:41 -0000 
*************** alter_subreg (rtx *xp) 
*** 2607,2613 **** 
    /* simplify_subreg does not remove subreg from volatile references. 
       We are required to.  */ 
    if (MEM_P (y)) 
!     *xp = adjust_address (y, GET_MODE (x), SUBREG_BYTE (x)); 
    else 
      { 
        rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y), 
--- 2607,2628 ---- 
    /* simplify_subreg does not remove subreg from volatile references. 
       We are required to.  */ 
    if (MEM_P (y)) 
!     { 
!       int offset = SUBREG_BYTE (x); 
! 
!       /* For paradoxical subregs on big-endian machines, SUBREG_BYTE 
!        contains 0 instead of the proper offset.  See simplify_subreg.  */ 
!       if (offset == 0 && GET_MODE_SIZE (y) < GET_MODE_SIZE (x)) 
!         { 
!           int difference = GET_MODE_SIZE (y) - GET_MODE_SIZE (x); 
!           if (WORDS_BIG_ENDIAN) 
!             offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD; 
!           if (BYTES_BIG_ENDIAN) 
!             offset += difference % UNITS_PER_WORD; 
!         } 
! 
!       *xp = adjust_address (y, GET_MODE (x), offset); 
!     } 
    else 
      { 
        rtx new = simplify_subreg (GET_MODE (x), y, GET_MODE (y), 
 
 
Comment 29 Fariborz Jahanian 2004-10-25 20:58:40 UTC
You need to replace GET_MODE_SIZE (x) with GET_MODE_SIZE (GET_MODE (x)), etc. for a clean
compile. But as I mentioned in last comment, I still get the ICE with or without this patch (along
with the previous patch) in all the test cases that I tried.
Comment 30 Ulrich Weigand 2004-10-25 21:04:51 UTC
well, as pointed out by David you do need in addition the change I 
described in comment #20.

If you prefer a patch:

Index: gcc/recog.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/recog.c,v
retrieving revision 1.211
diff -c -p -r1.211 recog.c
*** gcc/recog.c 28 Sep 2004 07:59:48 -0000      1.211
--- gcc/recog.c 25 Oct 2004 21:02:03 -0000
*************** general_operand (rtx op, enum machine_mo
*** 937,943 ****
  #ifdef INSN_SCHEDULING
        /* On machines that have insn scheduling, we want all memory
         reference to be explicit, so outlaw paradoxical SUBREGs.  */
!       if (MEM_P (sub)
          && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub)))
        return 0;
  #endif
--- 937,943 ----
  #ifdef INSN_SCHEDULING
        /* On machines that have insn scheduling, we want all memory
         reference to be explicit, so outlaw paradoxical SUBREGs.  */
!       if (!reload_completed && MEM_P (sub)
          && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub)))
        return 0;
  #endif
Comment 31 David Edelsohn 2004-10-25 21:11:14 UTC
Subject: Re:  ICE cause by reload 

	With the earlier *two* patches from Ulrich, the compiler no longer
ICE. 

	The latest patch to alter_subreg() (with the missing GET_MODEs)
does not fix the subreg offset problem for me.

        bl L_clock$stub
        std r3,168(r1)
        lfd f0,168(r1)
        fcfid f0,f0
        stfd f0,224(r1)
        fctidz f0,f0
        stfd f0,204(r1)   <---
...
        bl L_clock$stub
        lwz r0,204(r1)    <---
        std r3,168(r1)
        std r0,176(r1)
        lfd f13,168(r1)
        lfd f0,176(r1)
        fcfid f12,f0
        fcfid f0,f13
        fsub f0,f0,f12
        fctiwz f13,f0
        stfd f13,184(r1)
        lwz r3,188(r1)

The code continues to access the wrong word.
Comment 32 Fariborz Jahanian 2004-10-25 21:14:27 UTC
By mistake, I applied the test for !reload_completed  to you earlier patch (which was worng). 
In any case,
after correcting the patch and with your latest patch, all my test cases passed. Now, I need
to do a complete bootstrap with -mcpu=G5 on apple-ppc-darwin and will let you know
how it goes. Thanks.
Comment 33 Ulrich Weigand 2004-10-25 22:00:15 UTC
Created attachment 7410 [details]
Reload fix for paradoxical subregs

David, could you try again with this patch?  I've now built a 
powerpc-apple-darwin7.0.0 cross-cc1 with this fix, and I'm getting:

	bl _clock	; 180	*call_value_nonlocal_sysv/1	[length = 4]
	rldicl r3,r3,0,32	; 437	*rs6000.md:379/2	[length = 4]
	std r3,168(r1)	; 438	*movdi_internal64/3	[length = 4]
	lfd f0,168(r1)	; 439	*movdi_internal64/9	[length = 4]
	fcfid f0,f0	; 440	floatdidf2	[length = 4]
	stfd f0,224(r1) ; 442	*movdf_hardfloat64/6	[length = 4]
	fctidz f0,f0	; 184	fix_truncdfdi2	[length = 4]
	stfd f0,200(r1) ; 446	*movdi_internal64/10	[length = 4]
		^^^
...
	bl _clock	; 362	*call_value_nonlocal_sysv/1	[length = 4]
	rldicl r3,r3,0,32	; 426	*rs6000.md:379/2	[length = 4]
	std r3,168(r1)	; 427	*movdi_internal64/3	[length = 4]
	lfd f0,168(r1)	; 428	*movdi_internal64/9	[length = 4]
	fcfid f13,f0	; 429	floatdidf2	[length = 4]
	lwz r0,204(r1)	; 430	*rs6000.md:379/1	[length = 4]
	       ^^^
	std r0,176(r1)	; 431	*movdi_internal64/3	[length = 4]
	lfd f0,176(r1)	; 432	*movdi_internal64/9	[length = 4]
Comment 34 David Edelsohn 2004-10-25 23:43:16 UTC
Subject: Re:  ICE cause by reload 

	Try again with what patch?  I have all patches applied and I
consistently get the output I emailed earlier with a native Mac OS X
compiler. 

David
Comment 35 Fariborz Jahanian 2004-10-25 23:58:34 UTC
I tried the last patch and for the following statement built with -O2 -mcpu=G5 (aaple's mixed mode)
I get the following instruction sequence. It looks OK to me. But David's case might be different
than what I am looking at:


clock_start=(((double)clock())/((double)(100)));


        bl L_clock$stub
        rldicl r3,r3,0,32
        lha r0,232(r29)
        addis r2,r31,ha16(LC40-"L00000000008$pb")
        std r3,456(r1)
        lfd f12,lo16(LC40-"L00000000008$pb")(r2)
        cmpwi cr7,r0,0
        nop
        lfd f13,456(r1)
        fcfid f0,f13
        fdiv f0,f0,f12
        fctidz f0,f0
        stfd f0,528(r1)
        nop
        nop
        nop
        ld r19,528(r1)
        ble cr7,L147
         ...

ti+=((((double)clock())/((double)(100)))-clock_start);

L186:
        bl L_clock$stub
        rldicl r3,r3,0,32
        rldicl r2,r19,0,32
        std r3,464(r1)
        std r2,472(r1)
        addis r2,r31,ha16(LC40-"L00000000008$pb")
        lfd f0,464(r1)
        lfd f13,472(r1)
        lwz r0,816(r30)
        cmpwi cr7,r0,0
        fcfid f12,f0
        lfd f0,lo16(LC40-"L00000000008$pb")(r2)
        fcfid f11,f13
        addis r2,r31,ha16(LC39-"L00000000008$pb")
        fdiv f12,f12,f0
        lfd f0,lo16(LC39-"L00000000008$pb")(r2)
        fsub f12,f12,f11
        ...

Comment 36 David Edelsohn 2004-10-26 00:13:06 UTC
Subject: Re:  ICE cause by reload 

	There is a reduced testcase attached to the Bugzilla PR.  Please
do not confuse discussion with other examples.

Comment 37 David Edelsohn 2004-10-26 04:28:03 UTC
Subject: Re:  ICE cause by reload 

	I found a typo in my sourcebase introduced when applying one of
the patches.  Fixing that typo (matching the complete patch attached to
the PR) produces the correct offset.

	I think the patch is the correct change to get reload to behave
correctly.
Comment 38 Fariborz Jahanian 2004-10-26 15:17:36 UTC
I tested the patch on apple-ppc-darwin; bootstrapped and dejagnu tested (with and without 
-mcpu=G5). There were no regressions. This is an important bug for us. We have had 4
separate reporting of this bug. It also happens in SPEC2004. 

- Thanks.
Comment 39 Andrew Pinski 2004-10-27 15:56:32 UTC
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-10/msg02342.html>.
Comment 40 GCC Commits 2004-10-28 12:47:42 UTC
Subject: Bug 15286

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	uweigand@gcc.gnu.org	2004-10-28 12:47:29

Modified files:
	gcc            : ChangeLog final.c recog.c simplify-rtx.c 

Log message:
	PR target/15286
	* final.c (alter_subreg): Compute correct offset to use with
	paradoxical SUBREGs of memory operands.
	* recog.c (general_operand): Allow paradoxical SUBREGs of
	memory operands after reload.
	* simplify-rtx.c (simplify_gen_subreg): Fail if simplify_subreg
	has failed when passed a hard register.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6070&r2=2.6071
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/final.c.diff?cvsroot=gcc&r1=1.340&r2=1.341
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/recog.c.diff?cvsroot=gcc&r1=1.211&r2=1.212
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/simplify-rtx.c.diff?cvsroot=gcc&r1=1.206&r2=1.207

Comment 41 Andrew Pinski 2004-10-28 13:44:00 UTC
Fixed.