This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Fix save and restore of PIC register on PA


> On Wed, Jan 22, 2003 at 06:57:28PM -0500, John David Anglin wrote:
> > However, when a call can throw or return to the nonlocal goto
> > handler, the basic block with the call is terminated after the
> > call, and there is no in_post_call_group_p group.  The pic restore
> > insn lies in a new basic block and is no longer scheduled with
> > the call.
> 
> I believe the correct solution is to not expose the pic restore
> before reload.  This should be done after reload in splitters
> or peep2 expanders.  See how Alpha does this.  I've been meaning
> to do the same thing for ia64, but haven't gotten around to it.

I have installed the following patch, which uses the above approach, to
correct the problem on the main line and the 3.3 branch.

Tested on hppa2.0w-hp-hpux11.*, hppa64-hp-hpux11.*, and
hppa-unknown-linux-gnu, 3.3 and 3.4 with no regressions.
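
As a rough illustration of the approach, here is a toy model in C.  It is
not GCC code: the enum values, struct insn, and split_pic_calls are
hypothetical names made up for this sketch.  Before "reload" the save, call
and restore are one combined insn, so nothing can be scheduled between
them.  Only afterwards is the combined insn expanded, and no-return calls
get no save and restore at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn kinds (hypothetical, not GCC's rtx codes).  OP_CALL_PIC is the
   combined pre-reload call that hides the PIC register save and restore.  */
enum op { OP_OTHER, OP_CALL_PIC, OP_SAVE_PIC, OP_CALL, OP_RESTORE_PIC };

struct insn
{
  enum op op;
  int noreturn;		/* models a REG_NORETURN note on the call */
};

/* Expand each OP_CALL_PIC in IN[0..N-1] into OUT.  A call that returns
   becomes save/call/restore; a no-return call becomes a bare call.
   OUT must have room for 3 * N insns.  Returns the number written.  */
size_t
split_pic_calls (const struct insn *in, size_t n, struct insn *out)
{
  size_t k = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (in[i].op != OP_CALL_PIC)
	{
	  out[k++] = in[i];
	  continue;
	}
      if (!in[i].noreturn)
	out[k++] = (struct insn) { OP_SAVE_PIC, 0 };
      out[k++] = (struct insn) { OP_CALL, in[i].noreturn };
      if (!in[i].noreturn)
	out[k++] = (struct insn) { OP_RESTORE_PIC, 0 };
    }
  return k;
}
```

In the patch itself the two cases are handled by separate define_split and
define_peephole2 patterns; the sketch folds them into one loop for brevity.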

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

2003-02-02  John David Anglin  <dave.anglin@nrc-cnrc.gc.ca>

	* pa-protos.h (attr_length_millicode_call): Remove second argument.
	(output_indirect_call, attr_length_indirect_call,
	attr_length_save_restore_dltp): New prototypes.
	* pa.c (attr_length_millicode_call): Remove second argument.  Check
	INSN_ADDRESSES_SET_P in distance calculation.
	(output_millicode_call): Check INSN_ADDRESSES_SET_P before using
	INSN_ADDRESSES.
	(attr_length_call): Check INSN_ADDRESSES_SET_P in distance calculation.
	(output_call): Check INSN_ADDRESSES_SET_P before using INSN_ADDRESSES.
	Call attr_length_call directly.
	(attr_length_indirect_call, output_indirect_call,
	attr_length_save_restore_dltp): New functions.
	* pa.md (attr_length_millicode_call): Drop second argument from all
	patterns.
	(return_internal_pic): Delete.
	(return_external_pic): Remove use of PIC register and pic operand and
	flag checks.
	(epilogue): Use return_internal for both normal and pic code.
	(call, call_value): Emit new 32-bit pic patterns for symref and
	indirect calls.  Remove uses for arg pointer and pic register.
	(call_symref_pic, call_symref_pic_post_reload, call_reg_pic,
	call_reg_pic_post_reload, call_val_symref_pic,
	call_val_symref_pic_post_reload, call_val_reg_pic,
	call_val_reg_pic_post_reload): New pre and post reload insn patterns.
	Implement define_split and define_peephole2 patterns for pre reload
	patterns.
	(call_symref_64bit, call_internal_reg_64bit, call_value_symref_64bit,
	call_value_internal_reg_64bit): Shorten names.
	(all call patterns): Explicitly indicate registers used and clobbered.
	Use attr_length_indirect_call and attr_length_save_restore_dltp for
	attribute length calculation.  Move code generation for indirect calls
	to output_indirect_call.
	(sibcall, sibcall_value): Don't restore PIC register.
	(exception_receiver, builtin_setjmp_receiver): Add blockage after PIC
	register restore.
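
The INSN_ADDRESSES_SET_P checks listed above can be modelled outside GCC
roughly as follows.  This is a sketch only: struct call_ctx and its fields
are hypothetical stand-ins for the target flags, total_code_bytes and
insn_current_reference_address, and the thresholds are copied from the
patched attr_length_millicode_call.  When addresses are not yet available,
the distance is treated as unknown (the maximum value), so the longest
applicable sequence is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real function reads from GCC.  */
struct call_ctx
{
  int addresses_set;		/* models INSN_ADDRESSES_SET_P () */
  unsigned long total_code_bytes;
  unsigned long ref_address;	/* models insn_current_reference_address */
  int target_64bit, long_calls, portable_runtime, long_abs_call, pic;
};

/* Distance estimate: unknown (ULONG_MAX) unless addresses are set and
   the sum does not wrap around.  */
static unsigned long
call_distance (const struct call_ctx *c)
{
  unsigned long distance = ULONG_MAX;

  if (c->addresses_set)
    {
      distance = c->total_code_bytes + c->ref_address;
      if (distance < c->total_code_bytes)	/* overflow */
	distance = ULONG_MAX;
    }
  return distance;
}

/* Length in bytes of a millicode call, delay slot included.  */
int
millicode_call_length (const struct call_ctx *c)
{
  unsigned long distance;

  assert (c != NULL);
  distance = call_distance (c);

  if (c->target_64bit)
    return (!c->long_calls && distance < 7600000) ? 8 : 20;
  if (c->portable_runtime)
    return 24;
  if (!c->long_calls && distance < 240000)
    return 8;
  if (c->long_abs_call && !c->pic)
    return 12;
  return 24;
}
```

The point of the guard is that the result stays conservative before
addresses are assigned and can shrink to 8 bytes once the call is known
to be in range.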

Index: config/pa/pa-protos.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa-protos.h,v
retrieving revision 1.21
diff -u -3 -p -r1.21 pa-protos.h
--- config/pa/pa-protos.h	28 Jan 2003 18:08:53 -0000	1.21
+++ config/pa/pa-protos.h	1 Feb 2003 20:24:22 -0000
@@ -51,6 +51,7 @@ extern const char *output_movb PARAMS ((
 extern const char *output_parallel_movb PARAMS ((rtx *, int));
 extern const char *output_parallel_addb PARAMS ((rtx *, int));
 extern const char *output_call PARAMS ((rtx, rtx, int));
+extern const char *output_indirect_call PARAMS ((rtx, rtx));
 extern const char *output_millicode_call PARAMS ((rtx, rtx));
 extern const char *output_mul_insn PARAMS ((int, rtx));
 extern const char *output_div_insn PARAMS ((rtx *, int, rtx));
@@ -104,8 +105,10 @@ extern int jump_in_call_delay PARAMS ((r
 extern enum reg_class secondary_reload_class PARAMS ((enum reg_class,
 						      enum machine_mode, rtx));
 extern int hppa_fpstore_bypass_p PARAMS ((rtx, rtx));
-extern int attr_length_millicode_call PARAMS ((rtx, int));
+extern int attr_length_millicode_call PARAMS ((rtx));
 extern int attr_length_call PARAMS ((rtx, int));
+extern int attr_length_indirect_call PARAMS ((rtx));
+extern int attr_length_save_restore_dltp PARAMS ((rtx));
 
 /* Declare functions defined in pa.c and used in templates.  */
 
Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.196
diff -u -3 -p -r1.196 pa.c
--- config/pa/pa.c	1 Feb 2003 04:26:29 -0000	1.196
+++ config/pa/pa.c	1 Feb 2003 20:24:23 -0000
@@ -6279,37 +6279,42 @@ length_fp_args (insn)
   return length;
 }
 
-/* We include the delay slot in the returned length as it is better to
+/* Return the attribute length for the millicode call instruction INSN.
+   The length must match the code generated by output_millicode_call.
+   We include the delay slot in the returned length as it is better to
    over estimate the length than to under estimate it.  */
 
 int
-attr_length_millicode_call (insn, length)
+attr_length_millicode_call (insn)
      rtx insn;
-     int length;
 {
-  unsigned long distance = total_code_bytes + INSN_ADDRESSES (INSN_UID (insn));
+  unsigned long distance = -1;
 
-  if (distance < total_code_bytes)
-    distance = -1;
+  if (INSN_ADDRESSES_SET_P ())
+    {
+      distance = (total_code_bytes + insn_current_reference_address (insn));
+      if (distance < total_code_bytes)
+	distance = -1;
+    }
 
   if (TARGET_64BIT)
     {
       if (!TARGET_LONG_CALLS && distance < 7600000)
-	return length + 8;
+	return 8;
 
-      return length + 20;
+      return 20;
     }
   else if (TARGET_PORTABLE_RUNTIME)
-    return length + 24;
+    return 24;
   else
     {
       if (!TARGET_LONG_CALLS && distance < 240000)
-	return length + 8;
+	return 8;
 
       if (TARGET_LONG_ABS_CALL && !flag_pic)
-	return length + 12;
+	return 12;
 
-      return length + 24;
+      return 24;
     }
 }
 
@@ -6439,16 +6444,22 @@ output_millicode_call (insn, call_dest)
 
   /* See if the return address can be adjusted.  Use the containing
      sequence insn's address.  */
-  seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
-  distance = (INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
-	      - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8);
-
-  if (VAL_14_BITS_P (distance))
+  if (INSN_ADDRESSES_SET_P ())
     {
-      xoperands[1] = gen_label_rtx ();
-      output_asm_insn ("ldo %0-%1(%2),%2", xoperands);
-      (*targetm.asm_out.internal_label) (asm_out_file, "L",
-				 CODE_LABEL_NUMBER (xoperands[1]));
+      seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
+      distance = (INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
+		  - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8);
+
+      if (VAL_14_BITS_P (distance))
+	{
+	  xoperands[1] = gen_label_rtx ();
+	  output_asm_insn ("ldo %0-%1(%2),%2", xoperands);
+	  (*targetm.asm_out.internal_label) (asm_out_file, "L",
+					     CODE_LABEL_NUMBER (xoperands[1]));
+	}
+      else
+	/* ??? This branch may not reach its target.  */
+	output_asm_insn ("nop\n\tb,n %0", xoperands);
     }
   else
     /* ??? This branch may not reach its target.  */
@@ -6462,18 +6473,25 @@ output_millicode_call (insn, call_dest)
   return "";
 }
 
-/* We include the delay slot in the returned length as it is better to
-   over estimate the length than to under estimate it.  */
+/* Return the attribute length of the call instruction INSN.  The SIBCALL
+   flag indicates whether INSN is a regular call or a sibling call.  The
+   length must match the code generated by output_call.  We include the delay
+   slot in the returned length as it is better to over estimate the length
+   than to under estimate it.  */
 
 int
 attr_length_call (insn, sibcall)
      rtx insn;
      int sibcall;
 {
-  unsigned long distance = total_code_bytes + INSN_ADDRESSES (INSN_UID (insn));
+  unsigned long distance = -1;
 
-  if (distance < total_code_bytes)
-    distance = -1;
+  if (INSN_ADDRESSES_SET_P ())
+    {
+      distance = (total_code_bytes + insn_current_reference_address (insn));
+      if (distance < total_code_bytes)
+	distance = -1;
+    }
 
   if (TARGET_64BIT)
     {
@@ -6535,7 +6553,6 @@ output_call (insn, call_dest, sibcall)
 {
   int delay_insn_deleted = 0;
   int delay_slot_filled = 0;
-  int attr_length = get_attr_length (insn);
   int seq_length = dbr_sequence_length ();
   rtx xoperands[2];
 
@@ -6543,9 +6560,7 @@ output_call (insn, call_dest, sibcall)
 
   /* Handle the common case where we're sure that the branch will reach
      the beginning of the $CODE$ subspace.  */
-  if (!TARGET_LONG_CALLS
-      && ((seq_length == 0 && attr_length == 12)
-	  || (seq_length != 0 && attr_length == 8)))
+  if (!TARGET_LONG_CALLS && attr_length_call (insn, sibcall) == 8)
     {
       xoperands[1] = gen_rtx_REG (word_mode, sibcall ? 0 : 2);
       output_asm_insn ("{bl|b,l} %0,%1", xoperands);
@@ -6773,7 +6788,7 @@ output_call (insn, call_dest, sibcall)
   /* This call has an unconditional jump in its delay slot.  */
   xoperands[0] = XEXP (PATTERN (NEXT_INSN (insn)), 1);
 
-  if (!delay_slot_filled)
+  if (!delay_slot_filled && INSN_ADDRESSES_SET_P ())
     {
       /* See if the return address can be adjusted.  Use the containing
          sequence insn's address.  */
@@ -6802,6 +6817,117 @@ output_call (insn, call_dest, sibcall)
   NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
 
   return "";
+}
+
+/* Return the attribute length of the indirect call instruction INSN.
+   The length must match the code generated by output_indirect_call.
+   The returned length includes the delay slot.  Currently, the delay
+   slot of an indirect call sequence is not exposed and it is used by
+   the sequence itself.  */
+
+int
+attr_length_indirect_call (insn)
+     rtx insn;
+{
+  unsigned long distance = -1;
+
+  if (INSN_ADDRESSES_SET_P ())
+    {
+      distance = (total_code_bytes + insn_current_reference_address (insn));
+      if (distance < total_code_bytes)
+	distance = -1;
+    }
+
+  if (TARGET_64BIT)
+    return 12;
+
+  if (TARGET_FAST_INDIRECT_CALLS
+      || (!TARGET_PORTABLE_RUNTIME
+	  && ((TARGET_PA_20 && distance < 7600000) || distance < 240000)))
+    return 8;
+
+  if (flag_pic)
+    return 24;
+
+  if (TARGET_PORTABLE_RUNTIME)
+    return 20;
+
+  /* Out of reach, can use ble.  */
+  return 12;
+}
+
+const char *
+output_indirect_call (insn, call_dest)
+     rtx insn;
+     rtx call_dest;
+{
+  rtx xoperands[1];
+
+  if (TARGET_64BIT)
+    {
+      xoperands[0] = call_dest;
+      output_asm_insn ("ldd 16(%0),%%r2", xoperands);
+      output_asm_insn ("bve,l (%%r2),%%r2\n\tldd 24(%0),%%r27", xoperands);
+      return "";
+    }
+
+  /* First the special case for kernels, level 0 systems, etc.  */
+  if (TARGET_FAST_INDIRECT_CALLS)
+    return "ble 0(%%sr4,%%r22)\n\tcopy %%r31,%%r2"; 
+
+  /* Now the normal case -- we can reach $$dyncall directly or
+     we're sure that we can get there via a long-branch stub. 
+
+     No need to check target flags as the length uniquely identifies
+     the remaining cases.  */
+  if (attr_length_indirect_call (insn) == 8)
+    return ".CALL\tARGW0=GR\n\t{bl|b,l} $$dyncall,%%r31\n\tcopy %%r31,%%r2";
+
+  /* Long millicode call, but we are not generating PIC or portable runtime
+     code.  */
+  if (attr_length_indirect_call (insn) == 12)
+    return ".CALL\tARGW0=GR\n\tldil L'$$dyncall,%%r2\n\tble R'$$dyncall(%%sr4,%%r2)\n\tcopy %%r31,%%r2";
+
+  /* Long millicode call for portable runtime.  */
+  if (attr_length_indirect_call (insn) == 20)
+    return "ldil L'$$dyncall,%%r31\n\tldo R'$$dyncall(%%r31),%%r31\n\tblr %%r0,%%r2\n\tbv,n %%r0(%%r31)\n\tnop";
+
+  /* We need a long PIC call to $$dyncall.  */
+  xoperands[0] = NULL_RTX;
+  output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+  if (TARGET_SOM || !TARGET_GAS)
+    {
+      xoperands[0] = gen_label_rtx ();
+      output_asm_insn ("addil L'$$dyncall-%0,%%r1", xoperands);
+      (*targetm.asm_out.internal_label) (asm_out_file, "L",
+					 CODE_LABEL_NUMBER (xoperands[0]));
+      output_asm_insn ("ldo R'$$dyncall-%0(%%r1),%%r1", xoperands);
+    }
+  else
+    {
+      output_asm_insn ("addil L'$$dyncall-$PIC_pcrel$0+4,%%r1", xoperands);
+      output_asm_insn ("ldo R'$$dyncall-$PIC_pcrel$0+8(%%r1),%%r1",
+		       xoperands);
+    }
+  output_asm_insn ("blr %%r0,%%r2", xoperands);
+  output_asm_insn ("bv,n %%r0(%%r1)\n\tnop", xoperands);
+  return "";
+}
+
+/* Return the total length of the save and restore instructions needed for
+   the data linkage table pointer (i.e., the PIC register) across the call
+   instruction INSN.  No-return calls do not require a save and restore.
+   In addition, we may be able to avoid the save and restore for calls
+   within the same translation unit.  */
+
+int
+attr_length_save_restore_dltp (insn)
+     rtx insn;
+{
+  if (find_reg_note (insn, REG_NORETURN, NULL_RTX))
+    return 0;
+
+  return 8;
 }
 
 /* In HPUX 8.0's shared library scheme, special relocations are needed
Index: config/pa/pa.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.md,v
retrieving revision 1.118
diff -u -3 -p -r1.118 pa.md
--- config/pa/pa.md	18 Jan 2003 14:51:10 -0000	1.118
+++ config/pa/pa.md	1 Feb 2003 20:24:24 -0000
@@ -4086,7 +4086,7 @@
   "!TARGET_64BIT"
   "* return output_mul_insn (0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_insn ""
   [(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25)))
@@ -4097,7 +4097,7 @@
   "TARGET_64BIT"
   "* return output_mul_insn (0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_expand "muldi3"
   [(set (match_operand:DI 0 "register_operand" "")
@@ -4188,7 +4188,7 @@
   "*
    return output_div_insn (operands, 0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_insn ""
   [(set (reg:SI 29)
@@ -4202,7 +4202,7 @@
   "*
    return output_div_insn (operands, 0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_expand "udivsi3"
   [(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4245,7 +4245,7 @@
   "*
    return output_div_insn (operands, 1, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_insn ""
   [(set (reg:SI 29)
@@ -4259,7 +4259,7 @@
   "*
    return output_div_insn (operands, 1, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_expand "modsi3"
   [(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4298,7 +4298,7 @@
   "*
   return output_mod_insn (0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_insn ""
   [(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25)))
@@ -4311,7 +4311,7 @@
   "*
   return output_mod_insn (0, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_expand "umodsi3"
   [(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4350,7 +4350,7 @@
   "*
   return output_mod_insn (1, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 (define_insn ""
   [(set (reg:SI 29) (umod:SI (reg:SI 26) (reg:SI 25)))
@@ -4363,7 +4363,7 @@
   "*
   return output_mod_insn (1, insn);"
   [(set_attr "type" "milli")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])
+   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn)"))])
 
 ;;- and instructions
 ;; We define DImode `and` so with DImode `not` we can get
@@ -5614,23 +5614,7 @@
   [(return)
    (use (reg:SI 2))
    (const_int 1)]
-  "! flag_pic"
-  "*
-{
-  if (TARGET_PA_20)
-    return \"bve%* (%%r2)\";
-  return \"bv%* %%r0(%%r2)\";
-}"
-  [(set_attr "type" "branch")
-   (set_attr "length" "4")])
-
-;; Use the PIC register to ensure it's restored after a
-;; call in PIC mode.
-(define_insn "return_internal_pic"
-  [(return)
-   (use (match_operand 0 "register_operand" "r"))
-   (use (reg:SI 2))]
-  "flag_pic && true_regnum (operands[0]) == PIC_OFFSET_TABLE_REGNUM"
+  ""
   "*
 {
   if (TARGET_PA_20)
@@ -5640,17 +5624,12 @@
   [(set_attr "type" "branch")
    (set_attr "length" "4")])
 
-;; Use the PIC register to ensure it's restored after a
-;; call in PIC mode.  This is used for eh returns which
-;; bypass the return stub.
+;; This is used for eh returns which bypass the return stub.
 (define_insn "return_external_pic"
   [(return)
-   (use (match_operand 0 "register_operand" "r"))
-   (use (reg:SI 2))
-   (clobber (reg:SI 1))]
-  "flag_pic
-   && current_function_calls_eh_return
-   && true_regnum (operands[0]) == PIC_OFFSET_TABLE_REGNUM"
+   (clobber (reg:SI 1))
+   (use (reg:SI 2))]
+  "flag_pic && current_function_calls_eh_return"
   "ldsid (%%sr0,%%r2),%%r1\;mtsp %%r1,%%sr0\;be%* 0(%%sr0,%%r2)"
   [(set_attr "type" "branch")
    (set_attr "length" "12")])
@@ -5683,20 +5662,15 @@
       rtx x;
 
       hppa_expand_epilogue ();
-      if (flag_pic)
-	{
-	  rtx pic = gen_rtx_REG (word_mode, PIC_OFFSET_TABLE_REGNUM);
 
-	  /* EH returns bypass the normal return stub.  Thus, we must do an
-	     interspace branch to return from functions that call eh_return.
-	     This is only a problem for returns from shared code.  */
-	  if (current_function_calls_eh_return)
-	    x = gen_return_external_pic (pic);
-	  else
-	    x = gen_return_internal_pic (pic);
-	}
+      /* EH returns bypass the normal return stub.  Thus, we must do an
+	 interspace branch to return from functions that call eh_return.
+	 This is only a problem for returns from shared code.  */
+      if (flag_pic && current_function_calls_eh_return)
+	x = gen_return_external_pic ();
       else
 	x = gen_return_internal ();
+
       emit_jump_insn (x);
     }
   DONE;
@@ -5901,8 +5875,8 @@
   ""
   "
 {
-  rtx op;
-  rtx call_insn;
+  rtx op, call_insn;
+  rtx nb = operands[1];
 
   if (TARGET_PORTABLE_RUNTIME)
     op = force_reg (SImode, XEXP (operands[0], 0));
@@ -5918,43 +5892,105 @@
      and calls through function pointers.  This is necessary as these two
      types of calls use different calling conventions, and CSE might try
      to change the named call into an indirect call in some cases (using
-     two patterns keeps CSE from performing this optimization).  */
-  if (GET_CODE (op) == SYMBOL_REF)
-    call_insn = emit_call_insn (gen_call_internal_symref (op, operands[1]));
-  else if (TARGET_64BIT)
+     two patterns keeps CSE from performing this optimization).
+     
+     We now use even more call patterns as there was a subtle bug in
+     attempting to restore the pic register after a call using a simple
+     move insn.  During reload, an instruction involving a pseudo register
+     with no explicit dependence on the PIC register can be converted
+     to an equivalent load from memory using the PIC register.  If we
+     emit a simple move to restore the PIC register in the initial rtl
+     generation, then it can potentially be repositioned during scheduling,
+     and an instruction that eventually uses the PIC register may end up
+     between the call and the PIC register restore.
+     
+     This only worked because there is a post call group of instructions
+     that are scheduled with the call.  These instructions are included
+     in the same basic block as the call.  However, calls can throw in
+     C++ code and a basic block has to terminate at the call if the call
+     can throw.  This results in the PIC register restore being scheduled
+     independently from the call.  So, we now hide the save and restore
+     of the PIC register in the call pattern until after reload.  Then,
+     we split the moves out.  A small side benefit is that we now don't
+     need to have a use of the PIC register in the return pattern and
+     the final save/restore operation is not needed.
+     
+     I elected to just clobber %r4 in the PIC patterns and use it instead
+     of trying to force hppa_pic_save_rtx () to a callee saved register.
+     This might have required a new register class and constraint.  It
+     was also simpler to just handle the restore from a register than a
+     generic pseudo.  */
+  if (TARGET_64BIT)
     {
-      rtx tmpreg = force_reg (word_mode, op);
-      call_insn = emit_call_insn (gen_call_internal_reg_64bit (tmpreg,
-							       operands[1]));
+      if (GET_CODE (op) == SYMBOL_REF)
+	call_insn = emit_call_insn (gen_call_symref_64bit (op, nb));
+      else
+	{
+	  op = force_reg (word_mode, op);
+	  call_insn = emit_call_insn (gen_call_reg_64bit (op, nb));
+	}
     }
   else
     {
-      rtx tmpreg = gen_rtx_REG (word_mode, 22);
-      emit_move_insn (tmpreg, force_reg (word_mode, op));
-      call_insn = emit_call_insn (gen_call_internal_reg (operands[1]));
-    }
-
-  if (TARGET_64BIT)
-    use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
-
-  if (flag_pic)
-    {
-      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
+      if (GET_CODE (op) == SYMBOL_REF)
+	{
+	  if (flag_pic)
+	    call_insn = emit_call_insn (gen_call_symref_pic (op, nb));
+	  else
+	    call_insn = emit_call_insn (gen_call_symref (op, nb));
+	}
+      else
+	{
+	  rtx tmpreg = gen_rtx_REG (word_mode, 22);
 
-      /* After each call we must restore the PIC register, even if it
-	 doesn't appear to be used.  */
-      emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
+	  emit_move_insn (tmpreg, force_reg (word_mode, op));
+	  if (flag_pic)
+	    call_insn = emit_call_insn (gen_call_reg_pic (nb));
+	  else
+	    call_insn = emit_call_insn (gen_call_reg (nb));
+	}
     }
+
   DONE;
 }")
 
-(define_insn "call_internal_symref"
+;; We use function calls to set the attribute length of calls and millicode
+;; calls.  This is necessary because of the large variety of call sequences.
+;; Implementing the calculation in rtl is difficult as well as ugly.  As
+;; we need the same calculation in several places, maintenance becomes a
+;; nightmare.
+;;
+;; However, this has a subtle impact on branch shortening.  When the
+;; expression used to set the length attribute of an instruction depends
+;; on a relative address (e.g., pc or a branch address), genattrtab
+;; notes that the insn's length is variable, and attempts to determine a
+;; worst-case default length and code to compute an insn's current length.
+
+;; The use of a function call hides the variable dependence of our calls
+;; and millicode calls.  The result is genattrtab doesn't treat the operation
+;; as variable and it only generates code for the default case using our
+;; function call.  Because of this, calls and millicode calls have a fixed
+;; length in the branch shortening pass, and some branches will use a longer
+;; code sequence than necessary.  However, the length of any given call
+;; will still reflect its final code location and it may be shorter than
+;; the initial length estimate.
+
+;; It's possible to trick genattrtab by adding an expression involving `pc'
+;; in the set.  However, when genattrtab hits a function call in its attempt
+;; to compute the default length, it marks the result as unknown and sets
+;; the default result to MAX_INT ;-(  One possible fix that would allow
+;; calls to participate in branch shortening would be to make the call to
+;; insn_default_length a target option.  Then, we could massage unknown
+;; results.  Another fix might be to change genattrtab so that it just does
+;; the call in the variable case as it already does for the fixed case.
+
+(define_insn "call_symref"
   [(call (mem:SI (match_operand 0 "call_operand_address" ""))
 	 (match_operand 1 "" "i"))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
    (use (const_int 0))]
-  "! TARGET_PORTABLE_RUNTIME"
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
   "*
 {
   output_arg_descriptor (insn);
@@ -5963,102 +5999,339 @@
   [(set_attr "type" "call")
    (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
 
-(define_insn "call_internal_reg_64bit"
-  [(call (mem:SI (match_operand:DI 0 "register_operand" "r"))
+(define_insn "call_symref_pic"
+  [(call (mem:SI (match_operand 0 "call_operand_address" ""))
 	 (match_operand 1 "" "i"))
+   (clobber (reg:SI 1))
    (clobber (reg:SI 2))
-   (use (const_int 1))]
+   (clobber (reg:SI 4))
+   (use (reg:SI 19))
+   (use (const_int 0))]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[0], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_call (insn, 0)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:SI 4) (reg:SI 19))
+   (parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])
+   (set (reg:SI 19) (reg:SI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT && reload_completed"
+  [(parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "")
+
+(define_insn "*call_symref_pic_post_reload"
+  [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+	 (match_operand 1 "" "i"))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (reg:SI 19))
+   (use (const_int 0))]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[0], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
+
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_symref_64bit"
+  [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+	 (match_operand 1 "" "i"))
+   (clobber (reg:DI 1))
+   (clobber (reg:DI 2))
+   (clobber (reg:DI 4))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 0))]
   "TARGET_64BIT"
   "*
 {
-  /* ??? Needs more work.  Length computation, split into multiple insns,
-     expose delay slot.  */
-  return \"ldd 16(%0),%%r2\;bve,l (%%r2),%%r2\;ldd 24(%0),%%r27\";
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[0], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_call (insn, 0)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:DI 4) (reg:DI 27))
+   (parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])
+   (set (reg:DI 27) (reg:DI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "TARGET_64BIT && reload_completed"
+  [(parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "")
+
+(define_insn "*call_symref_64bit_post_reload"
+  [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+	 (match_operand 1 "" "i"))
+   (clobber (reg:DI 1))
+   (clobber (reg:DI 2))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 0))]
+  "TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[0], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
+
+(define_insn "call_reg"
+  [(call (mem:SI (reg:SI 22))
+	 (match_operand 0 "" "i"))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (const_int 1))]
+  "!TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
 }"
   [(set_attr "type" "dyncall")
-   (set (attr "length") (const_int 12))])
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
 
-(define_insn "call_internal_reg"
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_reg_pic"
   [(call (mem:SI (reg:SI 22))
 	 (match_operand 0 "" "i"))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
+   (clobber (reg:SI 4))
+   (use (reg:SI 19))
    (use (const_int 1))]
-  ""
+  "!TARGET_64BIT"
   "*
 {
-  rtx xoperands[2];
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_indirect_call (insn)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(call (mem:SI (reg:SI 22))
+		    (match_operand 0 "" ""))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "!TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:SI 4) (reg:SI 19))
+   (parallel [(call (mem:SI (reg:SI 22))
+		    (match_dup 0))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])
+   (set (reg:SI 19) (reg:SI 4))]
+  "")
 
-  /* First the special case for kernels, level 0 systems, etc.  */
-  if (TARGET_FAST_INDIRECT_CALLS)
-    return \"ble 0(%%sr4,%%r22)\;copy %%r31,%%r2\";
-
-  /* Now the normal case -- we can reach $$dyncall directly or
-     we're sure that we can get there via a long-branch stub. 
-
-     No need to check target flags as the length uniquely identifies
-     the remaining cases.  */
-  if (get_attr_length (insn) == 8)
-    return \".CALL\\tARGW0=GR\;{bl|b,l} $$dyncall,%%r31\;copy %%r31,%%r2\";
-
-  /* Long millicode call, but we are not generating PIC or portable runtime
-     code.  */
-  if (get_attr_length (insn) == 12)
-    return \".CALL\\tARGW0=GR\;ldil L%%$$dyncall,%%r2\;ble R%%$$dyncall(%%sr4,%%r2)\;copy %%r31,%%r2\";
-
-  /* Long millicode call for portable runtime.  */
-  if (get_attr_length (insn) == 20)
-    return \"ldil L%%$$dyncall,%%r31\;ldo R%%$$dyncall(%%r31),%%r31\;blr %%r0,%%r2\;bv,n %%r0(%%r31)\;nop\";
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(call (mem:SI (reg:SI 22))
+		    (match_operand 0 "" ""))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "!TARGET_64BIT && reload_completed"
+  [(parallel [(call (mem:SI (reg:SI 22))
+		    (match_dup 0))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "")
 
-  /* If we're generating PIC code.  */
-  xoperands[0] = operands[0];
-  if (TARGET_SOM || ! TARGET_GAS)
-    xoperands[1] = gen_label_rtx ();
-  output_asm_insn (\"{bl|b,l} .+8,%%r1\", xoperands);
-  if (TARGET_SOM || ! TARGET_GAS)
-    {
-      output_asm_insn (\"addil L%%$$dyncall-%1,%%r1\", xoperands);
-      (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
-				 CODE_LABEL_NUMBER (xoperands[1]));
-      output_asm_insn (\"ldo R%%$$dyncall-%1(%%r1),%%r1\", xoperands);
-    }
-  else
-    {
-      output_asm_insn (\"addil L%%$$dyncall-$PIC_pcrel$0+4,%%r1\", xoperands);
-      output_asm_insn (\"ldo R%%$$dyncall-$PIC_pcrel$0+8(%%r1),%%r1\",
-      		       xoperands);
-    }
-  output_asm_insn (\"blr %%r0,%%r2\", xoperands);
-  output_asm_insn (\"bv,n %%r0(%%r1)\\n\\tnop\", xoperands);
-  return \"\";
+(define_insn "*call_reg_pic_post_reload"
+  [(call (mem:SI (reg:SI 22))
+	 (match_operand 0 "" "i"))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (reg:SI 19))
+   (use (const_int 1))]
+  "!TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
+
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_reg_64bit"
+  [(call (mem:SI (match_operand:DI 0 "register_operand" "r"))
+	 (match_operand 1 "" "i"))
+   (clobber (reg:DI 2))
+   (clobber (reg:DI 4))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 1))]
+  "TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, operands[0]);
 }"
   [(set_attr "type" "dyncall")
    (set (attr "length")
-     (cond [
-;; First FAST_INDIRECT_CALLS
-	    (ne (symbol_ref "TARGET_FAST_INDIRECT_CALLS")
-		(const_int 0))
-	    (const_int 8)
-
-;; Target (or stub) within reach
-	    (and (lt (plus (symbol_ref "total_code_bytes") (pc))
-		     (const_int 240000))
-		 (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
-		     (const_int 0)))
-	    (const_int 8)
-
-;; Out of reach PIC
-	    (ne (symbol_ref "flag_pic")
-		(const_int 0))
-	    (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
-	    (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
-		(const_int 0))
-	    (const_int 20)]
+	(plus (symbol_ref "attr_length_indirect_call (insn)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(call (mem:SI (match_operand 0 "register_operand" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:DI 4) (reg:DI 27))
+   (parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])
+   (set (reg:DI 27) (reg:DI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(call (mem:SI (match_operand 0 "register_operand" ""))
+		    (match_operand 1 "" ""))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "TARGET_64BIT && reload_completed"
+  [(parallel [(call (mem:SI (match_dup 0))
+		    (match_dup 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "")
 
-;; Out of reach, can use ble
-	  (const_int 12)))])
+(define_insn "*call_reg_64bit_post_reload"
+  [(call (mem:SI (match_operand:DI 0 "register_operand" "r"))
+	 (match_operand 1 "" "i"))
+   (clobber (reg:DI 2))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 1))]
+  "TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, operands[0]);
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
 
 (define_expand "call_value"
   [(parallel [(set (match_operand 0 "" "")
@@ -6068,11 +6341,12 @@
   ""
   "
 {
-  rtx op;
-  rtx call_insn;
+  rtx op, call_insn;
+  rtx dst = operands[0];
+  rtx nb = operands[2];
 
   if (TARGET_PORTABLE_RUNTIME)
-    op = force_reg (word_mode, XEXP (operands[1], 0));
+    op = force_reg (SImode, XEXP (operands[1], 0));
   else
     op = XEXP (operands[1], 0);
 
@@ -6085,50 +6359,76 @@
      and calls through function pointers.  This is necessary as these two
      types of calls use different calling conventions, and CSE might try
      to change the named call into an indirect call in some cases (using
-     two patterns keeps CSE from performing this optimization).  */
-  if (GET_CODE (op) == SYMBOL_REF)
-    call_insn = emit_call_insn (gen_call_value_internal_symref (operands[0],
-								op,
-								operands[2]));
-  else if (TARGET_64BIT)
-    {
-      rtx tmpreg = force_reg (word_mode, op);
-      call_insn
-	= emit_call_insn (gen_call_value_internal_reg_64bit (operands[0],
-							     tmpreg,
-							     operands[2]));
-    }
-  else
-    {
-      rtx tmpreg = gen_rtx_REG (word_mode, 22);
-      emit_move_insn (tmpreg, force_reg (word_mode, op));
-      call_insn = emit_call_insn (gen_call_value_internal_reg (operands[0],
-							       operands[2]));
-    }
+     two patterns keeps CSE from performing this optimization).
 
+     We now use even more call patterns as there was a subtle bug in
+     attempting to restore the pic register after a call using a simple
+     move insn.  During reload, an instruction involving a pseudo register
+     with no explicit dependence on the PIC register can be converted
+     to an equivalent load from memory using the PIC register.  If we
+     emit a simple move to restore the PIC register in the initial rtl
+     generation, then it can potentially be repositioned during scheduling,
+     and an instruction that eventually uses the PIC register may end up
+     between the call and the PIC register restore.
+     
+     This only worked because there is a post call group of instructions
+     that are scheduled with the call.  These instructions are included
+     in the same basic block as the call.  However, calls can throw in
+     C++ code and a basic block has to terminate at the call if the call
+     can throw.  This results in the PIC register restore being scheduled
+     independently from the call.  So, we now hide the save and restore
+     of the PIC register in the call pattern until after reload.  Then,
+     we split the moves out.  A small side benefit is that we now don't
+     need to have a use of the PIC register in the return pattern and
+     the final save/restore operation is not needed.
+     
+     I elected to just clobber %r4 in the PIC patterns and use it instead
+     of trying to force hppa_pic_save_rtx () to a callee saved register.
+     This might have required a new register class and constraint.  It
+     was also simpler to just handle the restore from a register than a
+     generic pseudo.  */
   if (TARGET_64BIT)
-    use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
-
-  if (flag_pic)
     {
-      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
+      if (GET_CODE (op) == SYMBOL_REF)
+	call_insn = emit_call_insn (gen_call_val_symref_64bit (dst, op, nb));
+      else
+	{
+	  op = force_reg (word_mode, op);
+	  call_insn = emit_call_insn (gen_call_val_reg_64bit (dst, op, nb));
+	}
+    }
+  else
+    {
+      if (GET_CODE (op) == SYMBOL_REF)
+	{
+	  if (flag_pic)
+	    call_insn = emit_call_insn (gen_call_val_symref_pic (dst, op, nb));
+	  else
+	    call_insn = emit_call_insn (gen_call_val_symref (dst, op, nb));
+	}
+      else
+	{
+	  rtx tmpreg = gen_rtx_REG (word_mode, 22);
 
-      /* After each call we must restore the PIC register, even if it
-	 doesn't appear to be used.  */
-      emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
+	  emit_move_insn (tmpreg, force_reg (word_mode, op));
+	  if (flag_pic)
+	    call_insn = emit_call_insn (gen_call_val_reg_pic (dst, nb));
+	  else
+	    call_insn = emit_call_insn (gen_call_val_reg (dst, nb));
+	}
     }
+
   DONE;
 }")
 
-(define_insn "call_value_internal_symref"
+(define_insn "call_val_symref"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand 1 "call_operand_address" ""))
 	      (match_operand 2 "" "i")))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
    (use (const_int 0))]
-  ;;- Don't use operand 1 for most machines.
-  "! TARGET_PORTABLE_RUNTIME"
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
   "*
 {
   output_arg_descriptor (insn);
@@ -6137,104 +6437,364 @@
   [(set_attr "type" "call")
    (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
 
-(define_insn "call_value_internal_reg_64bit"
+(define_insn "call_val_symref_pic"
   [(set (match_operand 0 "" "")
-         (call (mem:SI (match_operand:DI 1 "register_operand" "r"))
-	       (match_operand 2 "" "i")))
+	(call (mem:SI (match_operand 1 "call_operand_address" ""))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:SI 1))
    (clobber (reg:SI 2))
-   (use (const_int 1))]
+   (clobber (reg:SI 4))
+   (use (reg:SI 19))
+   (use (const_int 0))]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[1], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_call (insn, 0)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(set (match_operand 0 "" "")
+	      (call (mem:SI (match_operand 1 "call_operand_address" ""))
+		    (match_operand 2 "" "")))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:SI 4) (reg:SI 19))
+   (parallel [(set (match_dup 0)
+	      (call (mem:SI (match_dup 1))
+		    (match_dup 2)))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])
+   (set (reg:SI 19) (reg:SI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(set (match_operand 0 "" "")
+	      (call (mem:SI (match_operand 1 "call_operand_address" ""))
+		    (match_operand 2 "" "")))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT && reload_completed"
+  [(parallel [(set (match_dup 0)
+	      (call (mem:SI (match_dup 1))
+		    (match_dup 2)))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 0))])]
+  "")
+
+(define_insn "*call_val_symref_pic_post_reload"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand 1 "call_operand_address" ""))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (reg:SI 19))
+   (use (const_int 0))]
+  "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[1], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
+
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_val_symref_64bit"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand 1 "call_operand_address" ""))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:DI 1))
+   (clobber (reg:DI 2))
+   (clobber (reg:DI 4))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 0))]
   "TARGET_64BIT"
   "*
 {
-  /* ??? Needs more work.  Length computation, split into multiple insns,
-     expose delay slot.  */
-  return \"ldd 16(%1),%%r2\;bve,l (%%r2),%%r2\;ldd 24(%1),%%r27\";
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[1], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_call (insn, 0)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(set (match_operand 0 "" "")
+	      (call (mem:SI (match_operand 1 "call_operand_address" ""))
+		    (match_operand 2 "" "")))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:DI 4) (reg:DI 27))
+   (parallel [(set (match_dup 0)
+	      (call (mem:SI (match_dup 1))
+		    (match_dup 2)))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])
+   (set (reg:DI 27) (reg:DI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(set (match_operand 0 "" "")
+	      (call (mem:SI (match_operand 1 "call_operand_address" ""))
+		    (match_operand 2 "" "")))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "TARGET_64BIT && reload_completed"
+  [(parallel [(set (match_dup 0)
+	      (call (mem:SI (match_dup 1))
+		    (match_dup 2)))
+	      (clobber (reg:DI 1))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 0))])]
+  "")
+
+(define_insn "*call_val_symref_64bit_post_reload"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand 1 "call_operand_address" ""))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:DI 1))
+   (clobber (reg:DI 2))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 0))]
+  "TARGET_64BIT"
+  "*
+{
+  output_arg_descriptor (insn);
+  return output_call (insn, operands[1], 0);
+}"
+  [(set_attr "type" "call")
+   (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])
+
+(define_insn "call_val_reg"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (reg:SI 22))
+	      (match_operand 1 "" "i")))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (const_int 1))]
+  "!TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
 }"
   [(set_attr "type" "dyncall")
-   (set (attr "length") (const_int 12))])
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
 
-(define_insn "call_value_internal_reg"
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_val_reg_pic"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (reg:SI 22))
 	      (match_operand 1 "" "i")))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
+   (clobber (reg:SI 4))
+   (use (reg:SI 19))
    (use (const_int 1))]
-  ""
+  "!TARGET_64BIT"
   "*
 {
-  rtx xoperands[2];
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_indirect_call (insn)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
 
-  /* First the special case for kernels, level 0 systems, etc.  */
-  if (TARGET_FAST_INDIRECT_CALLS)
-    return \"ble 0(%%sr4,%%r22)\;copy %%r31,%%r2\";
-
-  /* Now the normal case -- we can reach $$dyncall directly or
-     we're sure that we can get there via a long-branch stub. 
-
-     No need to check target flags as the length uniquely identifies
-     the remaining cases.  */
-  if (get_attr_length (insn) == 8)
-    return \".CALL\\tARGW0=GR\;{bl|b,l} $$dyncall,%%r31\;copy %%r31,%%r2\";
-
-  /* Long millicode call, but we are not generating PIC or portable runtime
-     code.  */
-  if (get_attr_length (insn) == 12)
-    return \".CALL\\tARGW0=GR\;ldil L%%$$dyncall,%%r2\;ble R%%$$dyncall(%%sr4,%%r2)\;copy %%r31,%%r2\";
-
-  /* Long millicode call for portable runtime.  */
-  if (get_attr_length (insn) == 20)
-    return \"ldil L%%$$dyncall,%%r31\;ldo R%%$$dyncall(%%r31),%%r31\;blr %%r0,%%r2\;bv,n %%r0(%%r31)\;nop\";
-
-  /* If we're generating PIC code.  */
-  xoperands[0] = operands[1];
-  if (TARGET_SOM || ! TARGET_GAS)
-    xoperands[1] = gen_label_rtx ();
-  output_asm_insn (\"{bl|b,l} .+8,%%r1\", xoperands);
-  if (TARGET_SOM || ! TARGET_GAS)
-    {
-      output_asm_insn (\"addil L%%$$dyncall-%1,%%r1\", xoperands);
-      (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
-				 CODE_LABEL_NUMBER (xoperands[1]));
-      output_asm_insn (\"ldo R%%$$dyncall-%1(%%r1),%%r1\", xoperands);
-    }
-  else
-    {
-      output_asm_insn (\"addil L%%$$dyncall-$PIC_pcrel$0+4,%%r1\", xoperands);
-      output_asm_insn (\"ldo R%%$$dyncall-$PIC_pcrel$0+8(%%r1),%%r1\",
-      		       xoperands);
-    }
-  output_asm_insn (\"blr %%r0,%%r2\", xoperands);
-  output_asm_insn (\"bv,n %%r0(%%r1)\\n\\tnop\", xoperands);
-  return \"\";
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (mem:SI (reg:SI 22))
+			 (match_operand 1 "" "")))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "!TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:SI 4) (reg:SI 19))
+   (parallel [(set (match_dup 0)
+		   (call (mem:SI (reg:SI 22))
+			 (match_dup 1)))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])
+   (set (reg:SI 19) (reg:SI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (mem:SI (reg:SI 22))
+			 (match_operand 1 "" "")))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (clobber (reg:SI 4))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "!TARGET_64BIT && reload_completed"
+  [(parallel [(set (match_dup 0)
+		   (call (mem:SI (reg:SI 22))
+			 (match_dup 1)))
+	      (clobber (reg:SI 1))
+	      (clobber (reg:SI 2))
+	      (use (reg:SI 19))
+	      (use (const_int 1))])]
+  "")
+
+(define_insn "*call_val_reg_pic_post_reload"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (reg:SI 22))
+	      (match_operand 1 "" "i")))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (use (reg:SI 19))
+   (use (const_int 1))]
+  "!TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, gen_rtx_REG (word_mode, 22));
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
+
+;; This pattern is split if it is necessary to save and restore the
+;; PIC register.
+(define_insn "call_val_reg_64bit"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:DI 1 "register_operand" "r"))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:DI 2))
+   (clobber (reg:DI 4))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 1))]
+  "TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, operands[1]);
 }"
   [(set_attr "type" "dyncall")
    (set (attr "length")
-     (cond [
-;; First FAST_INDIRECT_CALLS
-	    (ne (symbol_ref "TARGET_FAST_INDIRECT_CALLS")
-		(const_int 0))
-	    (const_int 8)
-
-;; Target (or stub) within reach
-	    (and (lt (plus (symbol_ref "total_code_bytes") (pc))
-		     (const_int 240000))
-		 (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
-		     (const_int 0)))
-	    (const_int 8)
-
-;; Out of reach PIC
-	    (ne (symbol_ref "flag_pic")
-		(const_int 0))
-	    (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
-	    (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
-		(const_int 0))
-	    (const_int 20)]
+	(plus (symbol_ref "attr_length_indirect_call (insn)")
+	      (symbol_ref "attr_length_save_restore_dltp (insn)")))])
+
+;; Split out the PIC register save and restore after reload.  This is
+;; done only if the function returns.
+(define_split
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (mem:SI (match_operand:DI 1 "register_operand" ""))
+			 (match_operand 2 "" "")))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "TARGET_64BIT
+   && reload_completed
+   && !find_reg_note (insn, REG_NORETURN, NULL_RTX)"
+  [(set (reg:DI 4) (reg:DI 27))
+   (parallel [(set (match_dup 0)
+		   (call (mem:SI (match_dup 1))
+			 (match_dup 2)))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])
+   (set (reg:DI 27) (reg:DI 4))]
+  "")
+
+;; Remove the clobber of register 4 when optimizing.  This has to be
+;; done with a peephole optimization rather than a split because the
+;; split sequence for a call must be longer than one instruction.
+(define_peephole2
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (mem:SI (match_operand:DI 1 "register_operand" ""))
+			 (match_operand 2 "" "")))
+	      (clobber (reg:DI 2))
+	      (clobber (reg:DI 4))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "TARGET_64BIT && reload_completed"
+  [(parallel [(set (match_dup 0)
+		   (call (mem:SI (match_dup 1))
+			 (match_dup 2)))
+	      (clobber (reg:DI 2))
+	      (use (reg:DI 27))
+	      (use (reg:DI 29))
+	      (use (const_int 1))])]
+  "")
 
-;; Out of reach, can use ble
-	  (const_int 12)))])
+(define_insn "*call_val_reg_64bit_post_reload"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:DI 1 "register_operand" "r"))
+	      (match_operand 2 "" "i")))
+   (clobber (reg:DI 2))
+   (use (reg:DI 27))
+   (use (reg:DI 29))
+   (use (const_int 1))]
+  "TARGET_64BIT"
+  "*
+{
+  return output_indirect_call (insn, operands[1]);
+}"
+  [(set_attr "type" "dyncall")
+   (set (attr "length") (symbol_ref "attr_length_indirect_call (insn)"))])
 
 ;; Call subroutine returning any type.
 
@@ -6292,14 +6852,10 @@
   if (TARGET_64BIT)
     use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
 
+  /* We don't have to restore the PIC register.  */
   if (flag_pic)
-    {
-      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
+    use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
 
-      /* After each call we must restore the PIC register, even if it
-	 doesn't appear to be used.  */
-      emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
-    }
   DONE;
 }")
 
@@ -6321,9 +6877,8 @@
 (define_insn "sibcall_internal_symref_64bit"
   [(call (mem:SI (match_operand 0 "call_operand_address" ""))
 	 (match_operand 1 "" "i"))
-   (clobber (reg:SI 1))
-   (clobber (reg:SI 27))
-   (use (reg:SI 2))
+   (clobber (reg:DI 1))
+   (use (reg:DI 2))
    (use (const_int 0))]
   "TARGET_64BIT"
   "*
@@ -6364,14 +6919,10 @@
   if (TARGET_64BIT)
     use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
 
+  /* We don't have to restore the PIC register.  */
   if (flag_pic)
-    {
-      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
+    use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
 
-      /* After each call we must restore the PIC register, even if it
-	 doesn't appear to be used.  */
-      emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
-    }
   DONE;
 }")
 
@@ -6395,9 +6946,8 @@
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand 1 "call_operand_address" ""))
 	      (match_operand 2 "" "i")))
-   (clobber (reg:SI 1))
-   (clobber (reg:SI 27))
-   (use (reg:SI 2))
+   (clobber (reg:DI 1))
+   (use (reg:DI 2))
    (use (const_int 0))]
   "TARGET_64BIT"
   "*
@@ -7340,7 +7890,9 @@
 						    \"$$sh_func_adrs\"));
 }"
   [(set_attr "type" "multi")
-   (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 20)"))])
+   (set (attr "length")
+	(plus (symbol_ref "attr_length_millicode_call (insn)")
+	      (const_int 20)))])
 
 ;; On the PA, the PIC register is call clobbered, so it must
 ;; be saved & restored around calls by the caller.  If the call
@@ -7361,6 +7913,7 @@
   /* Restore the PIC register using hppa_pic_save_rtx ().  The
      PIC register is not saved in the frame in 64-bit ABI.  */
   emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
+  emit_insn (gen_blockage ());
   DONE;
 }")
 
@@ -7375,5 +7928,6 @@
      a stack slot.  The only registers that are valid after a
      builtin_longjmp are the stack and frame pointers.  */
   emit_move_insn (pic_offset_table_rtx, hppa_pic_save_rtx ());
+  emit_insn (gen_blockage ());
   DONE;
 }")
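
For anyone following the long comment in the call_value expander, the
post-reload split can be sketched as a toy C model.  This is not GCC
code: toy_insn, toy_split_pic_call and toy_save_restore_length are
hypothetical names, and the noreturn flag merely stands in for a
REG_NORETURN note.  The 8-byte figure assumes the save and the restore
are one 4-byte PA instruction each; the body of
attr_length_save_restore_dltp is not shown in this part of the patch,
so treat that as an assumption.

```c
#include <stddef.h>

/* Toy instruction kinds; illustrative only, not GCC RTL codes.  */
enum toy_kind { TOY_SAVE_PIC, TOY_CALL, TOY_RESTORE_PIC };

struct toy_insn
{
  enum toy_kind kind;
  int noreturn;		/* Stands in for a REG_NORETURN note.  */
};

/* Expand CALL into OUT (room for three insns) in the spirit of the
   post-reload splits: a call that returns becomes save / call /
   restore, while a noreturn call stays a single insn.  Returns the
   number of insns written to OUT.  */
size_t
toy_split_pic_call (const struct toy_insn *call, struct toy_insn out[3])
{
  size_t n = 0;

  if (!call->noreturn)
    out[n++] = (struct toy_insn) { TOY_SAVE_PIC, 0 };
  out[n++] = *call;
  if (!call->noreturn)
    out[n++] = (struct toy_insn) { TOY_RESTORE_PIC, 0 };
  return n;
}

/* Extra bytes to reserve in the pre-split call pattern's length.
   Assumption: two 4-byte moves when the call returns, else none.  */
int
toy_save_restore_length (const struct toy_insn *call)
{
  return call->noreturn ? 0 : 8;
}
```

Because the save and restore only come into existence after reload,
the scheduler never sees a free-standing restore that it could move
away from the call before reload has introduced PIC register uses.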

