This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: SH optimized software floating point routines
- From: Joern Rennecke <joern dot rennecke at embecosm dot com>
- To: "Naveen H. S" <Naveen dot S at kpitcummins dot com>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Prafulla Thakare <Prafulla dot Thakare at kpitcummins dot com>
- Date: Sun, 18 Jul 2010 20:58:38 -0400
- Subject: RE: SH optimized software floating point routines
- References: <371569CBCFB2E745B891DBB88B2DFDDD19CAF5484D@KCINPUNHJCMS01.kpit.com> <20100614.101458.229239801.kkojima@rr.iij4u.or.jp> <371569CBCFB2E745B891DBB88B2DFDDD19DC264F56@KCINPUNHJCMS01.kpit.com> <20100717092859.dkxjsdzg0okk8o4c-nzlynne@webmail.spamcop.net>
I've found two bugs in truncdfsf2;
I've also added back a number of hunks that Naveen had dropped.
Note that most of the patch was prepared in 2006, so that is the
proper most recent copyright date for those files that have not been touched
except to update the copyright notice.
TODO:
- Test & submit companion patches separately.
- Test.
2010-07-18 Joern Rennecke <joern.rennecke@embecosm.com>
* config/sh/IEEE-754/divsf3.S (divsf3):
Fix sign for zero r4 input.
Fix comments for NaN return.
Remove some redundant code.
* config/sh/ieee-754-df.S: Add comments on
RETURN_R0_MAIN / RETURN_R0 / RETURN_FR0.
(RETURN_FR0): Add missing backslash.
[!DYN_SHIFT] (extendsfdf2) <zero_denorm>: Fix mask used in
shift_byte loop.
[!DYN_SHIFT] (extendsfdf2) <x00ff0000>: New constant.
[!DYN_SHIFT] (truncdfsf2) <inf>: Fix returned value.
[DYN_SHIFT] (truncdfsf2) <inf>: Likewise.
[!DYN_SHIFT] (truncdfsf2) <xffe00000>: Remove now unused constant.
[DYN_SHIFT] (truncdfsf2) <xffe00000>: Likewise.
* config/sh/sh.c (sh_expand_float_condop): Changed parameters to
allow separate passing of comparison operands and destination.
Changed callers.
Replace use of from_compare.
Use emit instead of emit_jump_insn.
(sh_soft_fp_cmp): Remove REG_LIBCALL / REG_RETVAL code.
Use set_unique_reg_note.
(expand_sfunc_op): Likewise.
* config/sh/sh.md (cstoresf4): Add support for software floating point.
(cstoredf4, cbranchsf4, cbranchdf4): Likewise.
(cmpnedf_i1): Fix predicate.
(truncdfsf2): Add TARGET_SH2E case.
(mulsf3): Fix condition for emitting mulsf3_i3.
* config/sh/IEEE-754/adddf3.S: Adjust NaN value returned for
+inf + -inf to agree with DF_NAN_MASK mask.
* config/sh/t-sh (gt-sh.h): Remove redundant rule.
(LIB1ASMFUNCS): Add _unordsf2 and _unorddf2.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* targhooks.c (regs.h): #include.
(default_match_adjust): New function.
* targhooks.h (default_match_adjust): Declare.
* reload.c (operands_match_p): Use targetm.match_adjust.
* target.h (struct gcc_target): Add member match_adjust.
* target-def.h (TARGET_MATCH_ADJUST): New macro.
* Makefile.in (targhooks.o): Depend on $(REGS_H).
* config/sh/sh-protos.h (sh_match_adjust): Declare.
* config/sh/sh.c (TARGET_MATCH_ADJUST): Define as sh_match_adjust.
(sh_match_adjust): New function.
2006-09-15 J"orn Rennecke <joern.rennecke@st.com>
* sched-deps.c (sched_analyze_2): When a likely spilled register
is used, put it into a scheduling group with the insn that
sets it and with all the insns in-between.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
config/sh/t-sh: ($(T)ic_invalidate_array_4-100.o): Add -I. .
($(T)ic_invalidate_array_4-200.o): Likewise.
($(T)ic_invalidate_array_4a.o): Likewise.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* config/sh/sh.h (LIBGCC2_DOUBLE_TYPE_SIZE): Define.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* sh.md (*movsicc_t_false, *movsicc_t_true): Add mode.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
Aanchal Khanna <aanchalk@noida.hcltech.com>
Rakesh Kumar <rakesh.kumar@noida.hcltech.com>
* config/sh/sh-protos.h (sh_function_kind): New enumerator
SFUNC_FREQUENT.
(expand_sfunc_unop, expand_sfunc_binop): Declare.
(sh_expand_float_cbranch): Likewise.
* config/sh/lib1funcs.asm (ieee-754-sf.S, ieee-754-df.S): #include.
* config/sh/t-sh (LIB1ASMFUNCS): Add nesf2, _nedf2, _gtsf2t, _gtdf2t,
_gesf2f, _gedf2f, _extendsfdf2, _truncdfsf2, _add_sub_sf3, _mulsf3,
_hypotf, _muldf3, _add_sub_df3, _divsf3, _divdf3, _fixunssfsi,
_fixsfsi, _fixunsdfsi, _fixdfsi, _floatunssisf, _floatsisf,
_floatunssidf and _floatsidf.
(FPBIT, DPBIT, dp-bit.c, fp-bit.c): Removed.
* config/sh/ieee-754-df.S, config/sh/ieee-754-sf.S: New files.
* config/sh/predicates.md (soft_fp_comparison_operand): New predicate.
(soft_fp_comparison_operator): Likewise.
* config/sh/sh.c (sh_soft_fp_cmp, expand_sfunc_op): New functions.
(expand_sfunc_unop, expand_sfunc_binop): Likewise.
(sh_expand_float_cbranch): Likewise.
(sh_expand_float_condop, sh_expand_float_scc): Likewise.
(from_compare): Add support for software floating point.
(function_symbol): Always look up name. Add SFUNC_FREQUENT case.
* config/sh/sh.h (TARGET_SH1_SOFTFP): New macro.
(TARGET_SH1_SOFTFP_MODE): Likewise.
* config/sh/sh-modes.def (CC_FP_NE, CC_FP_GT, CC_FP_UNLT): New modes.
* config/sh/lib1funcs.h (SLC, SLI, SLCMP, DMULU_SAVE): New macros.
(DMULUL, DMULUH, DMULU_RESTORE, SHLL4, SHLR4, SHLL6, SHLR6): Likewise.
(SHLL12, SHLR12, SHLR19, SHLL23, SHLR24, SHLR21, SHLL21): Likewise.
(SHLR11, SHLR22, SHLR23, SHLR20, SHLL20, SHLD_COUNT, SHLRN): Likewise.
(SHLLN, DYN_SHIFT): Likewise.
(SUPPORT_SH3_OSFP, SUPPORT_SH3E_OSFP): Likewise.
(SUPPORT_SH4_NOFPU_OSFP, SUPPORT_SH4_SINGLE_ONLY_OSFP): Likewise.
(TARGET_OSFP): Likewise.
* config/sh/IEEE-754/m3/divsf3.S: New file.
* config/sh/IEEE-754/m3/divdf3.S: Likewise.
* config/sh/IEEE-754/m3/floatunssisf.S: Likewise.
* config/sh/IEEE-754/m3/floatunssidf.S: Likewise.
* config/sh/IEEE-754/m3/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/m3/divdf3-rt.S: Likewise.
* config/sh/IEEE-754/m3/addsf3.S: Likewise.
* config/sh/IEEE-754/m3/adddf3.S: Likewise.
* config/sh/IEEE-754/m3/mulsf3.S: Likewise.
* config/sh/IEEE-754/m3/muldf3.S: Likewise.
* config/sh/IEEE-754/m3/floatsisf.S: Likewise.
* config/sh/IEEE-754/m3/floatsidf.S: Likewise.
* config/sh/IEEE-754/m3/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divdf3.S: Likewise.
* config/sh/IEEE-754/floatunssisf.S: Likewise.
* config/sh/IEEE-754/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/adddf3.S: Likewise.
* config/sh/IEEE-754/floatsisf.S: Likewise.
* config/sh/IEEE-754/muldf3.S: Likewise.
* config/sh/IEEE-754/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divsf3.S: Likewise.
* config/sh/IEEE-754/fixunssfsi.S: Likewise.
* config/sh/IEEE-754/floatunssidf.S: Likewise.
* config/sh/IEEE-754/addsf3.S: Likewise.
* config/sh/IEEE-754/mulsf3.S: Likewise.
* config/sh/IEEE-754/floatsidf.S: Likewise.
* config/sh/IEEE-754/fixsfsi.S: Likewise.
* config/sh/sh.md (SF_NAN_MASK, DF_NAN_MASK, FR4_REG): New constants.
(fpcmp_i1, addsf3_i3, subsf3_i3): New patterns.
(mulsf3_i3, cmpnesf_i1, cmpgtsf_i1, cmpunltsf_i1): Likewise.
(cmpeqsf_i1_finite, cmplesf_i1_finite, cmpunsf_i1): Likewise.
(cmpuneqsf_i1, movcc_fp_ne, movcc_fp_gt, movcc_fp_unlt): Likewise.
(cmpltgtsf_t, cmporderedsf_t, cmpltgtsf_t_4): Likewise.
(cmporderedsf_t_4, abssc2, adddf3_i3_wrap, adddf3_i3): Likewise.
(muldf3_i3_wrap, muldf3_i3, cmpnedf_i1, cmpgtdf_i1): Likewise.
(cmpunltdf_i1, cmpeqdf_i1_finite, cmpundf_i1, cmpuneqdf_i1): Likewise.
(cmpltgtdf_t, cmpordereddf_t_4, extendsfdf2_i1): Likewise.
(extendsfdf2_i2e, extendsfdf2_i2e_r0, truncdfsf2_i2e): Likewise.
(extendsfdf2_i1_r0, truncdfsf2_i1): Likewise.
(cmpun_sdf, cmpuneq_sdf): Likewise.
(addsf3, subsf3, mulsf3): Add support for software floating point.
(adddf3, subdf3, muldf3, extendsfdf2, truncdfsf2): Likewise.
(cmpsf, cmpdf): Don't enable for TARGET_SH2E.
(movnegt): Match only one operand. Changed user.
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 162269)
+++ gcc/doc/tm.texi (working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 162269)
+++ gcc/doc/tm.texi.in (working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@hook TARGET_MATCH_ADJUST
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c (revision 162269)
+++ gcc/targhooks.c (working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
#include "reload.h"
#include "optabs.h"
#include "recog.h"
+#include "regs.h"
bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
return rclass;
}
+/* Given an rtx and its regno, return a regno value that shall be used for
+ purposes of comparison in operands_match_p.
+ Generally, we say that integer registers are subject to big-endian
+ adjustment. This default target hook should generally work if the mode
+ of a register is a sufficient indication if this adjustment is to take
+ place; this will not work when software floating point is done in integer
+ registers. */
+int
+default_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
void
default_target_option_override (void)
{
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h (revision 162269)
+++ gcc/targhooks.h (working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
extern void default_target_option_override (void);
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 162269)
+++ gcc/target.def (working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
secondary_reload_info *sri),
default_secondary_reload)
+/* Take an rtx and its regno, and return the regno for purposes of
+ checking a matching constraint. */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
/* This target hook allows the backend to perform additional
processing while initializing for variable expansion. */
DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c (revision 162269)
+++ gcc/reload.c (working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (x))
- && i < FIRST_PSEUDO_REGISTER)
- i += hard_regno_nregs[i][GET_MODE (x)] - 1;
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (y))
- && j < FIRST_PSEUDO_REGISTER)
- j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+ i = targetm.match_adjust (x, i);
+ j = targetm.match_adjust (y, j);
return i == j;
}
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 162269)
+++ gcc/Makefile.in (working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
$(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
- $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+ $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
bversion.h: s-bversion; @true
s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 162269)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.
#define GCC_SH_PROTOS_H
enum sh_function_kind {
- /* A function with normal C ABI */
+ /* A function with normal C ABI, or an SH1..SH4 sfunc that may be resolved
+ via a PLT. */
FUNCTION_ORDINARY,
+ /* A function that is a bit too large to put in every calling dso, but that
+ is typically used often enough that calling it via the GOT makes sense
+ for speed. */
+ SFUNC_FREQUENT,
/* A special function that guarantees that some otherwise call-clobbered
registers are not clobbered. These can't go through the SH5 resolver,
because it only saves argument passing registers. */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
extern int sh_insn_length_adjustment (rtx);
extern int sh_can_redirect_branch (rtx, rtx);
extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
extern int sh_media_register_for_return (void);
extern void sh_expand_prologue (void);
extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
extern int sh_need_epilogue (void);
extern void sh_set_return_address (rtx, rtx);
extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
extern int sh2a_get_function_vector_number (rtx);
extern int sh2a_is_function_vector_call (rtx);
extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm (revision 162269)
+++ gcc/config/sh/lib1funcs.asm (working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
ENDFUNC(GLOBAL(udiv_qrnnd_16))
#endif /* !__SHMEDIA__ */
#endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh (revision 162269)
+++ gcc/config/sh/t-sh (working copy)
@@ -25,30 +25,16 @@ sh-c.o: $(srcdir)/config/sh/sh-c.c \
LIB1ASMSRC = sh/lib1funcs.asm
LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
_movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
- _div_table _udiv_qrnnd_16 \
+ _div_table _udiv_qrnnd_16 _unordsf2 _unorddf2 \
+ _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2 _truncdfsf2 \
+ _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+ _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+ _floatunssidf _floatsidf \
$(LIB1ASMFUNCS_CACHE)
LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
TARGET_LIBGCC2_CFLAGS = -mieee
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
- echo '#endif' >> dp-bit.c
- cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#define FLOAT' > fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
- echo '#endif' >> fp-bit.c
- cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
@@ -120,7 +106,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
$(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
# These are not suitable for COFF.
# EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +116,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
$(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
$(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
$(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt (revision 162269)
+++ gcc/config/sh/sh.opt (working copy)
@@ -21,7 +21,7 @@
;; Used for various architecture options.
Mask(SH_E)
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
Mask(FPU_SINGLE)
;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S (revision 0)
+++ gcc/config/sh/ieee-754-df.S (revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+/* The SH[123] ABI returns floats in r0, -m4-single returns it in fr0.
+ To abstract from this, a function that returns the single-precision
+ float value in r0 should use as in-line epilogue:
+ RETURN_R0_MAIN
+ <delay-slot insn>
+ RETURN_FR0
+ and may branch to that epilogue with:
+ RETURN_R0
+ <delay-slot insn> */
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+ cmp/eq r5,r7
+ mov r4,r0
+ bf 0f
+ cmp/eq r4,r6
+ bt 0f
+ or r6,r0
+ add r0,r0
+ or r5,r0
+ tst r0,r0
+ 0: */
+ .balign 4
+ .global GLOBAL(nedf2)
+ HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+ cmp/eq DBL0L,DBL1L
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bf LOCAL(ne)
+ cmp/eq DBL0H,DBL1H
+ not DBL0H,r0
+ bt LOCAL(check_nan)
+ mov DBL0H,r0
+ or DBL1H,r0
+ add r0,r0
+ rts
+ or DBL0L,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+LOCAL(ne):
+ rts
+ mov #1,r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ not DBL0H,r0
+ tst r1,r0
+ not r6,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
/* If the raw values compare greater, the result is true, unless
any of them is a nan (but infinity is fine), or both values are
+- zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL0H
+ not DBL1H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL1H,DBL0H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt DBL0H,r1)
+ add r0,r0
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ or DBL0L,r0
+ rts
+ or DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero. */
+LOCAL(cmp_low):
+ cmp/hi DBL1L,DBL0L
+ rts
+ movt r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL0L,DBL1L)
+ not DBL0H,r0
+ tst r1,r0
+ bt LOCAL(nan) /* return zero if DBL0 is NAN. */
+ cmp/hi DBL0H,DBL1H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL0L,DBL1L)
+ rts
+ movt r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else
+ SLI(cmp/gt DBL0H,r1)
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ rts
+ mov #0,r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+ .balign 4
+ .global GLOBAL(gedf2f)
+ HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+ /* If the raw values compare greater or equal, the result is
+ true, unless any of them is a nan, or both are the
+ same infinity. If both are -+zero, the result is true;
+ otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL1H
+ not DBL0H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL0H,DBL1H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,DBL1H)
+ add r0,r0
+ bt LOCAL(nan)
+ or DBL0L,r0
+ rts
+ or DBL1L,r0
+LOCAL(cmp_low):
+ cmp/hi DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+LOCAL(nan):
+ rts
+ movt r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ not DBL1H,r0
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL1L,DBL0L)
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi DBL1H,DBL0H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL1L,DBL0L)
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r3
+ mov r4,DBLRL
+ tst r3,r4
+ bt LOCAL(zero_denorm)
+ mov.l LOCAL(xe0000000),r2
+ rotr DBLRL
+ rotr DBLRL
+ rotr DBLRL
+ and r2,DBLRL
+ mov r4,DBLRH
+ not r4,r2
+ tst r3,r2
+ mov.l LOCAL(x38000000),r2
+ bf 0f
+ add r2,r2 ! infinity / NaN adjustment
+0: shll DBLRH
+ shlr2 DBLRH
+ shlr2 DBLRH
+ add DBLRH,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ bt LOCAL(zero)
+ mov.l LOCAL(x00ff0000),r3
+ mov.w LOCAL(x389),r2
+LOCAL(shift_byte):
+ tst r3,r4
+ shll8 r4
+ SL(bt, LOCAL(shift_byte),
+ add #-8,r2)
+LOCAL(shift_bit):
+ shll r4
+ SL(bf, LOCAL(shift_bit),
+ add #-1,r2)
+ mov #0,DBLRL
+ mov r4,DBLRH
+ mov.l @r15+,r4
+ shlr8 DBLRH
+ shlr2 DBLRH
+ shlr DBLRH
+ rotcr DBLRL
+ cmp/gt r4,DBLRH ! get sign
+ rotcr DBLRH
+ rotcr DBLRL
+ shll16 r2
+ shll8 r2
+ rts
+ add r2,DBLRH
+LOCAL(zero):
+ mov.l @r15+,DBLRH
+ rts
+ mov #0,DBLRL
+LOCAL(x389): .word 0x389
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xe0000000):
+ .long 0xe0000000
+LOCAL(x00ff0000):
+ .long 0x00ff0000
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3 ! exponent adjustment DF -> SF
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2 ! mask for out-of-range exponent bits
+ mov DBL0H,r0
+ mov.l DBL0L,@-r15
+ sub r3,r1
+ tst r2,r1
+ shll8 r0 !
+ shll2 r0 ! Isolate highpart fraction.
+ shll2 r0 !
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ shlr16 DBL0L
+ shlr8 DBL0L
+ shlr2 DBL0L
+ SL1(bt, LOCAL(add_frac),
+ shlr2 DBL0L)
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+ cmp/hs r3,r0
+LOCAL(denorm_noup_sh1):
+ bt LOCAL(inf)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+#ifdef DELAYED_BRANCHES
+ bt/s LOCAL(denorm_noup)
+#else
+ bt LOCAL(denorm_noup_sh1)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ shlr16 r1
+ exts.w r1,r1
+ shll2 r1
+ add r1,r1
+ shlr8 r1
+ exts.w r1,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shlr16 r3
+ shll2 r3
+ add r3,r3
+ shlr8 r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov.l @r15+,DBL0L
+ mov #0,r2
+ neg r1,r1
+LOCAL(denorm_loop):
+ shlr r0
+ rotcl r2
+ dt r1
+ bf LOCAL(denorm_loop)
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r2
+ mov #29,r3
+ mov r4,DBLRL
+ not r4,DBLRH
+ tst r2,r4
+ shld r3,DBLRL
+ bt LOCAL(zero_denorm)
+ mov #-3,r3
+ tst r2,DBLRH
+ mov r4,DBLRH
+ mov.l LOCAL(x38000000),r2
+ bt/s LOCAL(inf_nan)
+ shll DBLRH
+ shld r3,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+ .balign 4
+LOCAL(inf_nan):
+ shld r3,DBLRH
+ add r2,r2
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ extu.w r4,r2
+ bt LOCAL(zero)
+ cmp/eq r4,r2
+ extu.b r4,r1
+ bf/s LOCAL(three_bytes)
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r4,r1
+ mov #22,DBLRH
+ bt LOCAL(one_byte)
+ shlr8 r2
+ mov #14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov.w LOCAL(x0),DBLRL
+ sub r2,DBLRH
+LOCAL(norm_shift):
+ shld DBLRH,r4
+ mov.l @r15+,r2
+ shld r3,DBLRH
+ mov.l LOCAL(xb7ffffff),r3
+ add r4,DBLRH
+ cmp/pz r2
+ mov r2,r4
+ rotcr DBLRH
+ rts
+ sub r3,DBLRH
+LOCAL(three_bytes):
+ mov r4,r2
+ shlr16 r2
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov #6-32,DBLRH
+ sub r2,DBLRH
+ mov r4,DBLRL
+ shld DBLRH,DBLRL
+ bra LOCAL(norm_shift)
+ add #32,DBLRH
+LOCAL(zero):
+ rts /* DBLRL has already been zeroed above. */
+ mov.l @r15+,DBLRH
+LOCAL(x0):
+ .word 0
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xb7ffffff):
+ /* Flip sign back, do exponent adjustment, and remove leading one. */
+ .long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2
+ mov DBL0H,r0
+ sub r3,r1
+ mov.l DBL0L,@-r15
+ tst r2,r1
+ mov #12,r3
+ shld r3,r0 ! Isolate highpart fraction.
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ mov #-28,r2
+ bt/s LOCAL(add_frac)
+ shld r2,DBL0L
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+#if 0 // No point checking overflow -> infinity if we don't raise a signal.
+ cmp/hs r3,r0
+ bt LOCAL(inf)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+ bt/s LOCAL(denorm_noup)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ mov #-21,r2
+ shad r2,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shld r2,r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov r0,r2
+ shld r1,r0
+ mov.l @r15+,DBL0L
+ add #32,r1
+ shld r1,r2
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
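As a cross-check of the guard/sticky handling in truncdfsf2 above, here is a
Python sketch of round-to-nearest-even when narrowing a 53-bit fraction to 24
bits (the function names are mine, not part of the patch, and this models only
the normal-number rounding path, not the denormal / inf / NaN cases):

```python
import struct

def trunc_df_sf(d):
    # Reference behavior: round a Python float (double) to single
    # precision, as truncdfsf2 must for normal numbers.
    return struct.unpack('<f', struct.pack('<f', d))[0]

def round_frac(frac53):
    # Round a 53-bit fraction to 24 bits, round-to-nearest-even:
    # keep the top 24 bits, look at the first dropped (guard) bit,
    # and OR the remaining dropped bits into a sticky flag.
    keep = frac53 >> 29                 # top 24 bits survive
    guard = (frac53 >> 28) & 1          # first dropped bit
    sticky = frac53 & ((1 << 28) - 1)   # all lower dropped bits
    if guard and (sticky or (keep & 1)):
        keep += 1                       # round up; ties go to even
    return keep
```

The assembly compresses the 29 dropped bits into a few guard bits (see the
`x2fffffff` mask) so the same decision fits in 32-bit registers.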
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* L_add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md (revision 162269)
+++ gcc/config/sh/predicates.md (working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
(define_predicate "symbol_ref_operand"
(match_code "symbol_ref"))
+(define_special_predicate "soft_fp_comparison_operand"
+ (match_code "subreg,reg")
+{
+ switch (GET_MODE (op))
+ {
+ default:
+ return 0;
+ case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+ break;
+ }
+ return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+ (match_code "eq, unle, ge")
+{
+ switch (GET_CODE (op))
+ {
+ default:
+ return 0;
+ case EQ: mode = CC_FP_NEmode; break;
+ case UNLE: mode = CC_FP_GTmode; break;
+ case GE: mode = CC_FP_UNLTmode; break;
+ }
+ return register_operand (XEXP (op, 0), mode);
+})
+
;; Same as target_reg_operand, except that label_refs and symbol_refs
;; are accepted before reload.
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c (revision 162269)
+++ gcc/config/sh/sh.c (working copy)
@@ -284,6 +284,7 @@ static int sh_arg_partial_bytes (CUMULAT
tree, bool);
static bool sh_scalar_mode_supported_p (enum machine_mode);
static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
static void sh_encode_section_info (tree, rtx, int);
static int sh2a_function_vector_p (tree);
static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +552,9 @@ static const struct attribute_spec sh_at
/* Machine-specific symbol_ref flags. */
#define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
struct gcc_target targetm = TARGET_INITIALIZER;
/* Implement TARGET_HANDLE_OPTION. */
@@ -2180,6 +2184,72 @@ sh_emit_cheap_store_flag (enum machine_m
return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
}
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+ const char *name = NULL;
+ rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
+ int df = op_mode == DFmode;
+ enum machine_mode mode = VOIDmode; /* Shut up warning. */
+
+ switch (code)
+ {
+ case EQ:
+ if (!flag_finite_math_only)
+ {
+ name = df ? "__nedf2" : "__nesf2";
+ fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+ mode = CC_FP_NEmode;
+ break;
+ } /* Fall through. */
+ case UNEQ:
+ fun = gen_cmpuneq_sdf;
+ break;
+ case UNLE:
+ if (flag_finite_math_only && !df)
+ {
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gtdf2t" : "__gtsf2t";
+ fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+ mode = CC_FP_GTmode;
+ break;
+ case GE:
+ if (flag_finite_math_only && !df)
+ {
+ tmp = op0; op0 = op1; op1 = tmp;
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gedf2f" : "__gesf2f";
+ fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+ mode = CC_FP_UNLTmode;
+ break;
+ case UNORDERED:
+ fun = gen_cmpun_sdf;
+ break;
+ default: gcc_unreachable ();
+ }
+
+ if (!name)
+ return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+ tmp = gen_reg_rtx (mode);
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_STATIC);
+ emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+ emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+ last = emit_insn (fun (tmp, addr));
+ equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+ /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+ the computation. */
+ return gen_rtx_SET (VOIDmode,
+ gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
/* Called from the md file, set up the operands of a compare instruction. */
void
@@ -8662,6 +8732,49 @@ sh_fix_range (const char *const_str)
str = comma + 1;
}
}
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using EQUIV. */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, rtx equiv, rtx *operands)
+{
+ int next_reg = FIRST_PARM_REG, i;
+ rtx addr, last;
+
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_FREQUENT);
+ for (i = 1; i <= nargs; i++)
+ {
+ emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+ }
+ last = emit_insn ((*fun) (operands[0], addr));
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+}
+
+/* Expand an sfunc unary operation taking one MODE argument, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+ expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+ which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+ expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
/* Insert any deferred function attributes from earlier pragmas. */
static void
@@ -11593,11 +11706,10 @@ function_symbol (rtx target, const char
{
rtx sym;
- /* If this is not an ordinary function, the name usually comes from a
- string literal or an sprintf buffer. Make sure we use the same
+ /* The name usually comes from a string literal or an sprintf buffer.
+ Make sure we use the same
string consistently, so that cse will be able to unify address loads. */
- if (kind != FUNCTION_ORDINARY)
- name = IDENTIFIER_POINTER (get_identifier (name));
+ name = IDENTIFIER_POINTER (get_identifier (name));
sym = gen_rtx_SYMBOL_REF (Pmode, name);
SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
if (flag_pic)
@@ -11605,6 +11717,10 @@ function_symbol (rtx target, const char
{
case FUNCTION_ORDINARY:
break;
+ case SFUNC_FREQUENT:
+ if (!optimize || optimize_size)
+ break;
+ /* Fall through. */
case SFUNC_GOT:
{
rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11831,168 @@ sh_expand_t_scc (rtx operands[])
return 1;
}
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+ static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+ sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+ static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+ sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+ negative logic. */
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+ enum machine_mode mode = GET_MODE (operands[1]);
+ enum rtx_code comparison = GET_CODE (operands[0]);
+ int swap_operands = 0;
+ rtx op0, op1;
+ rtx lab = NULL_RTX;
+
+ if (TARGET_SH1_SOFTFP_MODE (mode))
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1; /* Fall through. */
+ case GT:
+ comparison = UNLE;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1;
+ comparison = UNLE;
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case EQ:
+ case UNEQ:
+ case GE:
+ case UNLE:
+ case UNORDERED:
+ break;
+ case LTGT:
+ comparison = UNEQ;
+ user++;
+ break;
+ case ORDERED:
+ comparison = UNORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ else /* SH2E .. SH4 Hardware floating point */
+ {
+ switch (comparison)
+ {
+ case LTGT:
+ if (!flag_finite_math_only)
+ break;
+ /* Fall through. */
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1;
+ comparison = GT; /* Fall through. */
+ case GT:
+ case EQ:
+ case ORDERED:
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case GE:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ user++;
+ break;
+ }
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ break;
+ }
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1; /* Fall through. */
+ case UNLE:
+ comparison = GT;
+ user++;
+ break;
+ case UNEQ:
+ if (flag_finite_math_only)
+ {
+ comparison = EQ;
+ break;
+ }
+ comparison = LTGT;
+ user++;
+ break;
+ case UNORDERED:
+ comparison = ORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ operands[1] = force_reg (mode, operands[1]);
+ operands[2] = force_reg (mode, operands[2]);
+ if (comparison == GE)
+ {
+ lab = gen_label_rtx ();
+ sh_emit_scc_to_t (GT, operands[1+swap_operands],
+ operands[2-swap_operands]);
+ emit_jump_insn (gen_branch_true (lab));
+ comparison = EQ;
+ }
+ }
+ op0 = operands[1+swap_operands];
+ op1 = operands[2-swap_operands];
+ if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+ emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+ else
+ sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (comparison, SImode,
+ op0, op1)),
+ mode);
+ if (lab)
+ emit_label (lab);
+ emit ((*user) (dest));
+}
+
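The TARGET_SH1_SOFTFP branch above reduces every float comparison to one of
EQ / UNLE / GE (plus UNEQ / UNORDERED), with operand swaps and result
negation. This Python sketch (table and helper names are mine, and only the
swap/negate subset is modeled) checks that the reduction preserves IEEE
semantics, including the NaN cases:

```python
import math

# Mirrors the soft-fp switch in sh_expand_float_condop:
# code -> (base comparison, swap operands?, negate result?)
REDUCE = {
    'NE':   ('EQ',   False, True),
    'LT':   ('UNLE', True,  True),
    'GT':   ('UNLE', False, True),
    'UNGT': ('GE',   True,  True),
    'UNLT': ('GE',   False, True),
    'UNGE': ('UNLE', True,  False),
    'LE':   ('GE',   True,  False),
}

BASE = {
    'EQ':   lambda a, b: a == b,
    'GE':   lambda a, b: a >= b,
    # UNLE = unordered-or-less-or-equal (true if either operand is NaN)
    'UNLE': lambda a, b: math.isnan(a) or math.isnan(b) or a <= b,
}

def evaluate(code, a, b):
    base, swap, neg = REDUCE.get(code, (code, False, False))
    if swap:
        a, b = b, a
    r = BASE[base](a, b)
    return (not r) if neg else r
```

This is why only __nesf2 / __gtsf2t / __gesf2f style libcalls are needed:
the remaining codes fall out by swapping and negating.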
/* INSN is an sfunc; return the rtx that describes the address used. */
static rtx
extract_sfunc_addr (rtx insn)
@@ -12266,6 +12544,19 @@ sh_secondary_reload (bool in_p, rtx x, r
return NO_REGS;
}
+int
+sh_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
#include "gt-sh.h"
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h (revision 162269)
+++ gcc/config/sh/sh.h (working copy)
@@ -183,6 +183,11 @@ do { \
#define TARGET_FPU_DOUBLE \
((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+ (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
/* Nonzero if an FPU is available. */
#define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
@@ -329,6 +334,38 @@ do { \
#define SUPPORT_ANY_SH5 \
(SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
+/* Check if we have support for optimized software floating point using
+ dynamic shifts - then some function calls clobber fewer registers. */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
/* Reset all target-selection flags. */
#define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
| MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
#define LIBGCC2_DOUBLE_TYPE_SIZE 64
#endif
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
/* 'char' is signed by default. */
#define DEFAULT_SIGNED_CHAR 1
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def (revision 162269)
+++ gcc/config/sh/sh-modes.def (working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
/* PDI mode is used to represent a function address in a target register. */
PARTIAL_INT_MODE (DI);
+/* For software floating point comparisons. */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
/* Vector modes. */
VECTOR_MODE (INT, QI, 2); /* V2QI */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h (revision 162269)
+++ gcc/config/sh/lib1funcs.h (working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
#endif /* !__LITTLE_ENDIAN__ */
#ifdef __sh1__
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here. */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
#else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
- branch##.s dest; in_slot, in_slot_arg2
+ branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
#endif /* !__sh1__ */
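For SH1, which lacks dmulu.l, the DMULUL/DMULUH macro pair above synthesizes a
32x32 -> 64 bit unsigned multiply from four 16x16 mulu.w products. A Python
sketch of the decomposition (my naming, not part of the patch):

```python
def dmulu(m1, m2):
    # Emulate SH1 DMULUL/DMULUH: split each 32-bit operand into
    # 16-bit halves and combine four partial products.
    MASK16 = 0xffff
    a_lo, a_hi = m1 & MASK16, m1 >> 16
    b_lo, b_hi = m2 & MASK16, m2 >> 16
    lo   = a_lo * b_lo    # mulu.w m1,m2
    mid1 = a_hi * b_lo    # swap.w m1 ; mulu.w r12,m2
    mid2 = a_lo * b_hi    # swap.w m2 ; mulu.w r13,m1
    hi   = a_hi * b_hi    # mulu.w r12,r13
    total = lo + ((mid1 + mid2) << 16) + (hi << 32)
    return (total >> 32) & 0xffffffff, total & 0xffffffff  # (mach, macl)
```

The assembly version interleaves the carry propagation (clrt/addc/movt/xtrct)
to do the same accumulation in 32-bit registers.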
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+ #define SHLL4(REG) \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR4(REG) \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL6(REG) \
+ shll2 REG; \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR6(REG) \
+ shlr2 REG; \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL12(REG) \
+ shll8 REG; \
+ SHLL4 (REG)
+
+ #define SHLR12(REG) \
+ shlr8 REG; \
+ SHLR4 (REG)
+
+ #define SHLR19(REG) \
+ shlr16 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLL23(REG) \
+ shll16 REG; \
+ shlr REG; \
+ shll8 REG
+
+ #define SHLR24(REG) \
+ shlr16 REG; \
+ shlr8 REG
+
+ #define SHLR21(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLL21(REG) \
+ shll16 REG; \
+ SHLL4 (REG); \
+ add REG,REG
+
+ #define SHLR11(REG) \
+ shlr8 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLR22(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ shlr8 REG
+
+ #define SHLR23(REG) \
+ shlr16 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLR20(REG) \
+ shlr16 REG; \
+ SHLR4 (REG)
+
+ #define SHLL20(REG) \
+ shll16 REG; \
+ SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
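The non-DYN_SHIFT macros above compose arbitrary shift counts from the fixed
shll/shlr 1/2/8/16 instructions; some, like SHLR21, overshoot and shift back.
A quick Python sanity check of that sequence (helper name is mine):

```python
def shlr21(reg):
    # Follow the SHLR21 macro step by step on a 32-bit register:
    # shlr16 ; shll2 ; add reg,reg (= shift left 1) ; shlr8
    # Net effect: right shift by 16 - 2 - 1 + 8 = 21.
    reg = (reg >> 16) & 0xffffffff
    reg = (reg << 2) & 0xffffffff
    reg = (reg + reg) & 0xffffffff
    reg = (reg >> 8) & 0xffffffff
    return reg
```

The left shifts cannot overflow here because only 16 bits survive the initial
shlr16, which is why the overshoot trick is safe.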
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
@@ -0,0 +1,360 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB 3
+#define L1SB 2
+#define L2SB 1
+#define L3SB 0
+#else
+#define L0SB 0
+#define L1SB 1
+#define L2SB 2
+#define L3SB 3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one. We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
+FUNC(GLOBAL(divsf3))
+ .global GLOBAL(divsf3)
+ .balign 4
+GLOBAL(divsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov #1,r2
+ mov r4,r6
+ shll8 r6
+ mov r5,r7
+ shll8 r7
+ rotr r2
+ tst r3,r4
+ or r2,r6
+ bt/s LOCAL(denorm_arg0)
+ or r2,r7
+ tst r3,r5
+ bt LOCAL(denorm_arg1)
+ shlr r6
+ mov.l LOCAL(x3f000000),r3 ! bias minus explicit leading 1
+ div0u
+LOCAL(denorm_done):
+ div1 r7,r6
+ mov.l r8,@-r15
+ bt 0f
+ div1 r7,r6
+0: mov.l r9,@-r15
+ div1 r7,r6
+ add r4,r3
+ div1 r7,r6
+ sub r5,r3 ! result sign/exponent minus 1 if no overflow/underflow
+ div1 r7,r6
+ or r3,r2
+ div1 r7,r6
+ mov.w LOCAL(xff00),r9
+ div1 r7,r6
+ mov.l r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+ div1 r7,r6
+ mov.w LOCAL(m23),r2
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ extu.b r6,r1
+ and r9,r6
+ swap.w r1,r1 ! first 8 bits of result fraction in bit 23..16
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L3SB,r15) ! 0xff iff dividend was infinity / nan
+ div1 r7,r6
+ mov r5,r0
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L2SB,r15) ! 0xff iff divisor was infinity / nan
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ mov.w LOCAL(m31),r2
+ div1 r7,r6
+ extu.b r6,r8 ! second 8 bits of result fraction in bit 7..0
+ and r9,r6
+ mov.l LOCAL(xff800000),r9
+ div1 r7,r6
+ xor r5,r0 ! msb := correct result sign
+ div1 r7,r6
+ xor r3,r0 ! xor with sign of result sign/exponent word
+ div1 r7,r6
+ shad r2,r0
+ div1 r7,r6
+ mov.b r0,@(L1SB,r15) ! 0xff iff exponent over/underflows
+ and r9,r3 ! isolate sign / exponent
+ mov.w LOCAL(xff01),r2
+ div1 r7,r6
+ swap.b r8,r0 ! second 8 bits of result fraction in bit 15..8
+ div1 r7,r6
+ or r1,r0 ! first 16 bits of result fraction in bit 23..8
+ div1 r7,r6
+ mov.w LOCAL(m1),r9
+ div1 r7,r6
+ mov.l @r15+,r8 ! load encoding of unusual exponent conditions
+ and r6,r2 ! rest | result lsb
+ mov #0,r1
+ bf 0f ! bit below lsb clear -> no rounding
+ cmp/hi r1,r2
+0: extu.b r6,r1
+ or r1,r0 ! 24 bit result fraction with explicit leading 1
+ addc r3,r0 ! add in exponent / sign
+ cmp/str r9,r8
+ ! (no stall *here* for SH4-100 / SH4-200)
+ bt/s LOCAL(inf_nan_denorm_zero)
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+ adjusted value in r3; r4/r5 are not changed. */
+ .balign 4
+LOCAL(denorm_arg0):
+ mov.w LOCAL(xff00),r1
+ sub r2,r6 ! 0x80000000 : remove implicit 1
+ tst r6,r6
+ sts.l pr,@-r15
+ bt LOCAL(div_zero)
+ bsr LOCAL(clz)
+ mov r6,r0
+ shld r0,r6
+ tst r3,r5
+ mov.l LOCAL(x3f800000),r3 ! bias - 1 + 1
+ mov #23,r1
+ shld r1,r0
+ bt/s LOCAL(denorm_arg1_2)
+ sub r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+LOCAL(denorm_arg1):
+ mov.l LOCAL(x3f000000),r3 ! bias - 1
+LOCAL(denorm_arg1_2):
+ sub r2,r7 ! 0x80000000 : remove implicit 1
+ mov.w LOCAL(xff00),r1
+ tst r7,r7
+ sts.l pr,@-r15
+ bt LOCAL(div_by_zero)
+ bsr LOCAL(clz)
+ mov r7,r0
+ shld r0,r7
+ add #-1,r0
+ mov #23,r1
+ shld r1,r0
+ add r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+ .balign 4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear; the apparent rest will then require adjusting to test if
+! the actual rest is nonzero.
+ mov r0,r2
+ not r8,r0
+ tst #0xff,r0
+ shlr8 r0
+ mov.l @r15+,r8
+ bt/s LOCAL(div_inf_or_nan)
+ tst #0xff,r0
+ mov r4,r0
+ bt LOCAL(div_by_inf_or_nan)
+ add r0,r0
+ mov r5,r1
+ add r1,r1
+ cmp/hi r1,r0
+ mov r6,r0
+ bt LOCAL(overflow)
+ sub r2,r0
+ exts.b r0,r0 ! -1 if rounding took place
+ shlr8 r6 ! isolate div1-mangled rest
+ addc r2,r0 ! generate carry if rounding took place
+ shlr8 r7
+ sub r3,r0 ! pre-rounding fraction
+ bt 0f ! going directly to denorm_sticky would cause mispredicts
+ tst r6,r6 ! rest can only be zero if lost bit was set
+0: add r7,r6 ! (T ? corrupt : reconstruct) actual rest
+ bt 0f
+ cmp/pl r6
+0: mov.w LOCAL(m24),r1
+ addc r0,r0 ! put in sticky bit
+ add #-1,r3
+ mov.l LOCAL(x40000000),r6
+ add r3,r3
+ mov r0,r2
+ shad r1,r3 ! exponent ; s32.0
+ !
+ shld r3,r0
+ add #30,r3
+ cmp/pl r3
+ shld r3,r2
+ bf LOCAL(zero_nan) ! return zero
+ rotl r2
+ cmp/hi r6,r2
+ mov #0,r7
+ addc r7,r0
+ div0s r4,r5
+ rts
+ rotcr r0
+
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+
+LOCAL(overflow):
+ mov.l LOCAL(xff000000),r0
+ div0s r4,r5
+ rts
+ rotcl r0
+
+LOCAL(div_inf_or_nan):
+ mov r4,r0
+ bra LOCAL(nan_if_t)
+ add r0,r0
+
+LOCAL(div_by_inf_or_nan):
+ mov.l LOCAL(xff000000),r1
+ mov #0,r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+ .balign 4
+LOCAL(div_zero):
+ mov r5,r1
+ add r1,r1
+ tst r1,r1 ! 0 / 0 -> nan
+ not r5,r1
+ bt LOCAL(nan)
+ add r3,r3
+ cmp/hi r3,r1 ! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+ mov #0,r0
+LOCAL(nan_if_t):
+ bf 0f
+LOCAL(nan):
+ mov #-1,r0
+0: div0s r4,r5 ! compute sign
+ rts
+ rotcr r0 ! insert sign
+
+LOCAL(div_by_zero):
+ mov.l LOCAL(xff000000),r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r0,r2
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-8,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r1,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __pic__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative loads even though they would fit as
+! immediates in the instruction, in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(m23): .word -23
+LOCAL(m24): .word -24
+LOCAL(m31): .word -31
+LOCAL(xff01): .word 0xff01
+ .balign 4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00): .word 0xff00
+LOCAL(m1): .word -1
+#else
+LOCAL(m1): .word -1
+LOCAL(xff00): .word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
@@ -0,0 +1,603 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x ; x in [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ We use a slightly modified algorithm here that checks if the lower
+ bits in z1 are sufficient to determine the outcome of rounding - in that
+ case a2 is not computed.
+ -z1 is computed in units of 1/128 ulp, with an error in the range
+ -0x3.e/128 .. +0 ulp.
+ Thus, after adding three, the result can be safely rounded for normal
+ numbers if any of the bits 5..2 is set, or if the highest guard bit
+ (bit 6 if y <1, otherwise bit 7) is set.
+ (Because of the way truncation works, we would be fine for an open
+   error interval of (-4/128..+1/128) ulp.)
+ For denormal numbers, the rounding point lies higher, but it would be
+ quite cumbersome to calculate where exactly; it is sufficient if any
+ of the bits 7..3 is set.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+ 82 for the path for rounding tie-breaking for normalized numbers
+ (including one branch mispredict).
+ Some cycles might be saved by more careful register allocation. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r12 ! y2*(a-1) ; u1.31
+ add yn,r12 ! z0 ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r11
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r12,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r11,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ cmp/pz r9 ! In corner cases this shift can lose ..
+ shll8 r9 ! .. the sign, so check it first.
+ mov.l LOCAL(x00200000),r11
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmulu.l r9,yn ! sign for r9 is in T
+ xor DBL0H,DBL1H ! calculate expected sign & bit20
+ mov.w LOCAL(d120),DBL0H ! to test bits 6..4
+ xor DBLRH,DBL1H
+ !
+ sts mach,DBL0L ! -z1 ; s-27.32
+ bt 0f
+ sub yn,DBL0L ! multiply adjust for -a1 negative; r3 dies here
+0:tst r10,DBL1H ! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt 0f
+ add DBL0L,DBL0L ! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add r12,r12 ! z0 ; u1.31 / u0.31
+0:add #6-64,DBL0L
+ and r3,DBLRH ! isolate sign / exponent
+ tst DBL0H,DBL0L
+ bf/s LOCAL(exact) ! make the hot path taken for best branch prediction
+ cmp/pz DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ add #64,DBL0L
+LOCAL(find_adjust): tst r10,DBL1H ! set T if a >= x
+ mov #-2,r10
+ addc r10,r10
+ mov DBL0L,DBLRL ! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad r10,DBLRL ! tentatively rounded z1 ; s-24.32
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L ! DBLRL signed, DBL1L unsigned
+ mov r8,r10
+ shll16 r8 ! r8 := lowpart of -a1 ; s-44.48
+ xtrct r9,r10 ! r10 := highpart of -a1 ; s-44.48
+ !
+ sts macl,r3
+ subc r3,r8
+ sts mach,r3
+ subc r3,r10
+ cmp/pz DBL1L
+ mul.l DBLRL,r2
+ bt 0f
+ sub DBLRL,r10 ! adjust for signed/unsigned multiply
+0: mov.l LOCAL(x7fe00000),DBLRL
+ mov #-26,r2
+ sts macl,r9
+ sub r9,r10 ! r10:r8 := -a2
+ add #-64+16,DBL0L ! the denorm code negates this adj. for exact results
+ shld r2,r10 ! convert sign into adjustment in the range 32..63
+ sub r10,DBL0L
+ cmp/pz DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm_inf) ! denorm, DBLRH has correct sign
+ mov #-7,DBL1H
+ cmp/pz DBL0L ! T is sign extension of z1
+ not DBL0L,DBLRL
+ subc r11,DBLRH ! calculate sign / exponent minus implicit 1 minus T
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shad DBL1H,DBLRL
+ mov.l @r15+,r9
+ mov #-11,DBL1H
+ mov r12,r8 ! z0 contributes to DBLRH and DBLRL
+ shld DBL1H,r12
+ mov #21,DBL1H
+ clrt
+ shld DBL1H,r8
+ addc r8,DBLRL
+ mov.l @r15+,r8
+ addc r12,DBLRH
+ rts
+ mov.l @r15+,r12
+
+! sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+ .balign 4
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r3
+ add r3,r3
+ div0s DBL1H,r3
+ mov #120,DBLRL
+ bt LOCAL(ret_inf_late)
+ add #64,DBL0L
+ tst DBLRL,DBL0L
+ mov #-21,DBLRL
+ bt LOCAL(find_adjust)
+ or r10,r8
+ tst r8,r8 ! check if find_adjust found an exact value.
+ shad DBLRL,r3
+ bf 0f
+ add #-16,DBL0L ! if yes, cancel adjustment
+0: mov #-8,DBLRL ! remove the three lowest (inexact) bits
+ and DBLRL,DBL0L
+ add #-2-11,r3 ! shift count for denorm generation
+ mov DBL0L,DBLRL
+ mov #28,r2
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shll2 DBLRL
+ mov.l @r15+,r9
+ shld r2,DBL0L
+ mov.l @r15+,r8
+ mov #-31,r2
+ cmp/ge r2,r3
+ shll2 DBLRL
+ bt/s 0f
+ add DBL0L,r12 ! fraction in r12:DBLRL ; u1.63
+ negc DBLRL,DBLRL ! T := DBLRL != 0
+ add #31,r3
+ mov r12,DBLRL
+ rotcl DBLRL ! put in sticky bit
+ movt r12
+ cmp/ge r2,r3
+ bt/s LOCAL(return_0_late)
+0: div0s DBL1H,DBLRH ! calculate sign
+ mov r12,DBLRH
+ shld r3,DBLRH
+ mov DBLRL,r2
+ shld r3,DBLRL
+ add #32,r3
+ add DBLRH,DBLRH
+ mov.l LOCAL(x80000000),DBL1H
+ shld r3,r12
+ rotcr DBLRH ! combine sign with highpart
+ add #-1,r3
+ shld r3,r2
+ mov #0,r3
+ rotl r2
+ cmp/hi DBL1H,r2
+ addc r12,DBLRL
+ mov.l @r15+,r12
+ rts
+ addc r3,DBLRH
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov DBLRH,DBL0H
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+LOCAL(return_0_late):
+ div0s DBLRH,DBL1H
+ mov.l @r15+,r12
+ mov #0,DBLRH
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #21,r9
+ xtrct r0,r8
+ add #-16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #-8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode even some words as pc-relative that would fit as immediate
+! in the instruction in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+LOCAL(d120): .word 120
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
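For readers following the algorithm comment at the top of divdf3.S, here is a hedged sketch in plain floating point (not the fixed-point arithmetic the assembly uses) of the reciprocal-iteration division scheme. The 64-entry correction table LOCAL(ytab) is omitted, so the seed y0 is coarser than in the real code, and the careful rounding/sticky-bit handling is left out entirely:

```python
def divide(a, x):
    """Approximate a / x for x in [1, 2) via reciprocal iteration.

    Mirrors the scheme in the divdf3.S header comment:
      y0 seed, two Newton steps, then a quotient plus remainder correction.
    """
    assert 1.0 <= x < 2.0
    y0 = 1.5 - x / 2.0               # coarse seed for 1/x (no table correction)
    y1 = y0 - (y0 * x - 1.0) * y0    # first Newton step: error ~ d^2
    y2 = y1 - (y1 * x - 1.0) * y1    # second Newton step: error ~ d^4
    z0 = y2 * a                      # first quotient estimate
    a1 = a - z0 * x                  # remainder
    z1 = y2 * a1                     # quotient correction
    return z0 + z1
```

Each Newton step roughly squares the relative error of the reciprocal estimate, which is why two steps plus one remainder correction suffice before the final rounding decision in the assembly.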
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
@@ -0,0 +1,89 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+ .global GLOBAL(floatunsisf)
+ .balign 4
+GLOBAL(floatunsisf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r4,r1
+ mov #24,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ mov r4,r0
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ tst r4,r4
+ bt LOCAL(ret0)
+ !
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r4
+ rotl r4
+ add #-31,r2
+ cmp/hi r1,r4
+ mov #0,r3
+ addc r3,r0
+LOCAL(noround):
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ nop
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
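The LOCAL(clz) helpers and the open-coded sequence in floatunsisf all narrow the operand to its top non-zero byte and finish with a 256-entry table lookup. A sketch of that technique, assuming clz_tab follows libgcc's __clz_tab layout, where entry i is the bit length of i (0 for i == 0):

```python
# clz_tab[i] = number of significant bits in i, matching libgcc's __clz_tab.
clz_tab = [0] + [i.bit_length() for i in range(1, 256)]

def clz32(v):
    """Count leading zeros of a non-zero 32-bit value, table-driven."""
    assert 0 < v < 1 << 32
    base = 0
    if v >= 1 << 16:      # top half non-zero: skip 16 bits
        v >>= 16
        base += 16
    if v >= 1 << 8:       # top byte of the remaining half non-zero: skip 8
        v >>= 8
        base += 8
    return 32 - (base + clz_tab[v])
```

The assembly folds the `32 -` and `base` adjustments into the shift-count constants (#21, #24, #-16, #-8), so no separate subtraction is visible there.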
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
@@ -0,0 +1,91 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+ .global GLOBAL(floatunsidf)
+ .balign 4
+GLOBAL(floatunsidf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(0xff00),r3 ! 0xff00 mask for the byte-narrowing step
+ cmp/eq r4,r1
+ mov #21,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r5
+ mov r4,DBLRL
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ tst r4,r4
+ mov r4,DBLRH
+ bt LOCAL(ret0)
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov r4,DBLRL
+ rts
+ mov r4,DBLRH
+
+LOCAL(0xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
@@ -0,0 +1,77 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunsdfsi)
+ FUNC(GLOBAL(fixunsdfsi))
+ .balign 4
+GLOBAL(fixunsdfsi):
+ mov.w LOCAL(x413),r1 ! bias + 20
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(ret0)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #11,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ bf 0f
+LOCAL(ret0): mov #0,r0 ! results in 0 return
+0: rts
+ shld DBL0H,r0
+
+LOCAL(retmax):
+ rts
+ mov #-1,r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
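A sketch of the truncating conversion the routine performs. The saturation choices here (0 for negatives and tiny values, 0xffffffff for values at or above 2**32 and for infinity) match the assembly's ret0/retmax paths, but this sketch does not reproduce the sign-dependent NaN quirk described in the comment above; as that comment notes, the result for NaNs is undefined anyway:

```python
import struct

def fixunsdfsi(d):
    """Truncate a double to uint32, saturating out-of-range inputs."""
    bits = struct.unpack('>Q', struct.pack('>d', d))[0]
    if bits >> 63:                          # sign bit set: negative input
        return 0
    exp = ((bits >> 52) & 0x7ff) - 1023     # unbiased exponent
    if exp < 0:
        return 0                            # |d| < 1 truncates to 0
    if exp > 31:
        return 0xffffffff                   # saturate (also covers inf)
    frac = (bits & ((1 << 52) - 1)) | (1 << 52)  # restore implicit 1
    return frac >> (52 - exp)
```

The assembly performs the same steps with shifts: it extracts the biased exponent with shll/shld, restores the implicit 1 via the addc on the masked fraction, and turns the exponent into a single shld count.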
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
@@ -0,0 +1,514 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+ get better average performance with a slightly altered algorithm.
+ Still, if you want a version for hard real time, this version here might
+ be a good starting point, since it has effectively no conditional
+ branches in the path that deals with normal numbers
+ (branches with zero offset are effectively conditional execution),
+ and thus it has a uniform execution time in this path. */
+
+/* y = 1/x ; x (- [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 70 cycles through main path for sh4-300 . Some cycles might be
+ saved by more careful register allocation.
+ 122 cycles for sh4-200. If execution time for sh4-200 is of concern,
+ a specially scheduled version makes sense. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done): ! This label must stay aligned.
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r11 ! y2*(a-1) ; u1.31
+ add yn,r11 ! z0 ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r12
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r11,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r12,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ mov.l LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8 r9
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmuls.l r9,yn ! r3 dead
+ mov DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor DBL0H,r3 ! calculate expected sign & bit20
+ div0s r3,DBLRH
+ xor DBLRH,r3
+ bt LOCAL(ret_denorm_inf)
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm)
+ sub r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst r10,r3 ! set T if a >= x
+ sts mach,r12! -z1 ; s-27.32
+ bt 0f
+ add r11,r11 ! z0 ; u1.31 / u0.31
+0: mov #6,r3
+ negc r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ shad r10,r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl r12 ! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add #20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and DBL0L,DBLRH ! isolate sign / exponent
+ shld r10,r9
+ mov r8,r3
+ shld r10,r8
+ sts macl,DBL0L
+ sts mach,DBLRL
+ add #-32,r10
+ shld r10,r3
+ mul.l r12,r2
+ bf 0f ! adjustment for signed/unsigned multiply
+ sub DBL1L,DBLRL ! DBL1L dead
+0: shar r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ sts macl,DBL1L
+ or r3,r9 ! r9:r8 := -a1 ; s-41.64/s-42.64
+ !
+ cmp/hi r8,DBL0L
+ add DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc DBL1L,r9
+ not r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll r9 ! T := a2 > 0
+ mov r11,r2
+ mov #21,r7
+ shld r7,r11
+ addc r11,DBLRL
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov #-11,r7
+ mov.l @r15+,r9
+ shld r7,r2
+ mov.l @r15+,r8
+ addc r2,DBLRH
+ rts
+ mov.l @r15+,r12
+
+LOCAL(ret_denorm):
+ tst r10,DBLRH
+ bra LOCAL(denorm_have_count)
+ movt DBLRH ! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r12
+ add r12,r12
+ cmp/pz r12
+ mov #-21,DBLRL
+ bt LOCAL(ret_inf_late)
+ shld DBLRL,DBLRH
+LOCAL(denorm_have_count):
+ add #-2,DBLRH
+/* FIXME */
+ bra LOCAL(return_0)
+ mov.l @r15+,r11
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ !
+ mov.l @r15+,r10
+ !
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-11,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative constants even though they would
+! fit as immediates in the instruction, in order to avoid pipeline
+! stalls on SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
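! The divdf3 code above seeds a reciprocal estimate from the LOCAL(ytab)
! byte table and refines it with fixed-point Newton-Raphson steps (the
! y0, y1, y2, z0, z1 quantities in the comments). A floating-point sketch
! of the same seed-then-refine idea, using a classic linear seed for
! illustration rather than the table the code actually uses:

```python
def recip(x, iters=4):
    """Approximate 1/x for x in [0.5, 1) by Newton-Raphson refinement."""
    # Classic minimax linear seed, accurate to about 1/17 relative error.
    y = 48.0 / 17.0 - (32.0 / 17.0) * x
    for _ in range(iters):
        # Each step roughly squares the relative error (doubles correct bits).
        y = y * (2.0 - x * y)
    return y
```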
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
@@ -0,0 +1,285 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+ .balign 4
+ .global GLOBAL(subsf3)
+ FUNC(GLOBAL(subsf3))
+ .global GLOBAL(addsf3)
+ FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+ cmp/pz r5
+ add r5,r5
+ rotcr r5
+ .balign 4
+GLOBAL(addsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov r4,r6
+ add r6,r6
+ mov r5,r7
+ add r7,r7
+ mov r4,r0
+ or r3,r0
+ cmp/hi r6,r7
+ mov r5,r1
+ bf/s LOCAL(r4_hs)
+ or r3,r1
+ cmp/eq r5,r1
+ bt LOCAL(ret_r5) /* sole Inf or NaN, return unchanged. */
+ shll8 r0 ! r4 fraction
+ shll8 r1 ! r5 fraction
+ mov r6,r3
+ mov #-24,r2
+ mov r7,r6
+ shld r2,r6 ! r5 exp
+ mov r0,r7
+ shld r2,r3 ! r4 exp
+ tst r6,r6
+ sub r6,r3 ! exp difference (negative or 0)
+ bt LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+ shld r3,r0 ! Get 31 upper bits, including 8 guard bits
+ mov.l LOCAL(xff000000),r2
+ add #31,r3
+ mov.l r5,@-r15 ! push result sign.
+ cmp/pl r3 ! r0 has no more than one bit set -> return arg 1
+ shld r3,r7 ! copy of lowest guard bit in r0 and lower guard bits
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r7 /* Is LSB in r0 clear, but any lower guard bit set? */
+ subc r0,r1
+ mov.l LOCAL(c__clz_tab),r7
+ tst r2,r1
+ mov #-24,r3
+ bf/s LOCAL(norm_r0)
+ mov r1,r0
+ extu.w r1,r1
+ bra LOCAL(norm_check2)
+ cmp/eq r0,r1
+LOCAL(ret_r5):
+ rts
+ mov r5,r0
+LOCAL(ret_stack):
+ rts
+ mov.l @r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+ consistent with normalized numbers. This also removes the spurious
+ leading one that was inserted before. */
+LOCAL(denorm_r4):
+ tst r3,r3
+ bf/s LOCAL(denorm_r4_done)
+ add r0,r0
+ bra LOCAL(denorm_r4_done)
+ add r1,r1
+LOCAL(denorm_r5):
+ tst r6,r6
+ add r1,r1
+ bf LOCAL(denorm_r5_done)
+ clrt
+ bra LOCAL(denorm_r5_done)
+ add r0,r0
+
+/* If the exponents differ by two or more, normalization is minimal, and
+ few guard bits are needed for an exact final result, so sticky guard
+ bit compression before the subtraction (or addition) works fine.
+ If the exponents differ by one, only one extra guard bit is generated,
+ and effectively no guard bit compression takes place. */
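! The sticky guard bit compression described above can be modeled simply:
! when the smaller operand is shifted right for alignment, all shifted-out
! bits are OR-ed into a single sticky bit, which preserves enough
! information to round correctly. A sketch, not the register-level
! encoding the code uses:

```python
def align_sticky(frac, shift):
    """Right-shift frac by `shift` bits, folding every lost bit into bit 0."""
    if shift <= 0:
        return frac << -shift
    kept = frac >> shift
    lost = frac & ((1 << shift) - 1)    # bits shifted out of the word
    return kept | (1 if lost else 0)    # compress them into one sticky bit
```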
+
+ .balign 4
+LOCAL(r4_hs):
+ cmp/eq r4,r0
+ mov #-24,r3
+ bt LOCAL(inf_nan_arg0)
+ shld r3,r7
+ shll8 r0
+ tst r7,r7
+ shll8 r1
+ mov.l LOCAL(xff000000),r2
+ bt/s LOCAL(denorm_r5)
+ shld r3,r6
+LOCAL(denorm_r5_done):
+ mov r1,r3
+ subc r6,r7
+ bf LOCAL(same_exp)
+ shld r7,r1 /* Get 31 upper bits. */
+ add #31,r7
+ mov.l r4,@-r15 ! push result sign.
+ cmp/pl r7
+ shld r7,r3
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r3 /* Is LSB in r1 clear, but any lower guard bit set? */
+ subc r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+ tst r2,r0
+ mov #-24,r3
+ bf LOCAL(norm_r0)
+ extu.w r0,r1
+ cmp/eq r0,r1
+LOCAL(norm_check2):
+ mov #-8,r3
+ bt LOCAL(norm_r0)
+ mov #-16,r3
+LOCAL(norm_r0):
+ mov r0,r1
+ shld r3,r0
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r7
+ add #25,r3
+ add #-9+1,r6
+ mov r1,r0
+ sub r7,r3
+ mov.l LOCAL(xbfffffff),r7
+ sub r3,r6 /* generate exp-1 */
+ mov.w LOCAL(d24),r2
+ cmp/pz r6 /* check exp > 0 */
+ shld r3,r0 /* Leading 1 becomes +1 exp adjustment. */
+ bf LOCAL(zero_denorm)
+LOCAL(denorm_done):
+ add #30,r3
+ shld r3,r1
+ mov.w LOCAL(m1),r3
+ tst r7,r1 ! clear T if rounding up
+ shld r2,r6
+ subc r3,r0 ! round - overflow will boost exp adjustment to 2.
+ mov.l @r15+,r2
+ add r6,r0 ! overflow will generate inf
+ cmp/ge r2,r3 ! get sign into T
+ rts
+ rotcr r0
+LOCAL(ret_r4):
+ rts
+ mov r4,r0
+
+/* At worst, we are shifting the number back into the place where an
+   incoming denormal was.  Thus, the shifts won't get out of range.  They
+   still might generate a zero fraction, but that's OK; that makes it 0. */
+LOCAL(zero_denorm):
+ add r6,r3
+ mov r1,r0
+ mov #0,r6 /* leading one will become free (except for rounding) */
+ bra LOCAL(denorm_done)
+ shld r3,r0
+
+/* Handle abs(r4) >= abs(r5) with same exponents specially so we don't
+   need to check for a zero fraction in the main path. */
+LOCAL(same_exp):
+ div0s r4,r5
+ mov.l r4,@-r15
+ bf LOCAL(add)
+ cmp/eq r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+ bf/s LOCAL(norm_check)
+ sub r1,r0
+ rts ! zero difference -> return +zero
+ mov.l @r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+ addc r1,r0
+ mov.w LOCAL(x2ff),r7
+ shll8 r6
+ bf/s LOCAL(no_carry)
+ shll16 r6
+ tst r7,r0
+ shlr8 r0
+ mov.l @r15+,r3 ! discard saved sign
+ subc r2,r0
+ sett
+ addc r6,r0
+ cmp/hs r2,r0
+ bt/s LOCAL(inf)
+ div0s r7,r4 /* Copy sign. */
+ rts
+ rotcr r0
+LOCAL(inf):
+ mov r6,r0
+ rts
+ rotcr r0
+LOCAL(no_carry):
+ mov.w LOCAL(m1),r3
+ tst r6,r6
+ bt LOCAL(denorm_add)
+ add r0,r0
+ tst r7,r0 ! check if lower guard bit set or round to even
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ subc r3,r0 ! round ; overflow -> exp++
+ cmp/ge r4,r3 /* Copy sign. */
+ add r6,r0 ! overflow -> inf
+ rts
+ rotcr r0
+
+LOCAL(denorm_add):
+ cmp/ge r4,r3 /* Copy sign. */
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ rts
+ rotcr r0
+
+LOCAL(inf_nan_arg0):
+ cmp/eq r5,r1
+ bf LOCAL(ret_r4)
+ div0s r4,r5 /* Both are inf or NaN, check signs. */
+ bt LOCAL(ret_nan) /* inf - inf, or NaN. */
+ mov r4,r0 ! same sign; return NaN if either is NaN.
+ rts
+ or r5,r0
+LOCAL(ret_nan):
+ rts
+ mov #-1,r0
+
+LOCAL(d24):
+ .word 24
+LOCAL(x2ff):
+ .word 0x2ff
+LOCAL(m1):
+ .word -1
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(xbfffffff):
+ .long 0xbfffffff
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(xfe000000):
+ .long 0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+ ENDFUNC(GLOBAL(addsf3))
+ ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
@@ -0,0 +1,582 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are typically added in 37 cycles, worst case
+! 43 cycles, unless there is an overflow, in which case the addition can
+! take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+ differ by more than one, the output does not need to be adjusted
+ by more than one bit position. Hence, it makes sense to ensure that
+ the shifts by 0 & 1 are handled quickly to reduce average and worst
+ case times. */
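! The property stated above can be checked exhaustively with a small
! integer model: with normalized significands and an exponent difference
! of at least two, the difference never needs more than a one-bit
! normalization shift. The 8-bit significand width below is purely for
! illustration:

```python
def norm_shift_after_sub(frac_a, frac_b, exp_diff):
    """Left shift needed to renormalize frac_a * 2**exp_diff - frac_b."""
    a = frac_a << exp_diff          # align to the larger exponent
    d = a - frac_b
    top = a.bit_length() - 1        # bit position of the normalized leading 1
    shift = 0
    while d and d.bit_length() - 1 < top:
        d <<= 1
        shift += 1
    return shift

# Exhaustive check over all normalized 8-bit significands:
assert all(norm_shift_after_sub(a, b, 2) <= 1
           for a in range(0x80, 0x100) for b in range(0x80, 0x100))
```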
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+ .global GLOBAL(adddf3)
+ .global GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+ bt LOCAL(inf_nan_arg0)
+ tst r0,r2
+ bt/s LOCAL(denorm_both)
+ shlr r1
+ mov.l LOCAL(x00100000),r3
+ bra LOCAL(denorm_arg1_done)
+ sub r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
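! The denormal fast path above exploits a property of the IEEE-754
! encoding: for two same-sign denormals, adding the raw bit patterns is
! exactly the correct addition, even when the sum carries into the
! smallest normal exponent. A quick demonstration in Python:

```python
import struct

def bits(x):
    """Raw 64-bit pattern of a double."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b):
    """Double with the given 64-bit pattern."""
    return struct.unpack('<d', struct.pack('<Q', b))[0]

# Largest positive double denormal plus the smallest: the bit-pattern sum
# equals the IEEE sum, carrying cleanly into the smallest normal number.
a = from_bits(0x000fffffffffffff)
b = from_bits(0x0000000000000001)
assert bits(a + b) == bits(a) + bits(b) == 0x0010000000000000
```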
+LOCAL(denorm_both):
+ div0s DBL0H,DBL1H
+ mov.l LOCAL(x800fffff),r9
+ bt/s LOCAL(denorm_sub)
+ and r1,DBL1H
+ and r9,DBL0H
+ mov.l @r15+,r9
+ mov DBL0L,DBLRL
+ mov DBL0H,DBLRH
+ addc DBL1L,DBLRL
+ mov.l @r15+,r8
+ rts
+ addc DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H
+ bra LOCAL(sub_same_exp)
+ addc r1,r2 ! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+ mov DBL0L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+ mov.l LOCAL(x800fffff),DBLRH
+ mov DBL0L,DBLRL
+ mov r2,r3
+LOCAL(ret_arg):
+ mov.l @r15+,r9
+ and r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ or r3,DBLRH
+
+ .balign 4
+GLOBAL(subdf3):
+ cmp/pz DBL1H
+ add DBL1H,DBL1H
+ rotcr DBL1H
+ nop
+
+GLOBAL(adddf3):
+ mov.l LOCAL(x7ff00000),r0
+ mov DBL0H,r2
+ mov.l LOCAL(x001fffff),r1
+ mov DBL1H,r3
+ mov.l r8,@-r15
+ and r0,r2
+ mov.l r9,@-r15
+ and r0,r3
+ cmp/hi r2,r3
+ or r0,DBL0H
+ or r0,DBL1H
+ bt LOCAL(arg1_gt)
+ tst r0,r3
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg1)
+ cmp/hs r0,r2
+ bt LOCAL(inf_nan_arg0)
+ sub r2,r3
+LOCAL(denorm_arg1_done): ! r2 is tentative result exponent
+ shad r9,r3
+ mov.w LOCAL(m32),r9
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H ! arg0 fraction
+ mov DBL1H,r0 ! the 'other' sign
+ and r1,DBL1H ! arg1 fraction
+ cmp/ge r9,r3
+ mov DBL1H,r1
+ bf/s LOCAL(large_shift_arg1)
+ shld r3,DBL1H
+LOCAL(small_shift_arg1):
+ mov DBL1L,r9
+ shld r3,DBL1L
+ tst r3,r3
+ add #32,r3
+ bt/s LOCAL(same_exp)
+ div0s r8,r0 ! compare signs
+ shld r3,r1
+
+ or r1,DBL1L
+ bf/s LOCAL(add)
+ shld r3,r9
+ clrt
+ negc r9,r9
+ mov.l LOCAL(x001f0000),r3
+LOCAL(sub_high):
+ mov DBL0L,DBLRL
+ subc DBL1L,DBLRL
+ mov DBL0H,DBLRH
+ bra LOCAL(subtract_done)
+ subc DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+ mov.w LOCAL(d0),r9
+ add #64,r3
+ cmp/pl r3
+ shld r3,r1
+ bf LOCAL(ret_arg0)
+ cmp/hi r9,DBL1L
+ mov DBL1H,DBL1L
+ mov r9,DBL1H
+ addc r1,r9
+
+ div0s r8,r0 ! compare signs
+
+ bf LOCAL(add)
+ clrt
+ mov.l LOCAL(x001f0000),r3
+ bra LOCAL(sub_high)
+ negc r9,r9
+
+LOCAL(add_clr_r9):
+ mov #0,r9
+LOCAL(add):
+ mov.l LOCAL(x00200000),r3
+ addc DBL1L,DBL0L
+ addc DBL1H,DBL0H
+ mov.l LOCAL(x80000000),r1
+ tst r3,DBL0H
+ mov.l LOCAL(x7fffffff),r3
+ mov DBL0L,r0
+ bt/s LOCAL(no_carry)
+ and r1,r8
+ tst r9,r9
+ bf LOCAL(add_one)
+ tst #2,r0
+LOCAL(add_one):
+ subc r9,r9
+ sett
+ mov r0,DBLRL
+ addc r9,DBLRL
+ mov DBL0H,DBLRH
+ addc r9,DBLRH
+ shlr DBLRH
+ mov.l LOCAL(x7ff00000),r3
+ add r2,DBLRH
+ mov.l @r15+,r9
+ rotcr DBLRL
+ cmp/hi r3,DBLRH
+LOCAL(add_done):
+ bt LOCAL(inf)
+LOCAL(or_sign):
+ or r8,DBLRH
+ rts
+ mov.l @r15+,r8
+
+LOCAL(inf):
+ bra LOCAL(or_sign)
+ mov r3,DBLRH
+
+LOCAL(pos_difference_0):
+ tst r3,DBL0H
+ mov DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ mov DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(same_exp):
+ bf LOCAL(add_clr_r9)
+ clrt
+LOCAL(sub_same_exp):
+ subc DBL1L,DBL0L
+ mov.l LOCAL(x001f0000),r3
+ subc DBL1H,DBL0H
+ mov.w LOCAL(d0),r9
+ bf LOCAL(pos_difference_0)
+ clrt
+ negc DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ negc DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ tst r3,DBLRH
+ not r8,r8
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(large_shift_arg0):
+ add #64,r2
+
+ mov #0,r9
+ cmp/pl r2
+ shld r2,r1
+ bf LOCAL(ret_arg1_exp_r3)
+ cmp/hi r9,DBL0L
+ mov DBL0H,DBL0L
+ mov r9,DBL0H
+ addc r1,r9
+ div0s r8,r0 ! compare signs
+ mov r3,r2 ! tentative result exponent
+ bf LOCAL(add)
+ clrt
+ negc r9,r9
+ bra LOCAL(subtract_arg0_arg1_done)
+ mov DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+ tst r0,r2
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg0)
+ cmp/hs r0,r3
+ bt LOCAL(inf_nan_arg1)
+ sub r3,r2
+LOCAL(denorm_arg0_done):
+ shad r9,r2
+ mov.w LOCAL(m32),r9
+ mov DBL1H,r8 ! tentative result sign
+ and r1,DBL1H
+ mov DBL0H,r0 ! the 'other' sign
+ and r1,DBL0H
+ cmp/ge r9,r2
+ mov DBL0H,r1
+ shld r2,DBL0H
+ bf LOCAL(large_shift_arg0)
+ mov DBL0L,r9
+ shld r2,DBL0L
+ add #32,r2
+ mov.l r3,@-r15
+ shld r2,r1
+ mov r2,r3
+ div0s r8,r0 ! compare signs
+ mov.l @r15+,r2 ! tentative result exponent
+ shld r3,r9
+ bf/s LOCAL(add)
+ or r1,DBL0L
+ clrt
+ negc r9,r9
+ mov DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+ subc DBL0L,DBLRL
+ mov DBL1H,DBLRH
+ mov.l LOCAL(x001f0000),r3
+ subc DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive. */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient. This not only
+ speeds up this case, but also alleviates the need for considering
+ lower bits from r9 or rounding in the other code.
+ Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+ can assume that DBLRH fits into 16 bits. */
+ tst r3,DBLRH
+ mov.l LOCAL(x80000000),r3
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and r3,r8
+ mov.l LOCAL(x7fffffff),r3
+LOCAL(norm_loop): ! Well, this used to be a loop...
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll r9
+ rotcl DBLRL
+
+ rotcl DBLRH
+
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll DBLRL
+ rotcl DBLRH
+ mov.l @r15+,r9
+ cmp/gt r2,DBL0H
+ sub DBL0H,r2
+LOCAL(norm_loop_1):
+ bt LOCAL(denorm0_n)
+ tst DBL0H,DBLRH
+ bf LOCAL(norm_pack)
+ shll DBLRL
+ rotcl DBLRH ! clears T
+ bra LOCAL(norm_loop_1)
+ subc DBL0H,r2
+
+LOCAL(no_carry):
+ shlr r0
+ mov.l LOCAL(x000fffff),DBLRH
+ addc r3,r9
+ mov.w LOCAL(d0),DBL1H
+ mov DBL0L,DBLRL
+ and DBL0H,DBLRH ! mask out implicit 1
+ mov.l LOCAL(x7ff00000),r3
+ addc DBL1H,DBLRL
+ addc r2,DBLRH
+ mov.l @r15+,r9
+ add DBL1H,DBLRH ! fraction overflow -> exp increase
+ bra LOCAL(add_done)
+ cmp/hi r3,DBLRH
+
+LOCAL(denorm_arg0):
+ bt LOCAL(inf_nan_arg1)
+ mov.l LOCAL(x00100000),r2
+ shlr r1
+ bra LOCAL(denorm_arg0_done)
+ sub r3,r2
+
+LOCAL(inf_nan_arg1):
+ mov DBL1L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+ mov.l LOCAL(x800fffff),DBLRH
+ bra LOCAL(ret_arg)
+ mov DBL1L,DBLRL
+
+#ifdef __pic__
+ .balign 8
+#endif
+LOCAL(m32):
+ .word -32
+LOCAL(d0):
+ .word 0
+#ifndef __pic__
+ .balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 has at
+! most one bit set.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of significant bits in DBLRH.
+LOCAL(long_norm):
+ tst DBLRH,DBLRH
+ mov.w LOCAL(xff),DBL0L
+ mov #21,r3
+ bf LOCAL(long_norm_highset)
+ mov.l LOCAL(x02100000),DBL1L ! shift 32, implicit 1
+ tst DBLRL,DBLRL
+ extu.w DBLRL,DBL0H
+ bt LOCAL(zero_or_ulp)
+ mov DBLRL,DBLRH
+ cmp/hi DBL0H,DBLRL
+ bf 0f
+ mov.l LOCAL(x01100000),DBL1L ! shift 16, implicit 1
+ clrt
+ shlr16 DBLRH
+ xtrct DBLRL,r9
+ mov DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0: mov r9,DBLRL ! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+ subc DBL1L,r2
+ bt LOCAL(denorm1_b)
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+ mov r0,r9
+ mova LOCAL(c__clz_tab),r0
+ add DBL1H,r0
+#else
+ mov r0,r9
+LOCAL(long_norm_lookup):
+ mov.l LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+ cmp/hi DBL0L,DBL0H
+ bf 0f
+ shlr8 DBL0H
+0: mov.b @(r0,DBL0H),r0
+ bf 0f
+ add #-8,r3
+0: mov.w LOCAL(d20),DBL0L
+ mov #-20,DBL0H
+ clrt
+ sub r0,r3
+ mov r9,r0
+ mov r3,DBL1H
+ shld DBL0L,DBL1H
+ subc DBL1H,r2
+ !
+ bf LOCAL(no_denorm)
+ shad DBL0H,r2
+ bra LOCAL(denorm1_done)
+ add r2,r3
+
+LOCAL(norm_round):
+ cmp/pz r2
+ mov #0,DBL1H
+ bf LOCAL(denorm0_1)
+ or r8,r2
+ mov DBLRL,DBL1L
+ shlr DBL1L
+ addc r3,r9
+ mov.l @r15+,r9
+ addc DBL1H,DBLRL ! round to even
+ mov.l @r15+,r8
+ rts
+ addc r2,DBLRH
+
+LOCAL(norm_pack):
+ add r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ add r2,DBLRH
+
+LOCAL(denorm0_1):
+ mov.l @r15+,r9
+ mov r8,DBL0L
+ mov.l @r15+,r8
+LOCAL(denorm0_shift):
+ shlr DBLRH
+ rotcr DBLRL
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+ mov r8,DBL0L
+ addc DBL0H,r2
+ mov.l @r15+,r8
+ bf LOCAL(denorm0_shift)
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(no_denorm):
+ add r2,r8 ! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+ shld r3,DBLRH
+ mov DBLRL,DBL0L
+ shld r3,DBLRL
+
+ add r8,DBLRH ! add in sign and (exponent - 1)
+ mov.l @r15+,r9
+ add #-32,r3
+ mov.l @r15+,r8
+ shld r3,DBL0L
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+ mov.l LOCAL(x00200000),DBL1L ! shift 1, implicit 1
+ shll r9
+ rotcl DBLRL
+ mov DBLRH,DBL0H
+ rotcl DBLRH ! clears T
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+#else
+ mov r0,r9
+#endif /* __pic__ */
+ subc DBL1L,r2
+ add #-1,r3
+ bf LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+ shlr DBLRH
+ rotcr DBLRL
+ mov.l @r15+,r9
+ or r8,DBLRH
+
+ rts
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(denorm1_b):
+ mov #-20,DBL0L
+ shad DBL0L,r2
+ mov DBLRH,DBL0L
+ shld r2,DBLRH
+ shld r2,DBLRL
+ or r8,DBLRH
+ mov.l @r15+,r9
+ add #32,r2
+ mov.l @r15+,r8
+ shld r2,DBL0L
+ rts
+ or DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+ tst r9,r9
+ bf LOCAL(long_norm_ulp_done)
+ ! return +0.0
+LOCAL(pop_r8_r9):
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+LOCAL(d20):
+ .word 20
+LOCAL(xff):
+ .word 0xff
+ .balign 4
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x80000000):
+ .long 0x80000000
+LOCAL(x000fffff):
+ .long 0x000fffff
+LOCAL(x800fffff):
+ .long 0x800fffff
+LOCAL(x001f0000):
+ .long 0x001f0000
+LOCAL(x00200000):
+ .long 0x00200000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x02100000):
+ .long 0x02100000
+LOCAL(x01100000):
+ .long 0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
@@ -0,0 +1,241 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+ .balign 4
+ .global GLOBAL(mulsf3)
+ FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+ mov.l LOCAL(x7f800000),r1
+ not r4,r2
+ mov r4,r3
+ not r5,r0
+ tst r1,r2
+ or r1,r3
+ bt/s LOCAL(inf_nan_arg0)
+ tst r1,r0
+ bt LOCAL(inf_nan_arg1)
+ tst r1,r5
+ mov r1,r2
+ shll8 r3
+ or r5,r1
+ bt/s LOCAL(zero_denorm_arg1)
+ shll8 r1
+ tst r2,r4
+ bt LOCAL(zero_denorm_arg0)
+ dmulu.l r3,r1
+ mov r4,r0
+ and r2,r0
+LOCAL(arg_norm):
+ and r5,r2
+ mov.l LOCAL(x3f800000),r3
+ sts mach,r1
+ sub r3,r0
+ sts macl,r3
+ add r2,r0
+ cmp/pz r1
+ mov.w LOCAL(x100),r2
+ bf/s LOCAL(norm_frac)
+ tst r3,r3
+ shll2 r1 /* Shift one up, replace leading 1 with 0. */
+ shlr r1
+ tst r3,r3
+LOCAL(norm_frac):
+ mov.w LOCAL(mx80),r3
+ bf LOCAL(round_frac)
+ tst r2,r1
+LOCAL(round_frac):
+ mov.l LOCAL(xff000000),r2
+ subc r3,r1 /* Even overflow gives right result: exp++, frac=0. */
+ shlr8 r1
+ add r1,r0
+ shll r0
+ bt LOCAL(ill_exp)
+ tst r2,r0
+ bt LOCAL(denorm0)
+ cmp/hs r2,r0
+ bt LOCAL(inf)
+LOCAL(insert_sign):
+ div0s r4,r5
+ rts
+ rotcr r0
+LOCAL(denorm0):
+ sub r2,r0
+ bra LOCAL(insert_sign)
+ shlr r0
+LOCAL(zero_denorm_arg1):
+ mov.l LOCAL(x60000000),r2 /* Check exp0 >= -64 */
+ add r1,r1
+ tst r1,r1 /* arg1 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 1 is zero ==> return 0 */
+ tst r4,r2
+ bt LOCAL(insert_sign) /* exp0 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+ mov r3,r2
+ mov r1,r3
+ bra LOCAL(arg_normalize)
+ mov r2,r1
+LOCAL(zero_denorm_arg0):
+ mov.l LOCAL(x60000000),r2 /* Check exp1 >= -64 */
+ add r3,r3
+ tst r3,r3 /* arg0 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 0 is zero ==> return 0 */
+ tst r5,r2
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+ mov.l r7,@-r15
+ extu.w r3,r7
+ cmp/eq r3,r7
+ mov.l LOCAL(xff000000),r7
+ mov #-8,r2
+ bt 0f
+ tst r7,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ mov r3,r7
+ shld r2,r7
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r0
+ add #32,r2
+ mov r2,r7
+ mov #23,r2
+ sub r0,r7
+ mov.l LOCAL(x7f800000),r0
+ shld r7,r3
+ shld r2,r7
+ mov r0,r2
+ and r4,r0
+ sub r7,r0
+ mov.l @r15+,r7
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+ cache thrashing. */
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(xff000000),r2
+ mov r4,r0
+LOCAL(arg_normalize):
+ tst r2,r3
+ bf LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+ tst r2,r3
+ add r2,r0
+ shll8 r3
+ bt LOCAL(arg_byte_loop)
+ add r4,r0
+LOCAL(arg_bit_norm):
+ mov.l LOCAL(x7f800000),r2
+ rotl r3
+LOCAL(arg_bit_loop):
+ add r2,r0
+ bf/s LOCAL(arg_bit_loop)
+ rotl r3
+ rotr r3
+ rotr r3
+ sub r2,r0
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#endif /* 0 */
+LOCAL(inf):
+ bra LOCAL(insert_sign)
+ mov r2,r0
+LOCAL(inf_nan_arg0):
+ bt LOCAL(inf_nan_both)
+ add r0,r0
+ cmp/eq #-1,r0 /* arg1 zero? -> NAN */
+ bt LOCAL(insert_sign)
+ mov r4,r0
+LOCAL(inf_insert_sign):
+ bra LOCAL(insert_sign)
+ add r0,r0
+LOCAL(inf_nan_both):
+ mov r4,r0
+ bra LOCAL(inf_insert_sign)
+ or r5,r0
+LOCAL(inf_nan_arg1):
+ mov r2,r0
+ add r0,r0
+ cmp/eq #-1,r0 /* arg0 zero? */
+ bt LOCAL(insert_sign)
+ bra LOCAL(inf_insert_sign)
+ mov r5,r0
+LOCAL(ill_exp):
+ cmp/pz r0
+ mov #-24,r3
+ bt LOCAL(inf)
+ add r1,r1
+ mov r0,r2
+ sub r1,r2 ! remove fraction to get back pre-rounding exponent.
+ sts mach,r0
+ sts macl,r1
+ shad r3,r2
+ mov r0,r3
+ shld r2,r0
+ add #32,r2
+ cmp/pz r2
+ shld r2,r3
+ bf LOCAL(zero)
+ or r1,r3
+ mov #-1,r1
+ tst r3,r3
+ mov.w LOCAL(x100),r3
+ bf/s LOCAL(denorm_round_up)
+ mov #-0x80,r1
+ tst r3,r0
+LOCAL(denorm_round_up):
+ mov #-7,r3
+ subc r1,r0
+ bra LOCAL(insert_sign)
+ shld r3,r0
+LOCAL(zero):
+ bra LOCAL(insert_sign)
+ mov #0,r0
+LOCAL(x100):
+ .word 0x100
+LOCAL(mx80):
+ .word -0x80
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x3f800000):
+ .long 0x3f800000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(mulsf3))
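! The mulsf3 routine above shifts both significands to the top of a
! 32-bit word (shll8 after OR-ing in the implicit 1) so that dmulu.l
! leaves the result significand plus guard bits in the upper product
! word (mach). A Python model of that significand multiply; illustrative
! only, without the routine's full rounding path:

```python
def mul_sig(fa, fb):
    """Multiply two 23-bit fractions; return (24-bit significand, exp_adj)."""
    a = (fa | 0x800000) << 8        # make the implicit 1 explicit, like shll8
    b = (fb | 0x800000) << 8
    hi = (a * b) >> 32              # the 'mach' half of dmulu.l
    if hi & 0x80000000:             # product in [2, 4): exponent bumps by 1
        return hi >> 8, 1
    return (hi << 1) >> 8, 0        # product in [1, 2): drop the leading 0
```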
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
@@ -0,0 +1,101 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+ .global GLOBAL(floatsisf)
+ .balign 4
+GLOBAL(floatsisf):
+ cmp/pz r4
+ mov r4,r5
+ bt 0f
+ neg r4,r5
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r5,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r5,r1
+ mov #24,r2
+ bt 0f
+ mov r5,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ bt 0f
+ mov.l LOCAL(xca800000),r3 ! sign + bias + 23 - implicit 1
+0: mov r5,r0
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r5
+ add #-31,r2
+ rotl r5
+ cmp/hi r1,r5
+ mov #0,r3
+ addc r3,r0
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+ .balign 8
+LOCAL(noround):
+ mov #23,r1
+ tst r4,r4
+ shld r1,r2
+ bt LOCAL(ret0)
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
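! The routine above combines a table-driven leading-zero count with
! round-to-nearest-even when the magnitude has more than 24 significant
! bits. A reference model in Python; a sketch of the semantics, not of
! the register scheduling:

```python
def floatsisf(n):
    """int32 -> IEEE-754 single bit pattern, round to nearest even."""
    if n == 0:
        return 0
    sign = 0x80000000 if n < 0 else 0
    mag = -n if n < 0 else n
    nbits = mag.bit_length()
    exp = 127 + nbits - 1
    if nbits <= 24:
        frac = mag << (24 - nbits)          # exact, no rounding needed
    else:
        shift = nbits - 24
        frac = mag >> shift
        rem = mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and frac & 1):
            frac += 1                       # round up / round to even
            if frac >> 24:                  # rounding overflowed significand
                frac >>= 1
                exp += 1
    return sign | (exp << 23) | (frac & 0x7fffff)
```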
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
@@ -0,0 +1,481 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+ .global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+ mov.l r8,@-r15
+ sub r3,DBL0H ! isolate high fraction
+ mov.l @(4,r15),r8 ! original DBL0H (with sign & exp)
+ sub r3,r1 ! 0x7ff00000
+ mov.l LOCAL(x60000000),r3
+ shll16 r2 ! 0xffff0000
+ ! no stall here for sh4-200
+ !
+ tst r1,r8
+ mov.l r0,@-r15
+ bf LOCAL(inf_nan_a)
+ tst r1,r0 ! test for DBL1 inf, nan or small
+ bt LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+ tst DBL0H,DBL0H
+ bf LOCAL(normalize_arg53)
+ tst DBL0L,DBL0L
+ bt LOCAL(a_zero)
+ tst r2,DBL0L
+ mov DBL0L,DBL0H
+ bt LOCAL(normalize_arg16)
+ shlr16 DBL0H
+ mov.w LOCAL(m15),r2 ! 1-16
+ bra LOCAL(normalize_arg48)
+ shll16 DBL0L
+
+LOCAL(normalize_arg53):
+ tst r2,DBL0H
+ mov #1,r2
+ bt LOCAL(normalize_arg48)
+ mov DBL0H,r1
+ shlr16 r1
+ bra LOCAL(normalize_DBL0H)
+ mov #21-16,r3
+
+LOCAL(normalize_arg16):
+ mov.w LOCAL(m31),r2 ! 1-32
+ mov #0,DBL0L
+LOCAL(normalize_arg48):
+ mov DBL0H,r1
+ mov #21,r3
+LOCAL(normalize_DBL0H):
+ extu.b r1,r8
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r8,r1
+ !
+ bt 0f
+ shlr8 r1
+0:
+#ifdef __pic__
+ add r0,r1
+
+ mova LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+ mov.b @(r0,r1),r8
+ mov DBL0L,r1
+ mov.l @r15+,r0
+ bt 0f
+ add #-8,r3
+0: clrt
+ sub r8,r3
+ mov.w LOCAL(d20),r8
+ shld r3,DBL0H
+ shld r3,DBL0L
+ sub r3,r2
+ add #-32,r3
+ shld r3,r1
+ mov.l LOCAL(x00100000),r3
+ or r1,DBL0H
+ shld r8,r2
+ mov.l @r15+,r8
+ add r2,DBL1H
+ mov.l LOCAL(x001fffff),r2
+ dmulu.l DBL0L,DBL1L
+ bra LOCAL(arg_denorm_done)
+ or r3,r0 ! set implicit 1 bit
+
+LOCAL(a_zero):
+ mov.l @(4,r15),r8
+ add #8,r15
+LOCAL(zero):
+ mov #0,DBLRH
+ bra LOCAL(pop_ret)
+ mov #0,DBLRL
+
+! both inf / nan -> result is NaN if at least one is a NaN, else inf.
+! DBL0 inf/nan, DBL1 zero -> result is NaN
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+ mov r8,DBLRH
+ mov.l @(4,r15),r8
+ add #8,r15
+ tst r1,r0 ! arg1 inf/nan ?
+ mov DBL0L,DBLRL
+ bt LOCAL(both_inf_nan)
+ tst DBL1L,DBL1L
+ mov DBL1H,r1
+ bf LOCAL(pop_ret)
+ add r1,r1
+ tst r1,r1
+ !
+ bf LOCAL(pop_ret)
+LOCAL(nan):
+ mov #-1,DBLRL
+ bra LOCAL(pop_ret)
+ mov #-1,DBLRH
+
+LOCAL(both_inf_nan):
+ or DBL1L,DBLRL
+ bra LOCAL(pop_ret)
+ or DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+ tst r1,r0
+ mov.l @(4,r15),r8
+ or DBL0L,DBL0H
+ bf/s LOCAL(zero)
+ add #8,r15
+ tst DBL0H,DBL0H
+ bt LOCAL(nan)
+LOCAL(inf_nan_b):
+ mov DBL1L,DBLRL
+ mov DBL1H,DBLRH
+LOCAL(pop_ret):
+ mov.l @r15+,DBL0H
+ add DBLRH,DBLRH
+
+
+ div0s DBL0H,DBL1H
+
+ rts
+ rotcr DBLRH
+
+ .balign 4
+/* Argument a has already been tested for being zero or denorm.
+ On the other side, we have to swap a and b so that we can share the
+ normalization code.
+ a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+ b: sign/exponent: DBL1H fraction: r0:DBL1L */
+LOCAL(inf_nan_denorm_or_zero_b):
+ sub r3,r1 ! 0x7ff00000
+ mov.l @r15,r2 ! get original DBL0H
+ tst r1,DBL1H
+ sub r3,r0 ! isolate high fraction
+ bf LOCAL(inf_nan_b)
+ mov.l DBL1H,@r15
+ mov r0,DBL0H
+ mov.l r8,@-r15
+ mov r2,DBL1H
+ mov.l LOCAL(0xffff0000),r2
+ mov.l r1,@-r15
+ mov DBL1L,r1
+ mov DBL0L,DBL1L
+ bra LOCAL(normalize_arg)
+ mov r1,DBL0L
+
+LOCAL(d20):
+ .word 20
+LOCAL(m15):
+ .word -15
+LOCAL(m31):
+ .word -31
+LOCAL(xff):
+ .word 0xff
+
+ .balign 4
+LOCAL(0xffff0000): .long 0xffff0000
+
+ ! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+ .balign 4
+GLOBAL(muldf3):
+ mov.l LOCAL(xfff00000),r3
+ mov DBL1H,r0
+ dmulu.l DBL0L,DBL1L
+ mov.l LOCAL(x7fe00000),r1
+ sub r3,r0
+ mov.l DBL0H,@-r15
+ sub r3,DBL0H
+ tst r1,DBL0H
+ or r3,DBL0H
+ mov.l LOCAL(x001fffff),r2
+ bt LOCAL(inf_nan_denorm_or_zero_a)
+ tst r1,r0
+ or r3,r0 ! r0:DBL1L := b fraction ; u12.52
+ bt LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+ and r2,r0 ! r0:DBL1L := b fraction ; u12.52
+ sts macl,r3
+ sts mach,r1
+ dmulu.l DBL0L,r0
+ and r2,DBL0H ! DBL0H:DBL0L := a fraction ; u12.52
+ mov.l r8,@-r15
+ mov #0,DBL0L
+ mov.l r9,@-r15
+ sts macl,r2
+ sts mach,r8
+ dmulu.l DBL0H,DBL1L
+ addc r1,r2
+
+ addc DBL0L,r8 ! add T; clears T
+
+ sts macl,r1
+ sts mach,DBL1L
+ dmulu.l DBL0H,r0
+ addc r1,r2
+ mov.l LOCAL(x7ff00000),DBL0H
+ addc DBL1L,r8 ! clears T
+ mov.l @(8,r15),DBL1L ! a sign/exp w/fraction
+ sts macl,DBLRL
+ sts mach,DBLRH
+ and DBL0H,DBL1L ! a exponent
+ mov.w LOCAL(x200),r9
+ addc r8,DBLRL
+ mov.l LOCAL(x3ff00000),r8 ! bias
+ addc DBL0L,DBLRH ! add T
+ cmp/hi DBL0L,r3 ! 32 guard bits -> sticky: T := r3 != 0
+ movt r3
+ tst r9,DBLRH ! T := fraction < 2
+ or r3,r2 ! DBLRH:DBLRL:r2 := result fraction; u24.72
+ bt/s LOCAL(shll12)
+ sub r8,DBL1L
+ mov.l LOCAL(x002fffff),r8
+ and DBL1H,DBL0H ! b exponent
+ mov.l LOCAL(x00100000),r9
+ add DBL0H,DBL1L ! result exponent - 1
+ tst r8,r2
+ mov.w LOCAL(m20),r8
+ subc DBL0L,r9
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d11),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m21),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l @r15+,DBL0H
+ addc r3,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ add DBL1L,DBLRH ! implicit 1 adjusts exponent
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_11)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_11)
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+
+LOCAL(shll12):
+ mov.l LOCAL(x0017ffff),r8
+ extu.b DBLRH,DBLRH ! remove implicit 1.
+ mov.l LOCAL(x00080000),r9
+ and DBL1H,DBL0H ! b exponent
+ add DBL0H,DBL1L ! result exponent
+ tst r8,r2 ! rounding adjust for lower guard ...
+ mov.w LOCAL(m19),r8
+ subc DBL0L,r9 ! ... bits and round to even; clear T
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d12),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m20),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ addc r3,DBLRH
+ mov.l @r15+,DBL0H
+ add DBL1L,DBLRH
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_12)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+LOCAL(overflow):
+ mov r3,DBLRH
+ mov #0,DBLRL
+ bra LOCAL(insert_sign)
+ mov.l @r15+,r8
+
+LOCAL(denorm_exp0_11):
+ mov.l r8,@-r15
+ mov #-21,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+ mov DBL1H,DBL1L
+ and r3,DBL0L ! 0x7fe00000
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ mov #-20,DBL0L
+ bf LOCAL(overflow)
+ mov #-21,r8
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov.l r9,@-r15
+ shad DBL0L,DBL1L ! exponent ; s32
+ bra LOCAL(denorm)
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+ mov.l r8,@-r15
+ mov #-20,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+ .balign 4 ! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+ and r3,DBL0L ! 0x7fe00000
+ mov DBL1H,DBL1L
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ bf LOCAL(overflow)
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov #-20,r8
+ shad r8,DBL1L ! exponent ; s32
+ mov.l r9,@-r15
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+ not r3,r9 ! 0x001fffff
+ mov.l r10,@-r15
+ mov r2,r10
+ shld r8,r10 ! 11 or 12 lower bit valid
+ and r9,DBLRH ! Mask away vestiges of exponent.
+ add #32,r8
+ sub r3,DBLRH ! Make leading 1 explicit.
+ shld r8,r2 ! r10:r2 := unrounded result lowpart
+ shlr DBLRH ! compensate for doubling at end of normal code
+ sub DBLRL,r10 ! reconstruct effect of previous rounding
+ exts.b r10,r9
+ shad r3,r10 ! sign extension
+ mov #0,r3
+ clrt
+ addc r9,DBLRL ! Undo previous rounding.
+ mov.w LOCAL(m32),r9
+ addc r10,DBLRH
+ cmp/hi r3,r2
+ rotcl DBLRL ! fit in the rest of r2 as a sticky bit.
+ mov.l @r15+,r10
+ rotcl DBLRH
+ cmp/ge r9,DBL1L
+ bt LOCAL(small_norm_shift)
+ cmp/hi r3,DBLRL
+ add #32,DBL1L
+ movt DBLRL
+ cmp/gt r9,DBL1L
+ or DBLRH,DBLRL
+ bt/s LOCAL(small_norm_shift)
+ mov r3,DBLRH
+ mov r3,DBLRL ! exponent too negative to shift - return zero
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+ .balign 4
+LOCAL(small_norm_shift):
+ mov DBLRL,r2 ! stash away guard bits
+ shld DBL1L,DBLRL
+ mov DBLRH,DBL0L
+ shld DBL1L,DBLRH
+ mov.l LOCAL(x7fffffff),r9
+ add #32,DBL1L
+ shld DBL1L,r2
+ shld DBL1L,DBL0L
+ or DBL0L,DBLRL
+ shlr DBL0L
+ addc r2,r9
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ addc r3,DBLRL
+ addc r3,DBLRH
+ div0s DBL0H,DBL1H
+ add DBLRH,DBLRH
+ rts
+ rotcr DBLRH
+
+
+LOCAL(x200):
+ .word 0x200
+LOCAL(m19):
+ .word -19
+LOCAL(m20):
+ .word -20
+LOCAL(m21):
+ .word -21
+LOCAL(m32):
+ .word -32
+LOCAL(d11):
+ .word 11
+LOCAL(d12):
+ .word 12
+ .balign 4
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+ .long 0xfff00000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x7fe00000):
+ .long 0x7fe00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x3ff00000):
+ .long 0x3ff00000
+LOCAL(x002fffff):
+ .long 0x002fffff
+LOCAL(xffe00000):
+ .long 0xffe00000
+LOCAL(x0017ffff):
+ .long 0x0017ffff
+LOCAL(x00080000):
+ .long 0x00080000
+ENDFUNC(GLOBAL(muldf3))
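For readers following the four dmulu.l instructions in muldf3: they form the full product of the two 52/53-bit significands from 32x32->64 partial products, accumulated with addc. A C sketch of that scheme (names mine, widths illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the significand multiply: four 32x32->64 partial products
   (corresponding to the four dmulu.l instructions) combined into the
   full 128-bit product of two 64-bit operands.  */
static void mul_frac (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);
  uint64_t ll = (uint64_t) al * bl;	/* like dmulu.l DBL0L,DBL1L */
  uint64_t lh = (uint64_t) al * bh;	/* like dmulu.l DBL0L,r0    */
  uint64_t hl = (uint64_t) ah * bl;	/* like dmulu.l DBL0H,DBL1L */
  uint64_t hh = (uint64_t) ah * bh;	/* like dmulu.l DBL0H,r0    */
  /* Accumulate the middle column with carries, as the addc chain does.  */
  uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  *lo = (mid << 32) | (ll & 0xffffffffu);
  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```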
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
@@ -0,0 +1,98 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+ .global GLOBAL(floatsidf)
+ .balign 4
+GLOBAL(floatsidf):
+ tst r4,r4
+ mov r4,r1
+ bt LOCAL(ret0)
+ cmp/pz r4
+ bt 0f
+ neg r4,r1
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r1,r5
+ mov.w LOCAL(xff00),r3
+ cmp/eq r1,r5
+ mov #21,r2
+ bt 0f
+ mov r1,r5
+ shlr16 r5
+ add #-16,r2
+0: tst r3,r5 ! 0xff00
+ bt 0f
+ shlr8 r5
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r5
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r5),r5
+ cmp/pz r4
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ bt 0f
+ mov.l LOCAL(xc1200000),r3 ! sign + bias + 20 - implicit 1
+0: mov r1,r0 ! DBLRL & DBLRH
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov #0,DBLRL
+ rts
+ mov #0,DBLRH
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
@@ -0,0 +1,110 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NaNs: for a cleared sign bit, you
+ ! get INT_MAX; for a set sign bit, you get INT_MIN.
+ ! However, since the result for NaNs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixdfsi)
+ FUNC(GLOBAL(fixdfsi))
+ .balign 4
+GLOBAL(fixdfsi):
+ mov.w LOCAL(x413),r1
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(neg)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ bf 0f ! SH4-200 will start this insn in a new cycle
+ mov #-31,DBL0H ! results in 0 return
+0: add #1,r0
+ rts
+ shld DBL0H,r0
+
+ .balign 4
+LOCAL(neg):
+ cmp/pl DBL0H
+ and r3,r0
+ bf/s LOCAL(ignore_low_neg)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmin)
+ shld DBL0H,DBL0L
+ or DBL0L,r0 ! SH4-200 will start this insn in a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ignore_low_neg):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ shld DBL0H,r0
+ bf 0f
+ mov #0,r0 ! results in 0 return
+0: rts
+ neg r0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
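An illustrative C equivalent of what fixdfsi computes (again not part of the patch): extract the exponent, reattach the implicit 1, and shift the significand into integer position, truncating toward zero; out-of-range inputs saturate as in retmax / retmin:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: double (as raw bits) to signed 32-bit int,
   truncating toward zero; saturates for out-of-range, Inf and NaN.  */
static int32_t fixdfsi_bits (uint64_t d)
{
  int neg = (int) (d >> 63);
  int e = (int) ((d >> 52) & 0x7ff) - 1023;	/* unbiased exponent */
  if (e < 0)
    return 0;					/* |x| < 1 truncates to 0 */
  if (e > 30)					/* too large, Inf or NaN */
    return neg ? INT32_MIN : INT32_MAX;
  uint64_t m = (d & 0x000fffffffffffffull) | 0x0010000000000000ull;
  uint32_t v = (uint32_t) (m >> (52 - e));	/* drop fractional bits */
  return neg ? -(int32_t) v : (int32_t) v;
}
```

Note that saturating at e > 30 also yields the right value for -2^31 exactly, since the negative clamp is INT32_MIN.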
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S (revision 0)
@@ -0,0 +1,593 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5 and the divisor in regs
+!r6 and r7; the quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divdf3)
+ FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov r4,r2
+ mov.l .L_inf,r1
+
+ and r1,r2
+ mov.l r8,@-r15
+
+ cmp/eq r1,r2
+ mov r6,r8
+
+ bt .L_a_inv
+ and r1,r8
+
+ cmp/eq r1,r8
+ mov.l .L_high_mant,r3
+
+ bf .L_chk_zero
+ and r6,r3
+
+ mov.l .L_mask_sign,r8
+ cmp/pl r7
+
+ mov r8,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ and r4,r8
+ cmp/pl r3
+
+ and r6,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ xor r8,r0 !op1=normal no,op2=Inf, return Zero
+ mov #0,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_ret_b:
+ mov r7,r1
+ mov r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_a_inv:
+ !chk if op1 is Inf or NaN
+ mov.l .L_high_mant,r2
+ cmp/pl r5
+
+ and r4,r2
+ bt .L_ret_a
+
+ and r1,r8 !r1 contains infinity
+ cmp/pl r2
+
+ bt .L_ret_a
+ cmp/eq r1,r8
+
+ mov r1,DBLRH
+ add DBLRH,DBLRH
+ bf 0f
+ mov #-1,DBLRH ! Inf/Inf, return NaN.
+0: div0s r4,r6
+ mov.l @r15+,r8
+ rts
+ rotcr DBLRH
+
+.L_ret_a:
+ !return op1
+ mov r5,r1
+ mov r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_zero:
+ !chk if op1=0
+ mov.l .L_mask_sign,r0
+ mov r4,r3
+
+ and r0,r3
+ shll r4
+
+ and r6,r0
+ shlr r4
+
+ xor r3,r0
+ shll r6
+
+ shlr r6
+ tst r4,r4
+
+
+ bf .L_op1_not_zero
+ tst r5,r5
+
+ bf .L_op1_not_zero
+ tst r7,r7
+
+ mov.l @r15+,r8
+ bf .L_ret_zero
+
+ tst r6,r6
+ bf .L_ret_zero
+
+ rts
+ mov #-1,DBLRH !op1=op2=0, return NaN
+
+.L_ret_zero:
+ !return zero
+ mov r0,r1
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #0,r0
+#else
+ mov #0,r1 !op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r3
+
+ rotcl r6
+ tst r3,r6
+
+ add #-1,r8
+ bt .L_norm_b
+
+ bra .L_divide
+ add #1,r8
+
+.L_op1_not_zero:
+ !op1!=0, chk if op2=0
+ tst r7,r7
+ mov r1,r3
+
+ mov #0,r1
+ bf .L_normal_nos
+
+ tst r6,r6
+ bf .L_normal_nos
+
+ mov.l @r15+,r8
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ tst r2,r2
+ mov #-20,r1
+
+! The subsequent branch uses the T bit set by the compare above.
+! The intervening shift does not alter T, and the SHLR20 macro
+! is written so that it does not clobber T either.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2
+#else
+ SHLR20 (r2)
+#endif
+ bt .L_norm_a !normalize dividend
+
+.L_chk_b:
+ mov.l r9,@-r15
+ tst r8,r8
+
+ mov.l .L_high_mant,r9
+
+! The subsequent branch uses the T bit set by the compare above.
+! The intervening shift does not alter T, and the SHLR20 macro
+! is written so that it does not clobber T either.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r8
+#else
+ SHLR20 (r8)
+#endif
+ ! T set -> normalize divisor
+ SL(bt, .L_norm_b,
+ and r9,r4)
+
+.L_divide:
+ mov.l .L_2047,r1
+ sub r8,r2
+
+ mov.l .L_1023,r8
+ and r9,r6
+
+ !resultant exponent
+ add r8,r2
+ !chk the exponent for overflow
+ cmp/ge r1,r2
+
+ mov.l .L_imp_bit,r1
+ bt .L_overflow
+
+ mov #0,r8
+ or r1,r4
+
+ or r1,r6
+ mov #-24,r3
+
+ !chk if the divisor is 1 (mantissa only)
+ cmp/eq r8,r7
+ bf .L_div2
+
+ cmp/eq r6,r1
+ bt .L_den_one
+
+.L_div2:
+ !divide the mantissas
+ shll8 r4
+ mov r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ shll8 r6
+
+ or r9,r4
+ shll8 r5
+
+ mov r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ mov r8,r3
+ shll8 r7
+
+ or r9,r6
+ cmp/gt r4,r6
+
+ mov r3,r9
+ bt .L_shift
+
+ cmp/eq r4,r6
+ bf .L_loop
+
+ cmp/gt r5,r7
+ bf .L_loop
+
+.L_shift:
+ add #-1,r2
+ shll r5
+ rotcl r4
+
+.L_loop:
+ !actual division loop
+ cmp/gt r6,r4
+ bt .L_subtract
+
+ cmp/eq r6,r4
+ bf .L_skip
+
+ cmp/ge r7,r5
+ bf .L_skip
+
+.L_subtract:
+ clrt
+ subc r7,r5
+
+ or r1,r8
+ subc r6,r4
+
+.L_skip:
+ shlr r1
+ shll r5
+
+ rotcl r4
+ cmp/eq r1,r3
+
+ bf .L_loop
+ mov.l .L_imp_bit,r1
+
+ !chk if the division was for the higher word of the quotient
+ tst r1,r9
+ bf .L_chk_exp
+
+ mov r8,r9
+ mov.l .L_mask_sign,r1
+
+ !divide for the lower word of the quotient
+ bra .L_loop
+ mov r3,r8
+
+.L_chk_exp:
+ !chk if the result needs to be denormalized
+ cmp/gt r2,r3
+ bf .L_round
+ mov #-53,r7
+
+.L_underflow:
+ !denormalize the result
+ add #1,r2
+ cmp/gt r2,r7
+
+ or r4,r5 !remainder
+ add #-2,r2
+
+ mov #32,r4
+ bt .L_return_zero
+
+ add r2,r4
+ cmp/ge r3,r4
+
+ mov r2,r7
+ mov r3,r1
+
+ mov #-54,r2
+ bt .L_denorm
+ mov #-32,r7
+
+.L_denorm:
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r7
+
+ shlr r9
+ rotcr r8
+
+ cmp/eq r3,r7
+ bf .L_denorm
+
+ mov r4,r7
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r3,r6
+
+ cmp/gt r7,r3
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r6
+
+ mov r3,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r3,r9
+ or r9,r0
+
+ cmp/eq r5,r3
+ bf .L_return
+
+ cmp/eq r3,r6
+ mov.l .L_mask_sign,r7
+
+ bf .L_return
+ cmp/eq r7,r1
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r3
+
+ rotcl r4
+ tst r3,r4
+
+ add #-1,r2
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r2
+
+.L_overflow:
+ !overflow, return inf
+ mov.l .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+ or r2,r1
+ mov #0,r0
+#else
+ or r2,r0
+ mov #0,r1
+#endif
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+.L_den_one:
+ !denominator=1, result=numerator
+ mov r4,r9
+ mov #-53,r7
+
+ cmp/ge r2,r8
+ mov r8,r4
+
+ mov r5,r8
+ mov r4,r3
+
+ !chk the exponent for underflow
+ SL(bt, .L_underflow,
+ mov r4,r5)
+
+ mov.l .L_high_mant,r7
+ bra .L_pack
+ mov #20,r6
+
+.L_return_zero:
+ !return zero
+ mov r3,r1
+ mov.l @r15+,r9
+
+ rts
+ mov.l @r15+,r8
+
+.L_round:
+ !apply rounding
+ cmp/eq r4,r6
+ bt .L_lower
+
+ clrt
+ subc r6,r4
+
+ bra .L_rounding
+ mov r4,r6
+
+.L_lower:
+ clrt
+ subc r7,r5
+ mov r5,r6
+
+.L_rounding:
+ !apply rounding
+ mov.l .L_invert,r1
+ mov r3,r4
+
+ movt r3
+ clrt
+
+ not r3,r3
+ and r1,r3
+
+ addc r3,r8
+ mov.l .L_high_mant,r7
+
+ addc r4,r9
+ cmp/eq r4,r6
+
+ mov.l .L_comp_1,r3
+ SL (bf, .L_pack,
+ mov #20,r6)
+ and r3,r8
+
+.L_pack:
+ !pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ and r7,r9
+
+ or r2,r0
+ mov r8,r1
+
+ or r9,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_sign:
+ .long 0x80000000
+.L_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_1023:
+ .long 1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_comp_1:
+ .long 0xfffffffe
+.L_invert:
+ .long 0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
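The heart of divdf3 is the restoring shift-subtract loop at .L_loop: one quotient bit per iteration, subtracting the divisor whenever the running remainder is large enough. A C sketch of the same scheme on single 64-bit values (the assembly works on 64-bit register pairs; widths here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a restoring shift-subtract division loop.  a and b are
   assumed to be 53-bit significands with the implicit 1 at bit 52,
   so the remainder never overflows 64 bits.  */
static uint64_t div_frac (uint64_t a, uint64_t b)
{
  uint64_t rem = a, q = 0;
  for (int i = 0; i < 55; i++)
    {
      q <<= 1;
      if (rem >= b)
	{
	  rem -= b;		/* subtract divisor, record a 1 bit */
	  q |= 1;
	}
      rem <<= 1;		/* bring down the next (zero) bit */
    }
  return q;			/* floor (a * 2^54 / b) */
}
```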
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
@@ -0,0 +1,132 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsisf)
+ FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+ tst r4,r4
+ mov #23,r6
+
+ mov.l .L_set_24_bits,r7
+ SL(bt, .L_return,
+ not r7,r3)
+
+ ! Decide the direction for shifting
+ mov.l .L_set_24_bit,r5
+ cmp/hi r7,r4
+
+ not r5,r2
+ SL(bt, .L_shift_right,
+ mov #0,r7)
+
+ tst r5,r4
+
+ mov #0,r0
+ bf .L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+ shll r4
+ tst r5,r4
+
+ add #-1,r6
+ bt .L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+ mov #23,r3
+ add #127,r6
+
+ ! Align the exponent
+ and r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+ shlr r4
+ rotcr r7
+
+ tst r4,r3
+ add #1,r6
+
+ bf .L_shift_right
+
+ tst r7,r7
+ bt .L_sh_rt_1
+
+ shll r7
+ movt r1
+
+ add r1,r4
+
+ tst r7,r7
+ bf .L_sh_rt_1
+
+ ! Halfway between two numbers.
+ ! Round to even: force the LSB to 0
+ shlr r4
+ shll r4
+
+.L_sh_rt_1:
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shift_right
+ bt .L_pack_sf
+
+.L_return:
+ rts
+ mov r4,r0
+
+ .align 2
+.L_set_24_bit:
+ .long 0x00800000
+
+.L_set_24_bits:
+ .long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
@@ -0,0 +1,176 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunsdfsi)
+ FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ tst r6,r6
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_high_mant,r1
+
+ SL(bt, .L_epil,
+ and r4,r1) ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1054,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1054,return maxint
+ sub r2,r7 !r7 contains the number of shifts
+
+ mov.l .L_21bit,r2
+ bt .L_ret_max
+
+ or r2,r1
+ mov r7,r3
+
+ shll8 r1
+ neg r7,r7
+
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ SL(bt, .L_lower_mant,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ rts
+ mov r1,r0
+
+.L_lower_mant:
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ mov r1,r0
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+
+ rts
+ nop
+
+ .align 2
+
+.L_maxint:
+ .long 0xffffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1054:
+ .long 1054
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
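And the unsigned counterpart, as an illustrative C model (not patch code): negative inputs and values below 1 give 0, values of 2^32 or more give maxint as in .L_ret_max. NaN handling is simplified here, since results for NaN are undefined anyway:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: double (as raw bits) to unsigned 32-bit int,
   truncating toward zero.  NaN handling is deliberately simplified.  */
static uint32_t fixunsdfsi_bits (uint64_t d)
{
  if (d >> 63)
    return 0;					/* sign bit set -> 0 */
  int e = (int) ((d >> 52) & 0x7ff) - 1023;	/* unbiased exponent */
  if (e < 0)
    return 0;					/* value < 1 */
  if (e > 31)
    return 0xffffffffu;				/* >= 2^32 or non-finite */
  uint64_t m = (d & 0x000fffffffffffffull) | 0x0010000000000000ull;
  return (uint32_t) (m >> (52 - e));		/* truncate */
}
```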
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S (revision 0)
@@ -0,0 +1,786 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double precision numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subdf3)
+ FUNC (GLOBAL (subdf3))
+ .global GLOBAL (adddf3)
+ FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+ mov.l .L_sign,r2
+ bra .L_adddf3_1
+ xor r2,r6
+
+GLOBAL (adddf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+
+.L_adddf3_1:
+ mov.l r8,@-r15
+ mov r4,r1
+
+ mov.l .L_inf,r2
+ mov r6,r3
+
+ mov.l r9,@-r15
+ and r2,r1 !Exponent of op1 in r1
+
+ mov.l r10,@-r15
+ and r2,r3 !Exponent of op2 in r3
+
+ ! Check for Nan or Infinity
+ mov.l .L_sign,r9
+ cmp/eq r2,r1
+
+ mov r9,r10
+ bt .L_thread_inv_exp_op1
+
+ mov r9,r0
+ cmp/eq r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+ and r4,r9 !r9 has sign bit for op1
+ bt .L_ret_op2
+
+ ! Check for -ve zero
+ cmp/eq r4,r0
+ and r6,r10 !r10 has sign bit for op2
+
+ bt .L_op1_nzero
+
+ cmp/eq r6,r0
+ bt .L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+ tst r4,r4
+ bt .L_op1_zero
+
+ ! op1 is not zero, check op2 for zero
+ tst r6,r6
+ bt .L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 have the signs
+.L_add:
+ mov.l .L_high_mant,r8
+ mov #-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r1 ! r1 now has exponent for op1 in its lower bits
+#else
+ SHLR20 (r1)
+#endif
+ and r8,r6 ! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3 ! r3 has exponent for op2 in its lower bits
+#else
+ SHLR20 (r3)
+#endif
+ and r8,r4 ! Higher bits of mantissa of op1
+
+ mov.l .L_21bit,r8
+
+ tst r1,r1
+ bt .L_norm_op1
+
+ ! Set the 21st bit.
+ or r8,r4
+ tst r3,r3
+
+ bt .L_norm_op2
+ or r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+ tst r9,r9
+ bf .L_neg_op1
+
+ tst r10,r10
+ bf .L_neg_op2
+
+.L_add_1:
+ cmp/ge r1,r3
+
+ mov r1,r0
+ bt .L_op2_exp_greater
+
+ sub r3,r0
+ ! If exponent difference is greater than 54, the resultant exponent
+ ! won't be changed. Return op1 straight away.
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op1
+
+ mov r1,r3
+ clrt
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ ! Shift left the first operand and apply rest of shifts to second operand.
+ mov #0,r2
+ shll r5
+
+ rotcl r4
+
+ add #-1,r3
+ dt r0
+
+ bt .L_add_mant
+ dt r0
+
+ bt LOCAL(got_guard)
+ dt r0
+
+ bt LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+ shar r6
+ or r7,r2 ! sticky bit
+
+ rotcr r7
+ dt r0
+
+ bf .L_shfrac_op2
+
+ shlr r2
+
+ subc r2,r2 ! spread sticky bit across r2
+LOCAL(got_sticky):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+LOCAL(got_guard):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+
+
+! Add the positive mantissas and check the MSB of the result for
+! overflow. In case of overflow, negate the result.
+.L_add_mant:
+ clrt
+ addc r7,r5
+
+ mov #0,r10 ! Assume resultant to be positive
+ addc r6,r4
+
+ cmp/pz r4
+
+ bt .L_mant_ptv
+ negc r2,r2
+
+ negc r5,r5
+
+ mov.l .L_sign,r10 ! The assumption was wrong, result is negative
+ negc r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+ mov.l .L_23bit,r0
+
+ tst r4,r0
+ bt .L_mant_ptv_0
+
+ shlr r4
+ rotcr r5
+
+ add #1,r3
+ bra .L_mant_ptv_1
+ rotcr r2
+
+.L_mant_ptv_0:
+ mov.l .L_22bit,r0
+ tst r4,r0
+
+ bt .L_norm_mant
+
+.L_mant_ptv_1:
+	! The 22nd bit of the resultant mantissa is set. Shift the mantissa
+	! right and add 1 to the exponent.
+ add #1,r3
+ shlr r4
+ rotcr r5
+ ! The mantissa is already normalized. We don't need to
+ ! spend any effort. Branch to epilogue.
+ bra .L_epil
+ rotcr r2
+
+! Normalize operands
+.L_norm_op1:
+ shll r5
+
+ rotcl r4
+ add #-1,r1
+
+ tst r4,r8
+ bt .L_norm_op1
+
+ tst r3,r3
+ SL(bf, .L_neg_mant,
+ add #1,r1)
+
+.L_norm_op2:
+ shll r7
+
+ rotcl r6
+ add #-1,r3
+
+ tst r6,r8
+ bt .L_norm_op2
+
+ bra .L_neg_mant
+ add #1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+ clrt
+ negc r5,r5
+
+ negc r4,r4
+ tst r10,r10
+
+ bt .L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+ clrt
+ negc r7,r7
+
+ bra .L_add_1
+ negc r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+ bra .L_inv_exp_op1
+ nop
+
+.L_ret_op2:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_op1_nzero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check op2 for negative zero
+ cmp/eq r6,r0
+ bf .L_non_zero ! both op1 and op2 are not -0
+
+.L_op2_nzero:
+ tst r7,r7
+ bf .L_non_zero
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is -0, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check high bit of op2
+ tst r6,r6
+ bf .L_add ! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+ tst r7,r7
+ bf .L_add
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is zero, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! exp (op1) is smaller than or equal to exp (op2).
+! The same logic as in .L_add applies here; see there for comments.
+.L_op2_exp_greater:
+ mov r3,r0
+ sub r1,r0
+
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op2
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ mov #0,r2
+ shll r7
+ rotcl r6
+ add #-1,r0
+ add #-1,r3
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+.L_shfrac_op1:
+ add #-1,r0
+ shar r4
+
+ rotcr r5
+ rotcr r2
+
+ cmp/eq #0,r0
+ bf .L_shfrac_op1
+
+ bra .L_add_mant
+ nop
+
+! Return the value in op1
+.L_ret_op1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+ mov.l .L_high_mant,r7
+ mov r4,r0
+
+ tst r9,r9
+ bt .L_pack_op1_1
+
+ clrt
+ negc r5,r5
+ negc r0,r0
+
+.L_pack_op1_1:
+ and r7,r0
+ mov r1,r3
+
+ mov #20,r2
+ mov r5,r1
+
+ mov.l @r15+,r10
+ or r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+!r3 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+ mov.l .L_high_mant,r9
+ mov r6,r0
+
+ tst r10,r10
+ bt .L_pack_op2_1
+
+ clrt
+ negc r7,r7
+ negc r0,r0
+
+.L_pack_op2_1:
+ and r9,r0
+ mov r7,r1
+
+ mov #20,r2
+ or r10,r0
+
+ mov.l @r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! Normalize the mantissa by setting the 21st bit of its high part
+.L_norm_mant:
+ mov.l .L_21bit,r0
+
+ tst r4,r0
+ bf .L_epil
+
+ tst r4,r4
+ bf .L_shift_till_1
+
+ tst r5,r5
+ bf .L_shift_till_1
+
+ ! Mantissa is zero, return 0
+ mov.l @r15+,r10
+ mov #0,r0
+
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+
+ rts
+ mov #0,r1
+
+! A loop to set the 21st bit in the high part of the resultant mantissa.
+! It is already ensured that a 1 bit is present in the mantissa.
+.L_shift_till_1:
+ clrt
+ shll r5
+
+ rotcl r4
+ add #-1,r3
+
+ tst r4,r0
+ bt .L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+ cmp/pl r3
+
+ bf .L_denorm
+ mov.l LOCAL(x7fffffff),r0
+
+ mov r5,r1
+ shlr r1
+
+ mov #0,r1
+ addc r0,r2
+
+! Check extra MSB here
+ mov.l .L_22bit,r9
+ addc r1,r5 ! round to even
+
+ addc r1,r4
+ tst r9,r4
+
+ bf .L_epil_1
+
+.L_epil_0:
+ mov.l .L_21bit,r1
+
+ not r1,r1
+ and r1,r4
+
+ mov r4,r0
+ or r10,r0
+
+ mov.l @r15+,r10
+ mov #20,r2
+
+ mov.l @r15+,r9
+ mov r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_epil_1:
+ shlr r4
+ add #1,r3
+ bra .L_epil_0
+ rotcr r5
+
+.L_denorm:
+ add #-1,r3
+.L_denorm_1:
+ tst r3,r3
+ bt .L_denorm_2
+
+ shlr r4
+ rotcr r5
+
+ movt r1
+ bra .L_denorm_1
+ add #1,r3
+
+.L_denorm_2:
+ clrt
+ mov #0,r2
+ addc r1,r5
+
+ addc r2,r4
+ mov r4,r0
+
+ or r10,r0
+ mov.l @r15+,r10
+
+ mov r5,r1
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+ mov.l .L_sign,r0
+ tst r6,r0
+
+ bt .L_ret_op2_1
+
+ ! op2 is negative infinity. Inf - Inf is being performed
+ mov.l .L_inf,r0
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ rts
+ mov #-1,DBLRH ! return NaN.
+
+.L_ret_op1_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_ret_op2_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ mov.l @r15+,r9
+ div0s r4,r6 ! different signs -> NaN
+ mov r4,DBLRH
+ or r6,DBLRH
+ mov.l @r15+,r8
+ SL(bf, 0f,
+ mov r5,DBLRL)
+ mov #-1,DBLRH ! return NaN.
+0: rts
+ or r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+ ! Check if a is Nan
+ cmp/pl r5
+ bt .L_ret_op1_1
+
+ mov.l .L_high_mant,r0
+ and r4,r0
+
+ cmp/pl r0
+ bt .L_ret_op1_1
+
+ ! op1 is not Nan. It is infinity. Check the sign of it.
+ ! If op2 is Nan, return op2
+ cmp/pz r4
+
+ bf .L_op1_ninf
+
+ ! op2 is +ve infinity here
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ ! r2 is free now
+ mov.l .L_high_mant,r0
+ tst r6,r0 ! op2 also has invalid exponent
+
+ bf .L_ret_op2_1 ! branch if op2 is NaN
+
+ tst r7,r7
+ bt .L_op1_pinf_op2_inf ! op2 is Infinity, and op1 is +Infinity
+ !op2 is not infinity, It is Nan
+ bf .L_ret_op2_1
+
+ .align 2
+.L_high_mant:
+ .long 0x000FFFFF
+
+.L_21bits:
+ .long 0x001FFFFF
+
+.L_22bit:
+ .long 0x00200000
+
+.L_23bit:
+ .long 0x00400000
+
+.L_21bit:
+ .long 0x00100000
+
+.L_sign:
+ .long 0x80000000
+
+.L_inf:
+ .long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
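The guard/sticky handling of the .L_shfrac_op2 alignment loop above can be modelled in C. A minimal sketch with our own names, assuming the shift count is between 1 and 63:

```c
#include <assert.h>
#include <stdint.h>

/* Align a mantissa right by n bits, as .L_shfrac_op2 does one bit at a
   time: the last bit shifted out becomes the guard bit, and the OR of
   everything below it becomes the sticky bit, so round-to-nearest-even
   can still be decided after the shift.  */
static uint64_t align_mant (uint64_t m, int n, unsigned *guard, unsigned *sticky)
{
  *guard  = (unsigned)((m >> (n - 1)) & 1);
  *sticky = (m & ((1ULL << (n - 1)) - 1)) != 0;
  return m >> n;
}
```

Rounding then increments the aligned result when the guard bit is set and either the sticky bit is set or the low bit of the shifted mantissa is 1 (ties to even).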
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsisf)
+ FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+ mov.l .L_sign,r2
+ mov #23,r6
+
+ ! Check for zero
+ tst r4,r4
+ mov.l .L_24_bits,r7
+
+ ! Extract sign
+ and r4,r2
+ bt .L_ret
+
+ ! Negative ???
+ mov.l .L_imp_bit,r5
+ cmp/pl r4
+
+ not r7,r3
+ bf .L_neg
+
+ ! Decide the direction for shifting
+ cmp/gt r7,r4
+ mov r4,r0
+
+ and r5,r0
+ bt .L_shr_0
+
+ ! Number may already be in normalized form
+ cmp/eq #0,r0
+ bf .L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+ shll r4
+ mov r4,r0
+
+ and r5,r0
+ cmp/eq #0,r0
+
+ SL(bt, .L_shl,
+ add #-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+ mov #23,r3
+ not r5,r5
+
+ mov r2,r0
+ add #127,r6
+
+ and r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Negate the number
+.L_neg:
+ ! Take care for -2147483648.
+ mov r4,r0
+ shll r0
+
+ cmp/eq #0,r0
+ SL(bt, .L_ret_min,
+ neg r4,r4)
+
+ cmp/gt r7,r4
+ bt .L_shr_0
+
+ mov r4,r0
+ and r5,r0
+
+ cmp/eq #0,r0
+ bf .L_pack
+ bt .L_shl
+
+.L_shr_0:
+ mov #0,r1
+
+! Shift right the number with rounding
+.L_shr:
+ shlr r4
+ movt r7
+
+ tst r7,r7
+
+ ! Count number of ON bits shifted
+ bt .L_shr_1
+ add #1,r1
+
+.L_shr_1:
+ mov r4,r0
+ add #1,r6
+
+ and r3,r0
+ cmp/eq #0,r0
+
+ ! Add MSB of shifted bits
+ bf .L_shr
+ add r7,r4
+
+ tst r7,r7
+ bt .L_pack
+
+.L_pack1:
+ mov #1,r0
+ cmp/eq r1,r0
+
+ bt .L_rnd
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shr
+ bt .L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+ shlr r4
+ bra .L_pack
+ shll r4
+
+.L_ret:
+ rts
+ mov r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+ mov.l .L_min_val,r0
+ rts
+ nop
+
+ .align 2
+.L_sign:
+ .long 0x80000000
+
+.L_imp_bit:
+ .long 0x00800000
+
+.L_24_bits:
+ .long 0x00FFFFFF
+
+.L_nsign:
+ .long 0x7FFFFFFF
+
+.L_min_val:
+ .long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
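The algorithm implemented by floatsisf above can be sketched in C. An illustrative model with our own names, building the single-precision bit pattern with round-to-nearest, ties to even (the .L_shr/.L_rnd path):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of floatsisf: convert a 32-bit signed integer to the IEEE-754
   single-precision bit pattern, rounding to nearest, ties to even.  */
static uint32_t float_from_si (int32_t x)
{
  if (x == 0)
    return 0;
  uint32_t sign = x < 0 ? 0x80000000u : 0;
  uint32_t mag = x < 0 ? 0u - (uint32_t) x : (uint32_t) x; /* safe for INT_MIN */

  int h = 31;                       /* position of the leading 1 bit */
  while (!(mag & (1u << h)))
    h--;
  uint32_t exp = (uint32_t)(h + 127);
  uint32_t mant;
  if (h <= 23)
    mant = mag << (23 - h);         /* exact: shift left, no rounding */
  else
    {
      int sh = h - 23;              /* 1..8 bits fall off the bottom */
      uint32_t rest = mag & ((1u << sh) - 1);
      uint32_t half = 1u << (sh - 1);
      mant = mag >> sh;
      if (rest > half || (rest == half && (mant & 1)))
        mant++;                     /* round to nearest, ties to even */
      if (mant & (1u << 24))        /* rounding carried past bit 23 */
        {
          mant >>= 1;
          exp++;
        }
    }
  return sign | (exp << 23) | (mant & 0x007fffffu);
}
```

Note that for -2147483648 this yields 0xCF000000, the same pattern the assembly returns through .L_min_val.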
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S (revision 0)
@@ -0,0 +1,596 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 is passed in regs
+!r6 and r7, result is returned in regs r0 and r1. operand 1 is referred to as op1
+!and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+ .text
+ .align 5
+ .global GLOBAL (muldf3)
+ FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov.l .L_mask_sign,r0
+ mov r4,r2
+
+ and r0,r2
+ mov #0,r1
+
+ shll r4
+ and r6,r0
+
+ xor r2,r0 !r0 contains the result's sign bit
+ shlr r4
+
+ mov.l .L_inf,r2
+ shll r6
+
+ mov r4,r3
+ shlr r6
+
+.L_chk_a_inv:
+ !chk if op1 is Inf/NaN
+ and r2,r3
+ mov.l r8,@-r15
+
+ cmp/eq r3,r2
+ mov.l .L_mask_high_mant,r8
+
+ mov r2,r3
+ bf .L_chk_b_inv
+
+ mov r8,r3
+ and r4,r8
+
+ cmp/hi r1,r8
+ bt .L_return_a !op1 NaN, return op1
+
+ cmp/hi r1,r5
+ mov r2,r8
+
+ bt .L_return_a !op1 NaN, return op1
+ and r6,r8
+
+ cmp/eq r8,r2
+ and r6,r3
+
+ bt .L_b_inv
+ cmp/eq r1,r6
+
+	bf .L_return_a !op1=Inf, op2=normal number, return op1
+	cmp/eq r1,r7
+
+	bf .L_return_a !op1=Inf, op2=normal number, return op1
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=Inf, op2=0,return nan
+
+.L_b_inv:
+ !op2 is NaN/Inf
+ cmp/hi r1,r7
+ mov r1,r2
+
+ mov r5,r1
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r2,r6
+ or r4,r0
+
+ bt .L_return_b !op2=NaN,return op2
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts !op1=Inf,op2=Inf,return Inf with sign
+ nop
+
+.L_chk_b_inv:
+ !Chk if op2 is NaN/Inf
+ and r6,r2
+ cmp/eq r3,r2
+
+ bf .L_chk_a_for_zero
+ and r6,r8
+
+ cmp/hi r1,r8
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r1,r7
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/eq r5,r1
+ bf .L_return_b !op1=normal number,op2=Inf,return Inf
+
+ mov r7,r1
+ cmp/eq r4,r1
+
+	bf .L_return_b !op1=normal number, op2=Inf, return Inf
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=0,op2=Inf,return NaN
+
+.L_return_a:
+ mov r5,r1
+ or r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_return_b:
+ mov r7,r1
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_chk_a_for_zero:
+ !Chk if op1 is zero
+ cmp/eq r1,r4
+ bf .L_chk_b_for_zero
+
+ cmp/eq r1,r5
+ bf .L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_b_for_zero:
+ !op1=0,chk if op2 is zero
+ cmp/eq r1,r6
+ mov r1,r3
+
+ mov.l .L_inf,r1
+ bf .L_normal_nos
+
+ cmp/eq r3,r7
+ bf .L_normal_nos
+
+ mov r3,r1
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ mov.l r9,@-r15
+ mov r4,r3
+
+ mov #-20,r9
+ and r1,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r2
+#else
+ SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r3
+#else
+ SHLR20 (r3)
+#endif
+ cmp/pl r3
+
+ bf .L_norm_a !normalize op1
+.L_chk_b:
+ cmp/pl r2
+ bf .L_norm_b !normalize op2
+
+.L_mul1:
+ add r3,r2
+ mov.l .L_1023,r1
+
+ !resultant exponent in r2
+ add r1,r2
+ mov.l .L_2047,r1
+
+ !Chk the exponent for overflow
+ cmp/ge r1,r2
+ and r8,r4
+
+ bt .L_return_inf
+ mov.l .L_imp_bit,r1
+
+ or r1,r4
+ and r8,r6
+
+ or r1,r6
+ clrt
+
+ !multiplying the mantissas
+ DMULU_SAVE
+ DMULUL (r7,r5,r1) !bits 0-31 of product
+
+ DMULUH (r3)
+
+ DMULUL (r4,r7,r8)
+
+ addc r3,r8
+
+ DMULUH (r3)
+
+ movt r9
+ clrt
+
+ DMULUL (r5,r6,r7)
+
+ addc r7,r8 !bits 63-32 of product
+
+ movt r7
+ add r7,r9
+
+ DMULUH (r7)
+
+ add r7,r3
+
+ add r9,r3
+ clrt
+
+ DMULUL (r4,r6,r7)
+
+ addc r7,r3 !bits 64-95 of product
+
+ DMULUH (r7)
+ DMULU_RESTORE
+
+ mov #0,r5
+ addc r5,r7 !bits 96-105 of product
+
+ cmp/eq r5,r1
+ mov #1,r4
+
+ bt .L_skip
+ or r4,r8
+.L_skip:
+ mov.l .L_106_bit,r4
+ mov r8,r9
+
+.L_chk_extra_msb:
+	!chk if extra MSB is generated
+ and r7,r4
+ cmp/eq r5,r4
+
+ mov #12,r4
+ SL(bf, .L_shift_rt_by_1,
+ mov #31,r5)
+
+.L_pack_mantissa:
+	!scale the mantissa to 53 bits
+ mov #-19,r6
+ mov.l .L_mask_high_mant,r5
+
+ SHLRN (19, r6, r8)
+
+ and r3,r5
+
+ shlr r8
+ movt r1
+
+ SHLLN (12, r4, r5)
+
+ add #-1,r6
+
+ or r5,r8 !lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r3
+#else
+ SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r4,r7
+#else
+ SHLL12 (r7)
+#endif
+ clrt
+
+ or r7,r3 !higher bits of resulting mantissa
+ mov #0,r7
+
+ !chk the exponent for underflow
+ cmp/ge r2,r7
+ bt .L_underflow
+
+ addc r1,r8 !rounding
+ mov r8,r1
+
+ addc r7,r3 !rounding
+ mov.l .L_mask_22_bit,r5
+
+ and r3,r5
+ !chk if extra msb is generated after rounding
+ cmp/eq r7,r5
+
+ mov.l .L_mask_high_mant,r8
+ bt .L_pack_result
+
+ add #1,r2
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+
+ bt .L_return_inf
+ shlr r3
+
+ rotcr r1
+
+.L_pack_result:
+ !pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+ !r0=sign bit
+ mov #20,r6
+ and r8,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ or r3,r0
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r1
+
+ rotcl r4
+ add #-1,r3
+
+ tst r1,r4
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r3
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r1
+
+ rotcl r6
+ add #-1,r2
+
+ tst r1,r6
+ bt .L_norm_b
+
+ bra .L_mul1
+ add #1,r2
+
+.L_shift_rt_by_1:
+ !adjust the extra msb
+
+ add #1,r2 !add 1 to exponent
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+ mov #20,r6
+
+ bt .L_return_inf
+ shlr r7 !r7 contains bit 96-105 of product
+
+ rotcr r3 !r3 contains bit 64-95 of product
+
+ rotcr r8 !r8 contains bit 32-63 of product
+	bra .L_pack_mantissa
+	rotcr r1 !r1 contains bit 31-0 of product
+
+.L_return_inf:
+ !return Inf
+ mov.l .L_inf,r2
+ mov #0,r1
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_underflow:
+ !check if the result needs to be denormalized
+ mov #-53,r1
+ add #1,r2
+
+ cmp/gt r2,r1
+ mov #32,r4
+
+ add #-2,r2
+ bt .L_return_zero
+
+ add r2,r4
+ mov r7,r1
+
+ cmp/ge r7,r4
+ mov r2,r6
+
+ mov #-54,r2
+ bt .L_denorm
+
+ mov #-32,r6
+
+.L_denorm:
+ !denormalize the result
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r6
+
+ shlr r3
+ rotcr r8
+
+ cmp/eq r7,r6
+ bf .L_denorm
+
+ mov r4,r6
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r7,r5
+
+ cmp/gt r6,r7
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r5
+
+ mov r7,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r7,r3
+ or r3,r0
+
+ cmp/eq r9,r7
+ bf .L_return
+
+ cmp/eq r7,r5
+ mov.l .L_mask_sign,r6
+
+ bf .L_return
+ cmp/eq r1,r6
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_return_zero:
+ mov.l @r15+,r9
+ mov r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_mask_sign:
+ .long 0x80000000
+.L_1023:
+ .long -1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_mask_22_bit:
+ .long 0x00200000
+.L_106_bit:
+ .long 0x00000200
+.L_comp_1:
+ .long 0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
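The mantissa multiply above builds the 106-bit product of two 53-bit mantissas from four 32x32->64 partial products (the DMULUL/DMULUH pairs plus carry propagation). A C sketch of that accumulation, with our own helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Full 64x64->128 multiply from four 32x32->64 partial products, the
   scheme the dmulu.l sequence above uses for the 53-bit mantissas.  */
static void mul53 (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint32_t a0 = (uint32_t) a, a1 = (uint32_t)(a >> 32);
  uint32_t b0 = (uint32_t) b, b1 = (uint32_t)(b >> 32);
  uint64_t p00 = (uint64_t) a0 * b0;   /* bits 0-63 contribution */
  uint64_t p01 = (uint64_t) a0 * b1;   /* middle contributions */
  uint64_t p10 = (uint64_t) a1 * b0;
  uint64_t p11 = (uint64_t) a1 * b1;   /* bits 64-127 contribution */

  /* Sum the middle column; mid cannot overflow 64 bits.  */
  uint64_t mid = (p00 >> 32) + (uint32_t) p01 + (uint32_t) p10;
  *lo = (mid << 32) | (uint32_t) p00;
  *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

For two 53-bit mantissas the product occupies at most 106 bits, which the assembly then renormalizes to 53 bits (shifting right by one more if bit 105, the .L_106_bit test, is set).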
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixdfsi)
+ FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_mask_high_mant,r1
+
+ SL(bt, .L_epil,
+ mov #0,r0)
+ and r4,r1 ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1053,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1053,return maxint
+ sub r2,r7
+
+ mov.l .L_21bit,r2
+ SL(bt, .L_ret_max,
+ add #1,r7) ! r7 contains the number of shifts
+
+ or r2,r1
+ mov r7,r3
+ shll8 r1
+
+ neg r7,r7
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ !chk if the result can be made only from higher mantissa
+ SL(bt, .L_lower_mantissa,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ tst r6,r6
+ SL(bt, .L_epil,
+ mov r1,r0)
+
+ rts
+ neg r0,r0
+
+.L_lower_mantissa:
+ !result is made from lower mantissa also
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ mov r1,r0
+ bra .L_chk_sign
+ nop
+
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+ tst r6,r6
+ bt .L_epil
+
+ rts
+ add #1,r0
+
+.L_chk_sign:
+	tst r6,r6 !if the sign bit is set, the number is -ve
+ bt .L_epil
+
+ rts
+ neg r0,r0
+
+ .align 2
+
+.L_maxint:
+ .long 0x7fffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1053:
+ .long 1053
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
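The fixdfsi paths above can be modelled in C. An illustrative sketch with our own names (it takes the raw 64-bit pattern); the saturation behaviour for out-of-range values and the zero return for NaN mirror what the assembly does, though C itself leaves those cases undefined:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of fixdfsi: truncate an IEEE double toward zero.  Exponents
   below 1023 give 0, NaNs give 0 (the .L_inv_exp path), exponents
   above 1053 saturate (.L_ret_max: maxint, plus 1 for negatives).  */
static int32_t fix_df_si (uint64_t bits)
{
  int sign = (int)(bits >> 63);
  int exp = (int)((bits >> 52) & 0x7ff);
  uint64_t mant = bits & 0x000fffffffffffffULL;

  if (exp < 1023)
    return 0;                       /* |x| < 1 */
  if (exp == 0x7ff && mant != 0)
    return 0;                       /* NaN */
  if (exp > 1053)
    return sign ? INT32_MIN : INT32_MAX;  /* saturate (covers Inf) */
  mant |= 1ULL << 52;               /* implicit leading 1 */
  int32_t r = (int32_t)(mant >> (1075 - exp));
  return sign ? -r : r;
}
```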
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S (revision 0)
@@ -0,0 +1,393 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2 respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divsf3)
+ FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+ mov.l .L_mask_sign,r1
+ mov r4,r3
+
+ xor r5,r3
+ shll r4
+
+ shlr r4
+ mov.l .L_inf,r2
+
+ and r3,r1 !r1=resultant sign
+ mov r4,r6
+
+ shll r5
+ mov #0,r0
+
+ shlr r5
+ and r2,r6
+
+ cmp/eq r2,r6
+ mov r5,r7
+
+ and r2,r7
+ bt .L_op1_inv
+
+ cmp/eq r2,r7
+ mov #-23,r3
+
+ bt .L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+
+ cmp/eq r0,r4
+
+ bt .L_op1_zero !dividend=0
+ cmp/eq r0,r6
+
+ mov.l .L_imp_bit,r3
+ bt .L_norm_op1 !normalize dividend
+.L_chk_op2:
+ cmp/eq r0,r5
+ bt .L_op2_zero !divisor=0
+
+ cmp/eq r0,r7
+ bt .L_norm_op2 !normalize divisor
+
+.L_div1:
+ sub r7,r6
+ add #127,r6 !r6=resultant exponent
+
+ mov r3,r7
+ mov.l .L_mask_mant,r3
+
+ and r3,r4
+ !chk exponent for overflow
+ mov.l .L_255,r2
+
+ and r3,r5
+ or r7,r4
+
+ cmp/ge r2,r6
+ or r7,r5
+
+ bt .L_return_inf
+ mov r0,r2
+
+ cmp/eq r4,r5
+ bf .L_den_one
+
+ cmp/ge r6,r0
+ !numerator=denominator, quotient=1, remainder=0
+ mov r7,r2
+
+ mov r0,r4
+ !chk exponent for underflow
+ bt .L_underflow
+ bra .L_pack
+ nop
+
+.L_den_one:
+ !denominator=1, result=numerator
+
+ cmp/eq r7,r5
+ bf .L_divide
+
+ !chk exponent for underflow
+ cmp/ge r6,r0
+ mov r4,r2
+
+ SL(bt, .L_underflow,
+ mov r0,r4)
+ bra .L_pack
+ nop
+
+.L_divide:
+	!divide the mantissas: r4 = dividend, r5 = divisor
+
+ cmp/hi r4,r5
+ bf .L_loop
+
+ shll r4 ! if mantissa(op1)< mantissa(op2)
+ add #-1,r6 ! shift left the numerator and decrease the exponent.
+
+.L_loop:
+ !division loop
+
+ cmp/ge r5,r4
+ bf .L_skip
+
+ or r7,r2
+ sub r5,r4
+
+.L_skip:
+ shlr r7
+ shll r4
+
+ cmp/eq r0,r7
+ bf .L_loop
+
+ !chk the exponent for underflow
+ cmp/ge r6,r0
+ bt .L_underflow
+
+ !apply rounding
+ cmp/gt r5,r4
+ bt .L_round1
+
+ cmp/eq r4,r5
+ bt .L_round2
+
+.L_pack:
+ !pack the result, r1=sign, r2=quotient, r6=exponent
+
+ mov #23,r4
+ and r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r4,r6
+#endif
+ or r2,r1
+
+ or r6,r1
+ mov r1,r0
+
+ rts
+ nop
+
+.L_round1:
+ !Apply proper rounding
+
+ bra .L_pack
+ add #1,r2
+
+.L_round2:
+ !Apply proper rounding
+
+ mov.l .L_comp_1,r5
+ bra .L_pack
+ and r5,r2
+
+.L_op1_inv:
+ !chk if op1 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ cmp/hi r0,r6
+
+ bt .L_ret_NaN ! op1 is NaN, return NaN.
+ cmp/eq r2,r7
+
+ SL(bf, .L_return,
+ mov r4,r0) ! Inf/finite, return Inf
+
+ ! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+ rts
+ mov #-1,r0
+
+.L_op2_inv:
+ !chk if op2 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r5,r7
+
+ and r3,r7
+ cmp/hi r0,r7
+
+ bt .L_ret_op2
+ mov r1,r0
+
+ rts
+ nop
+
+.L_op1_zero:
+ !op1 is zero. If op2 is zero, return NaN, else return zero
+
+ cmp/eq r0,r5
+
+ bf .L_return
+
+ rts
+ mov #-1,r0
+
+.L_op2_zero:
+ ! op2 is zero, return Inf
+
+ rts
+ or r2,r0
+
+.L_return_inf:
+ mov.l .L_inf,r0
+
+ rts
+ or r1,r0
+
+.L_norm_op1:
+ !normalize dividend
+
+ shll r4
+ tst r2,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ bra .L_chk_op2
+ add #1,r6
+
+.L_norm_op2:
+ !normalize divisor
+
+ shll r5
+ tst r2,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_div1
+ add #1,r7
+
+.L_underflow:
+ !denormalize the result
+
+ add #1,r6
+ mov #-24,r7
+
+ cmp/gt r6,r7
+ mov r2,r5
+
+ bt .L_return_zero
+ add #-1,r6
+
+ mov #32,r3
+ neg r6,r7
+
+ add #1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ cmp/ge r0,r6
+ bf .L_mov_right
+
+.L_mov_left:
+ cmp/eq r0,r6
+ bt .L_out
+
+ shll r2
+ bra .L_mov_left
+ add #-1,r6
+
+.L_mov_right:
+ cmp/eq r0,r6
+ bt .L_out
+
+ add #1,r6
+ bra .L_mov_right
+ shlr r2
+
+.L_out:
+#endif
+ sub r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r5
+#else
+ cmp/ge r0,r3
+ bf .L_mov_right_1
+
+.L_mov_left_1:
+ shll r5
+ add #-1,r3
+
+ cmp/eq r0,r3
+ bf .L_mov_left_1
+
+ bt .L_out_1
+
+.L_mov_right_1:
+ cmp/eq r0,r3
+ bt .L_out_1
+
+ add #1,r3
+ bra .L_mov_right_1
+ shlr r5
+
+.L_out_1:
+#endif
+ shlr r2
+ addc r0,r2
+
+ cmp/eq r4,r0 !r4 contains the remainder
+ mov r2,r0
+
+ mov.l .L_mask_sign,r7
+ bf .L_return
+
+ mov.l .L_comp_1,r2
+ cmp/eq r7,r5
+
+ bf .L_return
+ and r2,r0
+
+.L_return:
+.L_return_zero:
+ rts
+ or r1,r0
+
+.L_ret_op2:
+ rts
+ or r5,r0
+
+
+ .align 2
+.L_inf:
+ .long 0x7f800000
+.L_mask_sign:
+ .long 0x80000000
+.L_mask_mant:
+ .long 0x007fffff
+.L_imp_bit:
+ .long 0x00800000
+.L_comp_1:
+ .long 0xfffffffe
+.L_255:
+ .long 255
+
+ENDFUNC (GLOBAL (divsf3))
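The quotient loop above (labels .L_loop / .L_skip) is a classic restoring shift-and-subtract divider: one quotient bit per iteration, walking the bit mask in r7 down from the implicit bit. A rough C model, with the hypothetical name div_mantissa and assuming both 24-bit mantissas already carry the implicit bit (bit 23):

```c
#include <assert.h>
#include <stdint.h>

/* C model of the quotient loop: num = dividend mantissa (r4),
   den = divisor mantissa (r5), bit = current quotient bit (r7),
   quot = quotient accumulator (r2).  One quotient bit per iteration. */
static uint32_t div_mantissa(uint32_t num, uint32_t den)
{
    uint32_t quot = 0;
    for (uint32_t bit = 1u << 23; bit != 0; bit >>= 1) {
        if (num >= den) {       /* cmp/ge r5,r4 */
            quot |= bit;        /* or r7,r2     */
            num -= den;         /* sub r5,r4    */
        }
        num <<= 1;              /* shll r4      */
    }
    return quot;
}
```

With both mantissas carrying the implicit bit, 1.0/1.0 yields 0x800000 and 1.5/1.0 yields 0xC00000; the assembly then rounds on the remainder left in r4 and packs sign, exponent, and the low 23 bits.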
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
@@ -0,0 +1,150 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunssfsi)
+ FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+ mov.l .L_sign,r0
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r4,r0
+
+ mov.l .L_mask_sign,r7
+ mov #127,r5
+
+ ! Remove sign bit
+ cmp/eq #0,r0
+ and r7,r2
+
+ ! If the number is negative, return 0.
+ ! libgcc deviates from the standard in this regard.
+ mov r4,r3
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ mov.l .L_frac,r6
+ cmp/gt r1,r2
+
+ shll r2
+ SL1(bt, .L_epil,
+ shlr16 r2)
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ and r6,r3 ! r3 has fraction
+ cmp/gt r2,r5
+
+ ! If exponent is less than 127, return 0
+ or r1,r3
+ bt .L_epil
+
+ ! Process only if exponent is less than 158
+ mov.l .L_158,r1
+ shll8 r3
+
+ cmp/gt r1,r2
+ sub r2,r1
+
+ neg r1,r1
+ bt .L_ret_max
+
+! Shift the mantissa with exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+
+.L_ret:
+#endif
+ rts
+ mov r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+ mov.l .L_max,r3
+
+ rts
+ mov r3,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_158:
+ .long 158
+
+.L_max:
+ .long 0xFFFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
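The conversion logic above (reject negatives and NaNs, return 0 for exponents below 127, clamp above 158, otherwise shift the mantissa by its distance from 158) can be sketched in C; the function name sf_to_uint is hypothetical, and bits is the raw IEEE-754 single:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of GLOBAL(fixunssfsi); bits is the raw binary32 image. */
static uint32_t sf_to_uint(uint32_t bits)
{
    if (bits & 0x80000000u)
        return 0;                          /* negative: return 0 */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return 0;                          /* NaN: return 0, as the asm does */
    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t frac = (bits & 0x007FFFFFu) | 0x00800000u;  /* implicit bit */
    if (exp < 127)
        return 0;                          /* |x| < 1 */
    if (exp > 158)
        return 0xFFFFFFFFu;                /* too large (or +Inf): clamp */
    frac <<= 8;                            /* mantissa into bits 31..8 */
    return frac >> (158 - exp);            /* the shld / shift loop */
}
```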
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
@@ -0,0 +1,71 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsidf)
+ FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+ mov.w LOCAL(x41f0),DBLRH ! bias + 32
+ tst r4,r4 ! check for zero
+ bt .L_ret_zero
+.L_loop:
+ shll r4
+ SL(bf, .L_loop,
+ add #-16,DBLRH)
+
+ mov r4,DBLRL
+
+ SHLL20 (DBLRL)
+
+ shll16 DBLRH ! put exponent in proper place
+
+ SHLR12 (r4)
+
+ rts
+ or r4,DBLRH
+
+.L_ret_zero:
+ mov #0,r1
+ rts
+ mov #0,r0
+
+LOCAL(x41f0): .word 0x41f0
+ .align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
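The normalization loop above starts the exponent at 0x41F (bias 0x3FF plus 32, the LOCAL(x41f0) constant before the shll16) and pays one exponent step per left shift until the leading 1 drops out, which also removes the implicit bit. A C model, with the hypothetical name uint_to_df, returning the raw 64-bit double image:

```c
#include <assert.h>
#include <stdint.h>

/* Model of GLOBAL(floatunsidf): returns the raw IEEE-754 double bits. */
static uint64_t uint_to_df(uint32_t x)
{
    if (x == 0)
        return 0;
    uint32_t exp = 0x41F;        /* bias 0x3FF + 32, as in LOCAL(x41f0) */
    uint32_t carry;
    do {                         /* the shll r4 / add #-16 loop */
        carry = x >> 31;         /* bit shifted out (T flag) */
        x <<= 1;
        exp--;                   /* one exponent step per shift */
    } while (!carry);            /* stops when the implicit 1 drops out */
    uint32_t hi = (exp << 20) | (x >> 12);   /* DBLRH */
    uint32_t lo = x << 20;                   /* DBLRL */
    return ((uint64_t)hi << 32) | lo;
}
```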
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S (revision 0)
@@ -0,0 +1,530 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subsf3)
+ .global GLOBAL (addsf3)
+ FUNC (GLOBAL (subsf3))
+ FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+ mov.l .L_sign_bit,r1
+ xor r1,r5
+
+GLOBAL (addsf3):
+ mov.l r8,@-r15
+ mov r4,r3
+
+ mov.l .L_pinf,r2
+ mov #0,r8
+
+ and r2,r3 ! op1's exponent.
+ mov r5,r6
+
+ ! Check NaN or Infinity
+ and r2,r6 ! op2's exponent.
+ cmp/eq r2,r3
+
+ ! go if op1 is NaN or INF.
+ mov.l .L_sign_bit,r0
+ SL(bt, .L_inv_op1,
+ mov #-23,r1)
+
+ ! Go if op2 is NaN/INF.
+ cmp/eq r2,r6
+ mov r0,r7
+ bt .L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r3)
+#else
+ shld r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+#else
+ shld r1,r6
+#endif
+
+ ! Check for negative zero
+ cmp/eq r0,r5
+
+ mov r5,r1
+ SL(bt, .L_ret_op1,
+ and r7,r1)
+
+ cmp/eq r0,r4
+ bt .L_ret_op2
+
+ ! if op1 is zero return op2
+ tst r4,r4
+ bt .L_ret_op2
+
+ ! Equal numbers with opposite sign
+ mov r4,r2
+ xor r5,r2
+
+ cmp/eq r0,r2
+ bt .L_ret_zero
+
+ ! if op2 is zero return op1
+ mov.l .L_mask_fra,r2
+ tst r5,r5
+
+ ! Extract the mantissa
+ mov r4,r0
+ SL(bt, .L_ret_op1,
+ and r2,r5)
+
+ and r2,r4
+
+ mov.l .L_imp_bit,r2
+ and r7,r0 ! sign bit of op1
+
+ ! Check for denormals
+ tst r3,r3
+ bt .L_norm_op1
+
+ ! Attach the implicit bit
+ or r2,r4
+ tst r6,r6
+
+ bt .L_norm_op2
+
+ or r2,r5
+ tst r0,r0
+
+ ! Are the operands positive or negative?
+ bt .L_ptv_op1
+
+ neg r4,r4
+
+.L_ptv_op1:
+ tst r1,r1
+ bt .L_ptv_op2
+
+ neg r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+ cmp/eq r3,r6
+ bt .L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+ ! r0, r1 contain sign bits.
+ ! r4, r5 contain mantissas.
+ ! r3, r6 contain exponents.
+ ! r2, r7 scratch.
+
+ ! Calculate result exponent.
+ mov r6,r2
+ sub r3,r2 ! e2 - e1
+
+ cmp/pl r2
+ mov #23,r7
+
+ ! e2 - e1 is -ve
+ bf .L_exp_ne_1
+
+ mov r6,r3 ! Result exp.
+ cmp/gt r7,r2 ! e2-e1 > 23
+
+ mov #1,r7
+ bt .L_pack_op2_0
+
+ ! Align the mantissa
+.L_loop_ne:
+ shar r4
+
+ rotcr r8
+ cmp/eq r7,r2
+
+ add #-1,r2
+ bf .L_loop_ne
+
+ bt .L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+ ! If op1 is -ve
+ tst r1,r1
+ bt .L_pack_op2
+
+ neg r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+ mov.l .L_nimp_bit,r2
+ mov #23,r3
+
+ mov r1,r0
+
+ and r2,r5
+ mov.l @r15+,r8
+
+ or r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+ rts
+ or r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+ mov r4,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return zero
+.L_ret_zero:
+ mov #0,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+ mov r5,r0
+
+ rts
+ mov.l @r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+ shll r5
+ add #-1,r6
+
+ tst r2,r5
+ bt .L_norm_op2
+
+ ! Check sign
+ tst r1,r1
+ bt .L_norm_op2_2
+
+ neg r5,r5
+
+.L_norm_op2_2:
+ add #1,r6
+ cmp/eq r3,r6
+
+ bf .L_exp_ne
+ bt .L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+ shll r4
+ add #-1,r3
+
+ tst r2,r4
+ bt .L_norm_op1
+
+ ! Check sign
+ tst r0,r0
+ bt .L_norm_op1_1
+
+ neg r4,r4
+
+.L_norm_op1_1:
+ ! Adjust biasing
+ add #1,r3
+
+ ! Check op2 for denormalized value
+ tst r6,r6
+ bt .L_norm_op2
+
+ mov.l .L_imp_bit,r2
+
+ tst r1,r1 ! Check sign
+ or r2,r5 ! Attach 24th bit
+
+ bt .L_norm_op1_2
+
+ neg r5,r5
+
+.L_norm_op1_2:
+ cmp/eq r3,r6
+
+ bt .L_exp_eq
+ bf .L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+ ! Return op1 if it is NAN.
+ ! r2 is infinity
+ cmp/gt r2,r4
+ bt .L_ret_op1
+
+ ! op1 is +/- INF
+ ! If op2 is same return now.
+ cmp/eq r4,r5
+ bt .L_ret_op1
+
+ ! return op2 if it is NAN
+ cmp/gt r2,r5
+ bt .L_ret_op2
+
+ ! Check if op2 is inf
+ cmp/eq r2,r6
+ bf .L_ret_op1
+
+ ! Both op1 and op2 are infinities of opposite
+ ! signs, or there is a -NaN. Return a NaN.
+ mov.l @r15+,r8
+ rts
+ mov #-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+ mov #-25,r7
+ cmp/gt r2,r7 ! -23 > e2 - e1
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+ tst r0,r0
+ bt .L_pack_op1
+
+.L_pack_op1_0:
+ bra .L_pack_op1
+ neg r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+ ! Shift with rounding
+ shar r5
+ rotcr r8
+
+ tst r2,r2
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+ add r5,r4 ! Add fractions.
+ mov.l .L_sign_bit,r2
+
+ ! Check for negative result
+ mov #0,r0
+ tst r2,r4
+
+ mov.l .L_255,r5
+ bt .L_post_add
+
+ negc r8,r8
+ negc r4,r4
+ or r2,r0
+
+.L_post_add:
+ ! Check for extra MSB
+ mov.l .L_chk_25,r2
+
+ tst r2,r4
+ bt .L_imp_check
+
+ shar r4
+ rotcr r8
+
+ add #1,r3
+ cmp/ge r5,r3
+
+ ! Return Inf if exp > 254
+ bt .L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+ mov.l .L_imp_bit,r2
+ tst r2,r4
+
+ bf .L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+ shll r8
+ rotcl r4
+
+ add #-1,r3
+ tst r2,r4
+
+ bt .L_lft_shft
+
+! Pack the result after rounding
+.L_pack_op1:
+ ! See if denormalized result is possible
+ mov.l .L_chk_25,r5
+ cmp/pl r3
+
+ bf .L_denorm_res
+
+ ! Are there any bits shifted previously?
+ tst r8,r8
+ bt .L_pack_1
+
+ ! Round
+ shll r8
+ movt r6
+
+ add r6,r4
+
+ ! If we are halfway between two numbers,
+ ! round towards LSB = 0
+ tst r8,r8
+
+ bf .L_pack_1
+
+ shlr r4
+ shll r4
+
+.L_pack_1:
+ ! Adjust extra MSB generated after rounding
+ tst r4,r5
+ mov.l .L_255,r2
+
+ bt .L_pack_2
+ shar r4
+
+ add #1,r3
+ cmp/ge r2,r3 ! Check for exp overflow
+
+ bt .L_ret_inf
+
+! Pack it finally
+.L_pack_2:
+ ! Do not store implicit bit
+ mov.l .L_nimp_bit,r2
+ mov #23,r1
+
+ and r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r3)
+#else
+ shld r1,r3
+#endif
+ mov.l @r15+,r8
+
+ or r4,r0
+ rts
+ or r3,r0
+
+! Return infinity
+.L_ret_inf:
+ mov.l .L_pinf,r2
+
+ mov.l @r15+,r8
+ rts
+ or r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+ mov #0,r2
+
+! Denormalizing loop with rounding
+.L_den_1:
+ shar r4
+ movt r6
+
+ tst r3,r3
+ bt .L_den_2
+
+ ! Increment the exponent
+ add #1,r3
+
+ tst r6,r6
+ bt .L_den_0
+
+ ! Count number of ON bits shifted
+ add #1,r2
+
+.L_den_0:
+ bra .L_den_1
+ nop
+
+! Apply rounding
+.L_den_2:
+ cmp/eq r6,r1
+ bf .L_den_3
+
+ add r6,r4
+ mov #1,r1
+
+ ! If halfway between two numbers,
+ ! round towards LSB = 0
+ cmp/eq r2,r1
+ bf .L_den_3
+
+ shar r4
+ shll r4
+
+.L_den_3:
+
+ mov.l @r15+,r8
+ rts
+ or r4,r0
+
+ .align 2
+.L_imp_bit:
+ .long 0x00800000
+
+.L_nimp_bit:
+ .long 0xFF7FFFFF
+
+.L_mask_fra:
+ .long 0x007FFFFF
+
+.L_pinf:
+ .long 0x7F800000
+
+.L_sign_bit:
+ .long 0x80000000
+
+.L_bit_25:
+ .long 0x01000000
+
+.L_chk_25:
+ .long 0x7F000000
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
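The alignment step in .L_loop_ne / .L_exp_ne_2 shifts the smaller operand's (possibly negated) mantissa right and collects the shifted-out bits MSB-first in r8 via shar/rotcr, so rounding can consult them later. A small C model of that step, with the hypothetical name align_mant; note that C's right shift of a negative int is implementation-defined (arithmetic on GCC, which matches shar), and the caller is assumed to guarantee 1 <= diff <= 31:

```c
#include <assert.h>
#include <stdint.h>

/* Shift the mantissa right diff places, collecting the shifted-out
   bits MSB-first (the role of r8, filled via shar/rotcr). */
static int32_t align_mant(int32_t m, int diff, uint32_t *rest)
{
    *rest = (uint32_t)m << (32 - diff);  /* shifted-out bits, MSB-first */
    return m >> diff;                    /* arithmetic shift keeps sign */
}
```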
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
@@ -0,0 +1,347 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (mulsf3)
+ FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+ ! Extract the sign bits
+ mov.l .L_sign,r3
+ mov r3,r0
+
+ and r4,r3 ! sign bit for op1
+ mov.l .L_sign_mask,r6
+
+ ! Mask out the sign bit from op1 and op2
+ and r5,r0 ! sign bit for op2
+ mov.l .L_inf,r2
+
+ and r6,r4
+ xor r3,r0 ! Final sign in r0
+
+ and r6,r5
+ tst r4,r4
+
+ ! Check for zero
+ mov r5,r7
+ ! Check op1 for zero
+ SL(bt, .L_op1_zero,
+ mov r4,r6)
+
+ tst r5,r5
+ bt .L_op2_zero ! op2 is zero
+
+ ! Extract the exponents
+ and r2,r6 ! Exponent of op1
+ cmp/eq r2,r6
+
+ and r2,r7
+ bt .L_inv_op1 ! op1 is NaN or Inf
+
+ mov.l .L_mant,r3
+ cmp/eq r2,r7
+
+ and r3,r4 ! Mantissa of op1
+ bt .L_ret_op2 ! op2 is NaN or Inf
+
+ and r3,r5 ! Mantissa of op2
+
+ mov #-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+ ! Check for denormals
+ mov.l .L_24bit,r3
+ tst r6,r6
+
+ bt .L_norm_op1 ! op1 is denormal
+ add #-127,r6 ! Unbias op1's exp
+
+ tst r7,r7
+ bt .L_norm_op2 ! op2 is denormal
+
+ add #-127,r7 ! Unbias op2's exp
+
+.L_multiply:
+ add r6,r7 ! Final exponent in r7
+ mov.l .L_24bit,r1
+
+ ! set 24th bit of mantissas
+ mov #127,r3
+ or r1,r4
+
+ DMULU_SAVE
+
+ ! Multiply
+ or r1,r5
+ DMULUL (r4,r5,r4)
+
+ DMULUH (r5)
+
+ DMULU_RESTORE
+
+ mov.l .L_16bit,r6
+
+ ! Check for extra MSB generated
+ tst r5,r6
+
+ mov.l .L_255,r1
+ bf .L_shift_by_1 ! Adjust the extra MSB
+
+! Normalize the result with rounding
+.L_epil:
+ ! Bias the exponent
+ add #127,r7
+ cmp/ge r1,r7
+
+ ! Check exponent overflow and underflow
+ bt .L_ret_inf
+
+ cmp/pl r7
+ bf .L_denorm
+
+.L_epil_0:
+ mov #-23,r3
+ shll r5
+ mov #0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+ tst r3,r3
+ bt .L_loop_epil_out
+
+ add #1,r3
+ shlr r4
+
+ bra .L_loop_epil_0
+ rotcr r6
+
+! Round mantissa
+.L_loop_epil_out:
+ shll8 r5
+ or r5,r4
+
+ mov.l .L_mant,r2
+ mov #23,r3
+
+ ! Check last bit shifted out of result
+ tst r6,r6
+ bt .L_epil_2
+
+ ! Round
+ shll r6
+ movt r5
+
+ add r5,r4
+
+ ! If this is the only ON bit shifted
+ ! Round towards LSB = 0
+ tst r6,r6
+ bf .L_epil_2
+
+ shlr r4
+ shll r4
+
+.L_epil_2:
+ ! Rounding may have produced extra MSB.
+ mov.l .L_25bit,r5
+ tst r4,r5
+
+ bt .L_epil_1
+
+ add #1,r7
+ shlr r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r7)
+#else
+ shld r3,r7
+#endif
+
+ and r2,r4
+
+ or r7,r4
+ rts
+ or r4,r0
+
+.L_denorm:
+ mov #0,r3
+
+.L_den_1:
+ shlr r5
+ rotcr r4
+
+ cmp/eq r3,r7
+ bt .L_epil_0
+
+ bra .L_den_1
+ add #1,r7
+
+
+! Normalize the first argument
+.L_norm_op1:
+ shll r4
+ tst r3,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ ! The biasing is by 126
+ add #-126,r6
+ tst r7,r7
+
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+ shll r5
+ tst r3,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+ mov.l .L_inf,r2
+ and r2,r6
+
+ ! Check whether op1 is Inf or NaN; 0 * Inf is NaN
+ cmp/eq r2,r6
+ SL(bf, .L_ret_op2,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+ shlr r5
+ rotcr r4
+
+ add #1,r7 ! Show the shift in exponent
+
+ cmp/gt r3,r7
+ bf .L_epil
+
+ ! The resultant exponent is invalid
+ mov.l .L_inf,r1
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+ mov.l .L_inf,r2
+ and r2,r7
+
+ ! Check whether op2 is Inf or NaN; 0 * Inf is NaN
+ cmp/eq r2,r7
+ SL(bf, .L_ret_op1,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+.L_inv_op1:
+ mov.l .L_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ tst r6,r6
+
+ bf .L_ret_op1 ! op1 is NaN
+ ! op1 is not NaN. It is Inf
+
+ cmp/eq r2,r7
+ bf .L_ret_op1 ! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_ret_inf:
+ rts
+ or r2,r0
+
+.L_ret_zero:
+ mov #0,r2
+ rts
+ or r2,r0
+
+
+ .align 2
+.L_mant:
+ .long 0x007FFFFF
+
+.L_inf:
+ .long 0x7F800000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_25bit:
+ .long 0x01000000
+
+.L_16bit:
+ .long 0x00008000
+
+.L_sign:
+ .long 0x80000000
+
+.L_sign_mask:
+ .long 0x7FFFFFFF
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
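The mulsf3 core multiplies the two 24-bit mantissas (fraction fields with the implicit bit set) into a 47- or 48-bit product, renormalizes if bit 47 is set (the "extra MSB" check), rounds to nearest with ties toward LSB = 0, and adjusts the exponent if rounding overflows. A simplified model, with the hypothetical name mul_mant; a uint64_t stands in for the DMULUL/DMULUH register pair:

```c
#include <assert.h>
#include <stdint.h>

/* m1/m2 are 23-bit fraction fields; *exp is the unbiased result
   exponent (e1 + e2), adjusted here when renormalizing. */
static uint32_t mul_mant(uint32_t m1, uint32_t m2, int *exp)
{
    uint64_t p = (uint64_t)(m1 | 0x00800000u) * (m2 | 0x00800000u);
    if (p >> 47) {              /* extra MSB: product >= 2.0 */
        p >>= 1;
        ++*exp;
    }
    uint32_t rest = (uint32_t)p & 0x007FFFFFu;  /* bits to round away */
    uint32_t m = (uint32_t)(p >> 23);           /* 24-bit result */
    if (rest > 0x400000u || (rest == 0x400000u && (m & 1)))
        m++;                    /* round to nearest, ties to even */
    if (m >> 24) {              /* rounding produced an extra MSB */
        m >>= 1;
        ++*exp;
    }
    return m;
}
```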
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
@@ -0,0 +1,146 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsidf)
+ FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+ mov.l .L_sign,r0
+ mov #0,r1
+
+ mov r0,r2
+ tst r4,r4 ! check r4 for zero
+
+ ! Extract the sign
+ mov r2,r3
+ SL(bt, .L_ret_zero,
+ and r4,r0)
+
+ cmp/eq r1,r0
+ not r3,r3
+
+ mov r1,r7
+ SL(bt, .L_loop,
+ and r4,r3)
+
+ ! Treat -2147483648 as special case
+ cmp/eq r1,r3
+ neg r4,r4
+
+ bt .L_ret_min
+
+.L_loop:
+ shll r4
+ mov r4,r5
+
+ and r2,r5
+ cmp/eq r1,r5
+
+ add #1,r7
+ bt .L_loop
+
+ mov.l .L_initial_exp,r6
+ not r2,r2
+
+ and r2,r4
+ mov #21,r3
+
+ sub r7,r6
+ mov r4,r1
+
+ mov #20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r1
+#else
+ SHLL21 (r1)
+#endif
+ mov #-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r6 ! Exponent in proper place
+#else
+ SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r4
+#else
+ SHLR11 (r4)
+#endif
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+#ifdef __LITTLE_ENDIAN__
+ or r4,r1
+#else
+ or r4,r0
+#endif
+
+.L_ret_zero:
+ rts
+ mov #0,r0
+
+.L_ret_min:
+ mov.l .L_min,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+ .align 2
+
+.L_initial_exp:
+ .long 0x0000041E
+
+.L_sign:
+ .long 0x80000000
+
+.L_min:
+ .long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
@@ -0,0 +1,160 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixsfsi)
+ FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+ mov.l .L_mask_sign,r7
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r7,r2
+
+ cmp/gt r1,r2
+ mov #127,r5
+
+ mov r4,r3
+ SL(bt, .L_epil,
+ mov #0,r0)
+
+ shll r2
+ mov.l .L_frac,r6
+
+ shlr16 r2
+ and r6,r3 ! r3 has fraction
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ ! If exponent is less than 127, return 0
+ cmp/gt r2,r5
+ or r1,r3 ! Set the implicit bit
+
+ mov.l .L_157,r1
+ SL1(bt, .L_epil,
+ shll8 r3)
+
+ ! If exponent is greater than 157,
+ ! return the maximum/minimum integer
+ ! value, depending on the sign
+ cmp/gt r1,r2
+ sub r2,r1
+
+ mov.l .L_sign,r2
+ SL(bt, .L_ret_max,
+ add #1,r1)
+
+ and r4,r2 ! Sign in r2
+ neg r1,r1
+
+ ! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+.L_ret:
+#endif
+ ! If op1 is negative, negate the result
+ cmp/eq r0,r2
+ SL(bf, .L_negate,
+ mov r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the max/min integer value
+.L_ret_max:
+ and r4,r2 ! Sign in r2
+ mov.l .L_max,r3
+
+ mov.l .L_sign,r1
+ cmp/eq r0,r2
+
+ mov r3,r0
+ bt .L_epil
+
+ ! Negative number, return min int
+ rts
+ mov r1,r0
+
+! Negate the result
+.L_negate:
+ rts
+ neg r0,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_157:
+ .long 157
+
+.L_max:
+ .long 0x7FFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S (revision 0)
+++ gcc/config/sh/ieee-754-sf.S (revision 0)
@@ -0,0 +1,692 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+ cmp/eq r4,r5
+ mov r4,r0
+ bt 0f
+ or r5,r0
+ add r0,r0
+ tst r0,r0 ! test for +0.0 == -0.0 ; -0.0 == +0.0
+ 0: */
+ .balign 4
+ .global GLOBAL(nesf2)
+ HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+ /* If the raw values are unequal, the result is unequal, unless
+ both values are +-zero.
+ If the raw values are equal, the result is equal, unless
+ the values are NaN. */
+ cmp/eq r4,r5
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ bt LOCAL(check_nan)
+ mov r4,r0
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
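The two cases the comment above describes (unequal raw bits are unequal unless both are +-zero; equal raw bits are equal unless the value is a NaN) can be modelled in C as follows. The name ne_sf is hypothetical, and the NaN test here checks exponent and fraction directly rather than the SF_NAN_MASK top-fraction-bit convention the assembly relies on:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero iff a != b as binary32 values: +0.0 == -0.0, NaN != anything. */
static uint32_t ne_sf(uint32_t a, uint32_t b)
{
    if (a != b)                  /* raw bits differ... */
        return (a | b) << 1;     /* ...still equal only if both are +-0 */
    /* raw bits equal: unequal only if the value is a NaN */
    return (a & 0x7FFFFFFFu) > 0x7F800000u;
}
```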
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ tst r1,r0
+ not r5,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+ cmp/pz r4
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r5,r4
+ cmp/ge r4,r5
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values
+ are +-zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r4
+ not r5,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r5,r4
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt r4,r1)
+ bf LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+ rts
+ movt r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+ SLI(cmp/gt r4,r1)
+ bf LOCAL(nan)
+ rts
+ movt r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif /* ! L_gtsf2t */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* L_gtsf2t || L_gtsf2t_trap */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0: */
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a NaN.  If both are +- zero,
+	   the result is true; otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r5
+ not r4,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r4,r5
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,r5)
+ bt LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r5,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ cmp/ge r1,r5
+LOCAL(nan):
+ rts
+ movt r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+ SLI(cmp/ge r1,r5)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gesf2f_trap */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f || L_gesf2f_trap */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit you
+	! get UINT_MAX, for a set sign bit you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunssfsi)
+ FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+ mov.l LOCAL(max),r2
+ mov #-23,r1
+ mov r4,r0
+ shad r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/ge r2,r0
+ or r2,r0
+ bt LOCAL(retmax)
+ cmp/pz r4
+ and r1,r0
+ bf LOCAL(ret0)
+ add #-23,r4
+ rts
+ shld r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+ rts
+ subc r0,r0
+ .balign 4
+LOCAL(mask):
+ .long 0x00ffffff
+LOCAL(max):
+ .long 0x4f800000
+ ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
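The clamping behaviour described in the comment above (UINT_MAX for overflow and +NaN, 0 for negative inputs, -NaN, and values below 1) can be sketched as a C reference model. This is an illustration of the semantics only, not the shipped implementation; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reference model for the fixunssfsi routine above: UINT_MAX for inputs
   >= 2^32 (and +NaN), 0 for negative inputs (and -NaN) or inputs < 1.  */
static uint32_t fixunssfsi_model(float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((int32_t) bits >= 0x4f800000)	/* 2^32 <= f, or +NaN / +inf */
    return 0xffffffffu;
  if (bits & 0x80000000u)		/* negative input or -NaN */
    return 0;
  int exp = (int) (bits >> 23) - 127;	/* unbiased exponent */
  if (exp < 0)				/* |f| < 1.0 truncates to 0 */
    return 0;
  uint32_t frac = (bits & 0x007fffffu) | 0x00800000u;	/* implicit 1. */
  int sh = exp - 23;
  return sh >= 0 ? frac << sh : frac >> -sh;
}
```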
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit you
+	! get INT_MAX, for a set sign bit you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixsfsi)
+ FUNC(GLOBAL(fixsfsi))
+ .balign 4
+GLOBAL(fixsfsi):
+ mov r4,r0
+ shll r4
+ mov #-24,r1
+ bt LOCAL(neg)
+ mov.l LOCAL(max),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmax)
+ and r1,r0
+ addc r1,r0
+ rts
+ shld r4,r0
+
+ .balign 4
+LOCAL(neg):
+ mov.l LOCAL(min),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmin)
+ and r1,r0
+ addc r1,r0
+ shld r4,r0 ! SH4-200 will start this insn on a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+ .balign 4
+LOCAL(mask):
+ .long 0x007fffff
+LOCAL(max):
+ .long 0x4f000000
+LOCAL(min):
+ .long 0xcf000000
+ ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+ .balign 4
+ .global GLOBAL(hypotf)
+ FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+   That is somewhat slower than the SH4 can do the computation using double
+   precision hardware floating point - 57 cycles, or 69 with mode switches.  */
+	/* First, calculate x (r4) as the sum of the squares of the fractions -
+	   the exponent is calculated separately in r3.
+	   Then, calculate sqrt(x) for the fraction by reciproot iteration.
+	   We get a 7.5 bit initial value using linear approximation with two
+	   slopes that are powers of two.
+ x (- [1. .. 2.) y0 := 1.25 - x/4 - tab(x) y (- (0.8 .. 1.0)
+ x (- [2. .. 4.) y0 := 1. - x/8 - tab(x) y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x). */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+ z0 := x * y1
+ z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
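The estimate and refinement steps above can be sketched numerically in C. This is an illustration under stated assumptions: plain double arithmetic stands in for the shifted fixed-point {m.n} formats of the assembly, the tab(x) correction to the initial estimate is omitted, and the final sqrt correction is written in its real-number Newton form (the comment's version is expressed in the fixed-point scales).

```c
/* Two-slope start, one reciproot (Newton) step for y ~ 1/sqrt(x), then
   one Newton correction for z ~ sqrt(x), for x in [1., 4.).  */
static double sqrt_reciproot(double x)
{
  double y0 = x < 2. ? 1.25 - x / 4. : 1. - x / 8.;	/* tab(x) omitted */
  double y1 = 1.5 * y0 - 0.5 * (x * y0) * (y0 * y0);	/* Newton for 1/sqrt */
  double z0 = x * y1;					/* first sqrt estimate */
  return z0 + 0.5 * y1 * (x - z0 * z0);		/* one sqrt correction */
}
```

Even without the table term the result is accurate to roughly four decimal digits across the interval, which shows why one further refinement plus rounding compensation suffices for single precision.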
+
+ mov.l LOCAL(xff000000),r1
+ add r4,r4
+ mov r4,r0
+ add r5,r5
+ cmp/hs r5,r4
+ sub r5,r0
+ mov #-24,r2
+ bf/s LOCAL(r5_large)
+ shad r2,r0
+ mov r4,r3
+ shll8 r4
+ rotcr r4
+ tst #0xe0,r0
+ neg r0,r0
+ bt LOCAL(ret_abs_r3)
+ tst r1,r5
+ shll8 r5
+ bt/s LOCAL(denorm_r5)
+ cmp/hi r3,r1
+ dmulu.l r4,r4
+ bf LOCAL(inf_nan)
+ rotcr r5
+ shld r0,r5
+LOCAL(denorm_r5_done):
+ sts mach,r4
+ dmulu.l r5,r5
+ mov.l r6,@-r15
+ mov #20,r6
+
+ sts mach,r5
+LOCAL(add_frac):
+ mova LOCAL(tab)-32,r0
+ mov.l r7,@-r15
+ mov.w LOCAL(x1380),r7
+ and r1,r3
+ addc r5,r4
+ mov.w LOCAL(m25),r2 ! -25
+ bf LOCAL(frac_ok)
+ sub r1,r3
+ rotcr r4
+ cmp/eq r1,r3 ! did we generate infinity ?
+ bt LOCAL(inf_nan)
+ shlr r4
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r0
+ mov r4,r1
+ shld r6,r1
+ bra LOCAL(frac_low2)
+ sub r1,r7
+
+LOCAL(frac_ok):
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov r4,r0
+ bt/s LOCAL(frac_low)
+ shld r6,r0
+ mov.w LOCAL(xf80),r7
+ shlr r0
+LOCAL(frac_low):
+ sub r0,r7
+LOCAL(frac_low2):
+ mov.l LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+ sub r1,r7 ! {0.12}
+ mov.l LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+ swap.w r7,r1 ! {0.28}
+ dmulu.l r1,r4 /* two issue cycles */
+ mulu.w r7,r7 /* two issue cycles */
+ sts mach,r2 ! {0.26}
+ mov r1,r7
+ shlr r1
+ sts macl,r6 ! {0.24}
+ cmp/hi r0,r4
+ shlr2 r2
+ bf LOCAL(near_one)
+ shlr r2 ! {0.23} systemic error of linear approximation keeps y1 < 1
+ dmulu.l r2,r6
+ cmp/hs r5,r4
+ add r7,r1 ! {1.28}
+ bt LOCAL(near_four)
+ shlr2 r1 ! {1.26}
+ sts mach,r0 ! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+ shlr2 r1 ! {1.24}
+ shlr8 r1 ! {1.16}
+ sett ! compensate for truncation of subtrahend, keep y1 < 1
+ subc r0,r1 ! {0.16} y1; max error about 3.5 ulp
+ swap.w r1,r0
+ dmulu.l r0,r4 ! { 1.30 }
+ mulu.w r1,r1
+ sts mach,r2
+ shlr2 r0
+ sts macl,r1
+ add r2,r0
+ mov.l LOCAL(xff000000),r6
+ add r2,r0
+ dmulu.l r1,r2
+ add #127,r0
+ add r6,r3 ! precompensation for adding leading 1
+ sts mach,r1
+ shlr r3
+ mov.l @r15+,r7
+ sub r1,r0 ! {0.31} max error about 50 ulp (+127)
+ mov.l @r15+,r6
+ shlr8 r0 ! {0.23} max error about 0.7 ulp
+ rts
+ add r3,r0
+
+LOCAL(r5_large):
+ mov r5,r3
+ mov #-31,r2
+ cmp/ge r2,r0
+ shll8 r5
+ bf LOCAL(ret_abs_r3)
+ rotcr r5
+ tst r1,r4
+ shll8 r4
+ bt/s LOCAL(denorm_r4)
+ cmp/hi r3,r1
+ dmulu.l r5,r5
+ bf LOCAL(inf_nan)
+ rotcr r4
+LOCAL(denorm_r4_done):
+ shld r0,r4
+ sts mach,r5
+ dmulu.l r4,r4
+ mov.l r6,@-r15
+ mov #20,r6
+ bra LOCAL(add_frac)
+ sts mach,r4
+
+LOCAL(near_one):
+ bra LOCAL(assemble_sqrt)
+ mov #0,r0
+LOCAL(near_four):
+ ! exact round-to-nearest would add 255. We add 256 for speed & compactness.
+ mov r4,r0
+ shlr8 r0
+ add #1,r0
+ tst r0,r0
+ addc r0,r3 ! might generate infinity.
+LOCAL(assemble_sqrt):
+ mov.l @r15+,r7
+ shlr r3
+ mov.l @r15+,r6
+ rts
+ add r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+ mov r3,r0
+ rts
+ shlr r0
+LOCAL(denorm_r5):
+ bf LOCAL(inf_nan)
+ tst r1,r4
+ bt LOCAL(denorm_both)
+ dmulu.l r4,r4
+ bra LOCAL(denorm_r5_done)
+ shld r0,r5
+LOCAL(denorm_r4):
+ bf LOCAL(inf_nan)
+ tst r1,r5
+ dmulu.l r5,r5
+ bf LOCAL(denorm_r4_done)
+LOCAL(denorm_both): ! normalize according to r3.
+ extu.w r3,r2
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r3,r2
+ mov #-8,r2
+ bt 0f
+ tst r1,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ shld r2,r3
+ mov.l r7,@-r15
+#ifdef __pic__
+ add r0,r3
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r3),r0
+ add #32,r2
+ sub r0,r2
+ shld r2,r4
+ mov r2,r7
+ dmulu.l r4,r4
+ sts.l pr,@-r15
+ mov #1,r3
+ bsr LOCAL(denorm_r5_done)
+ shld r2,r5
+ mov.l LOCAL(x01000000),r1
+ neg r7,r2
+ lds.l @r15+,pr
+ tst r1,r0
+ mov.l @r15+,r7
+ bt 0f
+ add #1,r2
+ sub r1,r0
+0:
+ rts
+ shld r2,r0
+
+LOCAL(m25):
+ .word -25
+LOCAL(x1380):
+ .word 0x1380
+LOCAL(xf80):
+ .word 0xf80
+ .balign 4
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x40000080):
+ .long 0x40000080
+LOCAL(xfffe0000):
+ .long 0xfffe0000
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+/*
+#include <stdio.h>
+#include <math.h>
+
+double err(double x)
+{
+ return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+ int i = 0;
+ double x, s, v;
+ double lx, hx;
+
+ s = 1./32.;
+ for (x = 1.; x < 4; x += s, i++)
+ {
+ lx = x;
+ hx = x + s - 1. / (1 << 30);
+ v = 0.5 * (err (lx) + err (hx));
+ printf ("%s% 4d%c",
+ (i & 7) == 0 ? "\t.byte\t" : "",
+ (int)(v * 4096 + 0.5) - 128,
+ (i & 7) == 7 ? '\n' : ',');
+ }
+ return 0;
+} */
+
+ .balign 4
+LOCAL(tab):
+ .byte -113, -84, -57, -33, -11, 8, 26, 41
+ .byte 55, 67, 78, 87, 94, 101, 106, 110
+ .byte 113, 115, 115, 115, 114, 112, 109, 106
+ .byte 101, 96, 91, 84, 77, 69, 61, 52
+ .byte 51, 57, 63, 68, 72, 77, 80, 84
+ .byte 87, 89, 91, 93, 95, 96, 97, 97
+ .byte 97, 97, 97, 96, 95, 94, 93, 91
+ .byte 89, 87, 84, 82, 79, 76, 72, 69
+ .byte 65, 61, 57, 53, 49, 44, 39, 34
+ .byte 29, 24, 19, 13, 8, 2, -4, -10
+ .byte -17, -23, -29, -36, -43, -50, -57, -64
+ .byte -71, -78, -85, -93,-101,-108,-116,-124
+ ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 162269)
+++ gcc/config/sh/sh.md (working copy)
@@ -107,6 +107,7 @@ (define_constants [
(DR0_REG 64)
(DR2_REG 66)
(DR4_REG 68)
+ (FR4_REG 68)
(FR23_REG 87)
(TR0_REG 128)
@@ -174,6 +175,16 @@ (define_constants [
(UNSPECV_WINDOW_END 10)
(UNSPECV_CONST_END 11)
(UNSPECV_EH_RETURN 12)
+
+ ;; NaN handling for software floating point:
+ ;; We require one bit specific for a precision to be set in all NaNs,
+ ;; so that we can test them with a not / tst sequence.
+ ;; ??? Ironically, this is the quiet bit for now, because that is the
+ ;; only bit set by __builtin_nan ("").
+ ;; ??? Should really use one bit lower and force it set by using
+ ;; a custom encoding function.
+ (SF_NAN_MASK 0x7fc00000)
+ (DF_NAN_MASK 0x7ff80000)
])
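The not / tst sequence mentioned above sets T exactly when every bit of the mask is set in the operand, which is what makes a single mask per precision sufficient. A C model of that predicate (illustrative only; the helper names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

static float sf_from_bits(uint32_t b)
{
  float f;
  memcpy (&f, &b, sizeof f);
  return f;
}

/* T as computed by "not rX,r0; tst mask,r0": 1 iff every bit of
   SF_NAN_MASK (0x7fc00000) is set in the operand - true for quiet NaNs,
   false for infinities and ordinary numbers.  */
static int sf_nan_flagged(float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (~bits & 0x7fc00000u) == 0;
}
```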
;; -------------------------------------------------------------------------
@@ -615,6 +626,14 @@ (define_insn "cmpeqsi_t"
cmp/eq %1,%0"
[(set_attr "type" "mt_group")])
+(define_insn "fpcmp_i1"
+ [(set (reg:SI T_REG)
+ (match_operator:SI 1 "soft_fp_comparison_operator"
+ [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+ "TARGET_SH1_SOFTFP"
+ "tst %0,%0"
+ [(set_attr "type" "mt_group")])
+
(define_insn "cmpgtsi_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1173,9 @@ (define_insn_and_split "*movsicc_umin"
(define_insn "*movsicc_t_false"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (eq (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -1167,9 +1186,9 @@ (define_insn "*movsicc_t_false"
(define_insn "*movsicc_t_true"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (ne (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -6849,6 +6868,50 @@ (define_insn "stuff_delay_slot"
;; Conditional branch insns
+(define_expand "cmpun_sdf"
+ [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+ [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
(define_expand "cbranchint4_media"
[(set (pc)
(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9394,11 +9457,15 @@ (define_split
(define_expand "cstoresf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:SF 2 "arith_operand" "")
- (match_operand:SF 3 "arith_operand" "")]))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ [(match_operand:SF 2 "nonmemory_operand" "")
+ (match_operand:SF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ if (!arith_operand (operands[3], SFmode))
+ operands[3] = copy_to_mode_reg (SFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9407,18 +9474,22 @@ (define_expand "cstoresf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, SFmode);
+ sh_expand_float_scc (operands);
DONE;
")
(define_expand "cstoredf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:DF 2 "arith_operand" "")
- (match_operand:DF 3 "arith_operand" "")]))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ [(match_operand:DF 2 "nonmemory_operand" "")
+ (match_operand:DF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ if (!arith_operand (operands[3], DFmode))
+ operands[3] = copy_to_mode_reg (DFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9427,7 +9498,7 @@ (define_expand "cstoredf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, DFmode);
+ sh_expand_float_scc (operands);
DONE;
")
@@ -9765,7 +9836,7 @@ (define_expand "addsf3"
[(set (match_operand:SF 0 "arith_reg_operand" "")
(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
(match_operand:SF 2 "arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9773,6 +9844,12 @@ (define_expand "addsf3"
expand_sf_binop (&gen_addsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*addsf3_media"
@@ -9871,6 +9948,22 @@ (define_insn_and_split "binary_sf_op1"
}"
[(set_attr "type" "fparith_media")])
+(define_insn "addsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "addsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9978,7 @@ (define_expand "subsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
(match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9893,6 +9986,12 @@ (define_expand "subsf3"
expand_sf_binop (&gen_subsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*subsf3_media"
@@ -9903,6 +10002,23 @@ (define_insn "*subsf3_media"
"fsub.s %1, %2, %0"
[(set_attr "type" "fparith_media")])
+(define_insn "subsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R5_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "subsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,10 +10031,15 @@ (define_insn "subsf3_i"
(define_expand "mulsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
- (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
- (match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
- "")
+ (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+ (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+ "if (TARGET_SH1_SOFTFP_MODE (SFmode))
+ {
+ expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+ operands);
+ DONE;
+ }")
(define_insn "*mulsf3_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
@@ -9959,6 +10080,22 @@ (define_insn "mulsf3_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "single")])
+(define_insn "mulsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "mac_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10256,149 @@ (define_insn "*fixsfsi"
"ftrc %1,%0"
[(set_attr "type" "fp")])
+(define_insn "cmpnesf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,?r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ if (which_alternative == 0)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+ else if (which_alternative == 1)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+ else
+ output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+ operands);
+ return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+ [(set (reg:SI T_REG)
+ (le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ output_asm_insn (\"cmp/pz\t%0\", operands);
+ if (which_alternative == 2)
+ output_asm_insn (\"mov\t%0,%2\", operands);
+ if (TARGET_SH2)
+ output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+ else
+ output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+ operands);
+ if (which_alternative == 1)
+ output_asm_insn (\"or\t%0,%2\", operands);
+ else
+ output_asm_insn (\"or\t%1,%2\", operands);
+ return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "*
+{
+ output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+ output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+ output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+ return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+ [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+ [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_NE 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+ [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_GT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+ [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
(define_insn "cmpgtsf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10426,22 @@ (define_insn "ieee_ccmpeqsf_t"
"* return output_ieee_ccmpeq (insn, operands);"
[(set_attr "length" "4")])
+(define_insn "*cmpltgtsf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")])
+
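In plain C terms (an illustration, not the emitted code), the two new fcmp sequences compute the LTGT and ORDERED predicates:

```c
/* ORDERED: fcmp/eq of a value with itself fails only for NaN, so two
   self-compares detect whether either operand is a NaN.
   LTGT: true when the operands compare strictly less or strictly greater;
   false for equal and for unordered operands.  */
static int sf_ordered(float a, float b) { return a == a && b == b; }
static int sf_ltgt(float a, float b) { return a < b || a > b; }

/* Build a NaN at run time so the compiler cannot fold the compares.  */
static float sf_qnan(void)
{
  volatile float z = 0.0f;
  return z / z;
}
```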
(define_insn "cmpgtsf_t_i4"
[(set (reg:SI T_REG)
@@ -10178,6 +10474,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
[(set_attr "length" "4")
(set_attr "fp_mode" "single")])
+(define_insn "*cmpltgtsf_t_4"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
(define_insn "cmpeqsf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10529,24 @@ (define_insn "cmpunsf_media"
(define_expand "cbranchsf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:SF 1 "arith_operand" "")
- (match_operand:SF 2 "arith_operand" "")])
+ [(match_operand:SF 1 "nonmemory_operand" "")
+ (match_operand:SF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], SFmode))
+ operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, SFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10426,11 +10748,39 @@ (define_insn "abssf2_i"
[(set_attr "type" "fmove")
(set_attr "fp_mode" "single")])
+(define_expand "abssc2"
+ [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+ (abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "
+{
+ expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+ DONE;
+}")
+
+(define_insn "abssc2_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (abs:SF (reg:SC R4_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI R5_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "adddf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10788,12 @@ (define_expand "adddf3"
expand_df_binop (&gen_adddf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*adddf3_media"
@@ -10458,6 +10814,30 @@ (define_insn "adddf3_i"
[(set_attr "type" "dfp_arith")
(set_attr "fp_mode" "double")])
+(define_expand "adddf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_adddf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "adddf3_i3"
+ [(set (reg:DF R0_REG)
+ (plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "subdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10874,7 @@ (define_expand "muldf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10882,12 @@ (define_expand "muldf3"
expand_df_binop (&gen_muldf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+ operands);
+ DONE;
+ }
}")
(define_insn "*muldf3_media"
@@ -10522,6 +10908,32 @@ (define_insn "muldf3_i"
[(set_attr "type" "dfp_mul")
(set_attr "fp_mode" "double")])
+(define_expand "muldf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_muldf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "muldf3_i3"
+ [(set (reg:DF R0_REG)
+ (mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "divdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11063,73 @@ (define_insn "fix_truncdfsi2_i"
;; (use (match_dup 2))])
;; (set (match_dup 0) (reg:SI FPUL_REG))])
+(define_insn "cmpnedf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (clobber (match_scratch:SI 2 "=&r"))]
+ "TARGET_SH1_SOFTFP && flag_finite_math_only"
+ "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+ [(set_attr "length" "18")])
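The cmpeqdf_i1_finite sequence above exploits a property of IEEE doubles under -ffinite-math-only: with no NaNs in play, two values compare equal exactly when their bit patterns match, except that +0.0 and -0.0 must also compare equal. A minimal C sketch of that idea (the helper name is illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Finite-math double equality using only integer operations.  The
   shift by one discards both sign bits at once, just as the
   "add %2,%2" in the insn sequence does, so opposite-signed zeros
   compare equal.  */
static int df_eq_finite (double a, double b)
{
  uint64_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  if (ua == ub)
    return 1;
  return ((ua | ub) << 1) == 0;   /* both operands are +/-0.0 */
}
```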
+
+(define_insn "cmpundf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+ (match_operand:DF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\\n0:"
+ [(set_attr "length" "10")])
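cmpundf_i1 detects unordered operands without any FPU by testing exponent bits against an integer mask (the "use" operand presumably holds the exponent-field mask). The full integer-level test looks roughly like this in C; note that a test of the exponent field alone would also accept infinities, so a complete NaN test must examine the mantissa as well (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A double is NaN iff its exponent field is all ones and its
   mantissa is nonzero.  */
static int df_isnan_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return ((u >> 52) & 0x7ff) == 0x7ff          /* exponent all ones */
         && (u & ((1ULL << 52) - 1)) != 0;     /* mantissa nonzero */
}

/* UNORDERED: true iff either operand is NaN.  */
static int df_unordered (double a, double b)
{
  return df_isnan_bits (a) || df_isnan_bits (b);
}
```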
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1_SOFTFP"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+ [(set_attr "length" "30")])
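cmpuneqdf_i1 computes UNEQ: true when the operands are unordered (either is NaN) or compare equal. The long branchy sequence folds the NaN bit-tests together with a zero-aware equality test; at C level the same predicate is simply (helper name hypothetical):

```c
#include <assert.h>

/* UNEQ(a, b): unordered-or-equal.  x != x is true only for NaN, so
   ordinary C comparisons express what the insn sequence computes.  */
static int df_uneq (double a, double b)
{
  return a != a || b != b || a == b;
}
```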
+
(define_insn "cmpgtdf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11161,26 @@ (define_insn "*ieee_ccmpeqdf_t"
[(set_attr "length" "4")
(set_attr "fp_mode" "double")])
+(define_insn "*cmpltgtdf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
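These two hardware patterns rely on standard T-bit idioms: LTGT is two fcmp/gt's, and ORDERED is two self-compares, since x == x is false only for NaN. In C terms:

```c
#include <assert.h>

/* ORDERED as *cmpordereddf_t_4 computes it: both operands equal
   themselves, i.e. neither is NaN.  */
static int df_ordered (double a, double b)
{
  return a == a && b == b;
}

/* LTGT as *cmpltgtdf_t computes it: strictly less or strictly
   greater; false for equal and for unordered operands.  */
static int df_ltgt (double a, double b)
{
  return a < b || a > b;
}
```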
+
(define_insn "cmpeqdf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11216,24 @@ (define_insn "cmpundf_media"
(define_expand "cbranchdf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:DF 1 "arith_operand" "")
- (match_operand:DF 2 "arith_operand" "")])
+ [(match_operand:DF 1 "nonmemory_operand" "")
+ (match_operand:DF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], DFmode))
+ operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, DFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10823,7 +11328,7 @@ (define_insn "absdf2_i"
(define_expand "extendsfdf2"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(float_extend:DF (match_operand:SF 1 "fpul_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10832,6 +11337,18 @@ (define_expand "extendsfdf2"
get_fpscr_rtx ()));
DONE;
}
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i2e, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i1, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
}")
(define_insn "*extendsfdf2_media"
@@ -10850,16 +11367,94 @@ (define_insn "extendsfdf2_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere.  So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
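The patterns above only marshal operands for a `__extendsfdf2` library call. For reference, here is a minimal C sketch of what such a soft-float SF->DF extension computes; this illustrates the bit manipulation, it is not the SH assembly routine from config/sh/ieee-754-df.S. The conversion widens the fields, rebiases the exponent from 127 to 1023, renormalizes denormal inputs, and is always exact:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-level SF -> DF extension; exact for all inputs.  */
static double extend_sf_df (float f)
{
  uint32_t s;
  uint64_t d;
  double r;
  memcpy (&s, &f, sizeof s);
  uint64_t sign = (uint64_t)(s >> 31) << 63;
  uint32_t exp  = (s >> 23) & 0xff;
  uint64_t frac = s & 0x7fffff;

  if (exp == 0xff)                       /* Inf or NaN: keep all ones */
    d = sign | (0x7ffULL << 52) | (frac << 29);
  else if (exp == 0 && frac == 0)        /* signed zero */
    d = sign;
  else if (exp == 0)                     /* SF denormal: renormalize */
    {
      int shift = 0;
      while (!(frac & 0x800000))
        {
          frac <<= 1;
          shift++;
        }
      /* Value was frac * 2^-149; normalized it is 1.m * 2^(-126-shift),
         so the DF biased exponent is -126 - shift + 1023.  */
      d = sign | ((uint64_t)(897 - shift) << 52)
          | ((frac & 0x7fffff) << 29);
    }
  else                                   /* normal: rebias 127 -> 1023 */
    d = sign | ((uint64_t)(exp + 896) << 52) | (frac << 29);

  memcpy (&r, &d, sizeof r);
  return r;
}
```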
+
(define_expand "truncdfsf2"
[(set (match_operand:SF 0 "fpul_operand" "")
- (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
- "
-{
+ (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
+ "
+{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
{
emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
- get_fpscr_rtx ()));
+ get_fpscr_rtx ()));
+ DONE;
+ }
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i2e, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i1, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
DONE;
}
}")
@@ -10879,6 +11474,37 @@ (define_insn "truncdfsf2_i4"
"fcnvds %1,%0"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i1"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "truncdfsf2_i2e"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI FPUL_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
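Symmetrically, these patterns hand off to `__truncdfsf2`, which is the harder direction: DF->SF must round (to nearest, ties to even), may overflow to infinity, and may underflow to a denormal or zero. A C-level sketch of that work, under the assumption of round-to-nearest-even semantics; again this is an illustration, not the routine from config/sh/ieee-754-df.S:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-level DF -> SF truncation with round-to-nearest-even.  */
static float trunc_df_sf (double x)
{
  uint64_t u;
  uint32_t r;
  float out;
  memcpy (&u, &x, sizeof u);
  uint32_t sign = (uint32_t)(u >> 63) << 31;
  int exp = (int)((u >> 52) & 0x7ff);
  uint64_t frac = u & 0xfffffffffffffULL;

  if (exp == 0x7ff)                      /* Inf or NaN */
    r = sign | 0x7f800000u
        | (frac ? ((uint32_t)(frac >> 29) | 0x400000u) : 0); /* quiet NaN */
  else
    {
      int e = exp - 1023 + 127;          /* rebias 1023 -> 127 */
      uint64_t m = frac | (exp ? 1ULL << 52 : 0);  /* implicit bit */
      int shift = 29;                    /* 52 -> 23 mantissa bits */
      if (e <= 0)                        /* SF denormal range (or below) */
        {
          shift += 1 - e;
          e = 0;
        }
      if (shift >= 64)
        m = 0;                           /* far below half an ulp of the
                                            smallest denormal */
      else
        {
          uint64_t rest = m & ((1ULL << shift) - 1);
          uint64_t half = 1ULL << (shift - 1);
          m >>= shift;
          if (rest > half || (rest == half && (m & 1)))
            m++;                         /* round to nearest, ties to even */
        }
      if (m >> 24)                       /* rounding carried out of mantissa */
        {
          m >>= 1;
          e++;
        }
      if (e >= 0xff)                     /* overflow -> infinity */
        r = sign | 0x7f800000u;
      else if (e == 0)                   /* denormal or zero; a carry into
                                            bit 23 yields the right normal */
        r = sign | (uint32_t)m;
      else
        r = sign | ((uint32_t)e << 23) | ((uint32_t)m & 0x7fffff);
    }
  memcpy (&out, &r, sizeof out);
  return out;
}
```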
+
;; Bit field extract patterns. These give better code for packed bitfields,
;; because they allow auto-increment addresses to be generated.