This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: SH optimized software floating point routines
- From: Joern Rennecke <joern dot rennecke at embecosm dot com>
- To: "Naveen H. S" <Naveen dot S at kpitcummins dot com>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Prafulla Thakare <Prafulla dot Thakare at kpitcummins dot com>
- Date: Sun, 18 Jul 2010 20:58:38 -0400
- Subject: RE: SH optimized software floating point routines
- References: <371569CBCFB2E745B891DBB88B2DFDDD19CAF5484D@KCINPUNHJCMS01.kpit.com> <20100614.101458.229239801.kkojima@rr.iij4u.or.jp> <371569CBCFB2E745B891DBB88B2DFDDD19DC264F56@KCINPUNHJCMS01.kpit.com> <20100717092859.dkxjsdzg0okk8o4c-nzlynne@webmail.spamcop.net>
I've found two bugs in truncdfsf2;
I've also added back a number of hunks that Naveen had dropped.
Note that most of the patch was prepared in 2006, so that is the
proper most recent copyright date for those files that have not been touched
except to update the copyright notice.
TODO:
- Test & submit companion patches separately.
- Test.
2010-07-18 Joern Rennecke <joern.rennecke@embecosm.com>
* config/sh/IEEE-754/divsf3.S (divsf3):
Fix sign for zero r4 input.
Fix comments for NaN return.
Remove some redundant code.
* config/sh/ieee-754-df.S: Add comments on
RETURN_R0_MAIN / RETURN_R0 / RETURN_FR0.
(RETURN_FR0): Add missing backslash.
[!DYN_SHIFT] (extendsfdf2) <zero_denorm>: Fix mask used in
shift_byte loop.
[!DYN_SHIFT] (extendsfdf2) <x00ff0000>: New constant.
[!DYN_SHIFT] (truncdfsf2) <inf>: Fix returned value.
[DYN_SHIFT] (truncdfsf2) <inf>: Likewise.
[!DYN_SHIFT] (truncdfsf2) <xffe00000>: Remove now unused constant.
[DYN_SHIFT] (truncdfsf2) <xffe00000>: Likewise.
* config/sh/sh.c (sh_expand_float_condop): Changed parameters to
allow separate passing of comparison operands and destination.
Changed callers.
Replace use of from_compare.
Use emit instead of emit_jump_insn.
(sh_soft_fp_cmp): Remove REG_LIBCALL / REG_RETVAL code.
Use set_unique_reg_note.
(expand_sfunc_op): Likewise.
* config/sh/sh.md (cstoresf4): Add support for software floating point.
(cstoredf4, cbranchsf4, cbranchdf4): Likewise.
(cmpnedf_i1): Fix predicate.
(truncdfsf2): Add TARGET_SH2E case.
(mulsf3): Fix condition for emitting mulsf3_i3.
* config/sh/IEEE-754/adddf3.S: Adjust NaN value returned for
+inf + -inf to agree with DF_NAN_MASK mask.
* config/sh/t-sh (gt-sh.h): Remove redundant rule.
(LIB1ASMFUNCS): Add _unordsf2 and _unorddf2.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* targhooks.c (regs.h): #include.
(default_match_adjust): New function.
* targhooks.h (default_match_adjust): Declare.
* reload.c (operands_match_p): Use targetm.match_adjust.
* target.h (struct gcc_target): Add member match_adjust.
* target-def.h (TARGET_MATCH_ADJUST): New macro.
* Makefile.in (targhooks.o): Depend on $(REGS_H).
* config/sh/sh-protos.h (sh_match_adjust): Declare.
* config/sh/sh.c (TARGET_MATCH_ADJUST): Define as sh_match_adjust.
(sh_match_adjust): New function.
2006-09-15 J"orn Rennecke <joern.rennecke@st.com>
* sched-deps.c (sched_analyze_2): When a likely spilled register
is used, put it into a scheduling group with the insn that
sets it and with all the insns in-between.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
config/sh/t-sh: ($(T)ic_invalidate_array_4-100.o): Add -I. .
($(T)ic_invalidate_array_4-200.o): Likewise.
($(T)ic_invalidate_array_4a.o): Likewise.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* config/sh/sh.h (LIBGCC2_DOUBLE_TYPE_SIZE): Define.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* sh.md (*movsicc_t_false, *movsicc_t_true): Add mode.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
Aanchal Khanna <aanchalk@noida.hcltech.com>
Rakesh Kumar <rakesh.kumar@noida.hcltech.com>
* config/sh/sh-protos.h (sh_function_kind): New enumerator
SFUNC_FREQUENT.
(expand_sfunc_unop, expand_sfunc_binop): Declare.
(sh_expand_float_cbranch): Likewise.
* config/sh/lib1funcs.asm (ieee-754-sf.S, ieee-754-df.S): #include.
* config/sh/t-sh (LIB1ASMFUNCS): Add nesf2, _nedf2, _gtsf2t, _gtdf2t,
_gesf2f, _gedf2f, _extendsfdf2, _truncdfsf2, _add_sub_sf3, _mulsf3,
_hypotf, _muldf3, _add_sub_df3, _divsf3, _divdf3, _fixunssfsi,
_fixsfsi, _fixunsdfsi, _fixdfsi, _floatunssisf, _floatsisf,
_floatunssidf and _floatsidf.
(FPBIT, DPBIT, dp-bit.c, fp-bit.c): Removed.
* config/sh/ieee-754-df.S, config/sh/ieee-754-sf.S: New files.
* config/sh/predicates.md (soft_fp_comparison_operand): New predicate.
(soft_fp_comparison_operator): Likewise.
* config/sh/sh.c (sh_soft_fp_cmp, expand_sfunc_op): New functions.
(expand_sfunc_unop, expand_sfunc_binop): Likewise.
(sh_expand_float_cbranch): Likewise.
(sh_expand_float_condop, sh_expand_float_scc): Likewise.
(from_compare): Add support for software floating point.
(function_symbol): Always look up name. Add SFUNC_FREQUENT case.
* config/sh/sh.h (TARGET_SH1_SOFTFP): New macro.
(TARGET_SH1_SOFTFP_MODE): Likewise.
* config/sh/sh-modes.def (CC_FP_NE, CC_FP_GT, CC_FP_UNLT): New modes.
* config/sh/lib1funcs.h (SLC, SLI, SLCMP, DMULU_SAVE): New macros.
(DMULUL, DMULUH, DMULU_RESTORE, SHLL4, SHLR4, SHLL6, SHLR6): Likewise.
(SHLL12, SHLR12, SHLR19, SHLL23, SHLR24, SHLR21, SHLL21): Likewise.
(SHLR11, SHLR22, SHLR23, SHLR20, SHLL20, SHLD_COUNT, SHLRN): Likewise.
(SHLLN, DYN_SHIFT): Likewise.
(SUPPORT_SH3_OSFP, SUPPORT_SH3E_OSFP): Likewise.
(SUPPORT_SH4_NOFPU_OSFP, SUPPORT_SH4_SINGLE_ONLY_OSFP): Likewise.
(TARGET_OSFP): Likewise.
* config/sh/IEEE-754/m3/divsf3.S: New file.
* config/sh/IEEE-754/m3/divdf3.S: Likewise.
* config/sh/IEEE-754/m3/floatunssisf.S: Likewise.
* config/sh/IEEE-754/m3/floatunssidf.S: Likewise.
* config/sh/IEEE-754/m3/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/m3/divdf3-rt.S: Likewise.
* config/sh/IEEE-754/m3/addsf3.S: Likewise.
* config/sh/IEEE-754/m3/adddf3.S: Likewise.
* config/sh/IEEE-754/m3/mulsf3.S: Likewise.
* config/sh/IEEE-754/m3/muldf3.S: Likewise.
* config/sh/IEEE-754/m3/floatsisf.S: Likewise.
* config/sh/IEEE-754/m3/floatsidf.S: Likewise.
* config/sh/IEEE-754/m3/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divdf3.S: Likewise.
* config/sh/IEEE-754/floatunssisf.S: Likewise.
* config/sh/IEEE-754/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/adddf3.S: Likewise.
* config/sh/IEEE-754/floatsisf.S: Likewise.
* config/sh/IEEE-754/muldf3.S: Likewise.
* config/sh/IEEE-754/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divsf3.S: Likewise.
* config/sh/IEEE-754/fixunssfsi.S: Likewise.
* config/sh/IEEE-754/floatunssidf.S: Likewise.
* config/sh/IEEE-754/addsf3.S: Likewise.
* config/sh/IEEE-754/mulsf3.S: Likewise.
* config/sh/IEEE-754/floatsidf.S: Likewise.
* config/sh/IEEE-754/fixsfsi.S: Likewise.
* config/sh/sh.md (SF_NAN_MASK, DF_NAN_MASK, FR4_REG): New constants.
(fpcmp_i1, addsf3_i3, subsf3_i3): New patterns.
(mulsf3_i3, cmpnesf_i1, cmpgtsf_i1, cmpunltsf_i1): Likewise.
(cmpeqsf_i1_finite, cmplesf_i1_finite, cmpunsf_i1): Likewise.
(cmpuneqsf_i1, movcc_fp_ne, movcc_fp_gt, movcc_fp_unlt): Likewise.
(cmpltgtsf_t, cmporderedsf_t, cmpltgtsf_t_4): Likewise.
(cmporderedsf_t_4, abssc2, adddf3_i3_wrap, adddf3_i3): Likewise.
(muldf3_i3_wrap, muldf3_i3, cmpnedf_i1, cmpgtdf_i1): Likewise.
(cmpunltdf_i1, cmpeqdf_i1_finite, cmpundf_i1, cmpuneqdf_i1): Likewise.
(cmpltgtdf_t, cmpordereddf_t_4, extendsfdf2_i1): Likewise.
(extendsfdf2_i2e, extendsfdf2_i2e_r0, truncdfsf2_i2e): Likewise.
(extendsfdf2_i1_r0, truncdfsf2_i1): Likewise.
(cmpun_sdf, cmpuneq_sdf): Likewise.
(addsf3, subsf3, mulsf3): Add support for software floating point.
(adddf3, subdf3, muldf3, extendsfdf2, truncdfsf2): Likewise.
(cmpsf, cmpdf): Don't enable for TARGET_SH2E.
(movnegt): Match only one operand. Changed user.
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 162269)
+++ gcc/doc/tm.texi (working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 162269)
+++ gcc/doc/tm.texi.in (working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@hook TARGET_MATCH_ADJUST
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c (revision 162269)
+++ gcc/targhooks.c (working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
#include "reload.h"
#include "optabs.h"
#include "recog.h"
+#include "regs.h"
bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
return rclass;
}
+/* Given an rtx and its regno, return a regno value that shall be used for
+ purposes of comparison in operands_match_p.
+ Generally, we say that integer registers are subject to big-endian
+ adjustment. This default target hook should generally work if the mode
+ of a register is a sufficient indication if this adjustment is to take
+ place; this will not work when software floating point is done in integer
+ registers. */
+int
+default_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
void
default_target_option_override (void)
{
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h (revision 162269)
+++ gcc/targhooks.h (working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
extern void default_target_option_override (void);
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 162269)
+++ gcc/target.def (working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
secondary_reload_info *sri),
default_secondary_reload)
+/* Take an rtx and its regno, and return the regno for purposes of
+ checking a matching constraint. */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
/* This target hook allows the backend to perform additional
processing while initializing for variable expansion. */
DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c (revision 162269)
+++ gcc/reload.c (working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (x))
- && i < FIRST_PSEUDO_REGISTER)
- i += hard_regno_nregs[i][GET_MODE (x)] - 1;
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (y))
- && j < FIRST_PSEUDO_REGISTER)
- j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+ i = targetm.match_adjust (x, i);
+ j = targetm.match_adjust (y, j);
return i == j;
}
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 162269)
+++ gcc/Makefile.in (working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
$(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
- $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+ $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
bversion.h: s-bversion; @true
s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 162269)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.
#define GCC_SH_PROTOS_H
enum sh_function_kind {
- /* A function with normal C ABI */
+ /* A function with normal C ABI, or an SH1..SH4 sfunc that may be resolved
+ via a PLT. */
FUNCTION_ORDINARY,
+ /* A function that is a bit too large to put in every calling dso, but that
+ is typically used often enough that calling it via the GOT makes sense
+ for speed. */
+ SFUNC_FREQUENT,
/* A special function that guarantees that some otherwise call-clobbered
registers are not clobbered. These can't go through the SH5 resolver,
because it only saves argument passing registers. */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
extern int sh_insn_length_adjustment (rtx);
extern int sh_can_redirect_branch (rtx, rtx);
extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
extern int sh_media_register_for_return (void);
extern void sh_expand_prologue (void);
extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
extern int sh_need_epilogue (void);
extern void sh_set_return_address (rtx, rtx);
extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
extern int sh2a_get_function_vector_number (rtx);
extern int sh2a_is_function_vector_call (rtx);
extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm (revision 162269)
+++ gcc/config/sh/lib1funcs.asm (working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
ENDFUNC(GLOBAL(udiv_qrnnd_16))
#endif /* !__SHMEDIA__ */
#endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh (revision 162269)
+++ gcc/config/sh/t-sh (working copy)
@@ -25,30 +25,16 @@ sh-c.o: $(srcdir)/config/sh/sh-c.c \
LIB1ASMSRC = sh/lib1funcs.asm
LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
_movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
- _div_table _udiv_qrnnd_16 \
+ _div_table _udiv_qrnnd_16 _unordsf2 _unorddf2 \
+ _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2 _truncdfsf2 \
+ _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+ _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+ _floatunssidf _floatsidf \
$(LIB1ASMFUNCS_CACHE)
LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
TARGET_LIBGCC2_CFLAGS = -mieee
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
- echo '#endif' >> dp-bit.c
- cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#define FLOAT' > fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
- echo '#endif' >> fp-bit.c
- cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
@@ -120,7 +106,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
$(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
# These are not suitable for COFF.
# EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +116,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
$(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
$(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
$(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt (revision 162269)
+++ gcc/config/sh/sh.opt (working copy)
@@ -21,7 +21,7 @@
;; Used for various architecture options.
Mask(SH_E)
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
Mask(FPU_SINGLE)
;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S (revision 0)
+++ gcc/config/sh/ieee-754-df.S (revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+/* The SH[123] ABI returns floats in r0, -m4-single returns it in fr0.
+ To abstract from this, a function that returns the single-precision
+ float value in r0 should use as in-line epilogue:
+ RETURN_R0_MAIN
+ <delay-slot insn>
+ RETURN_FR0
+ and may branch to that epilogue with:
+ RETURN_R0
+ <delay-slot insn> */
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+ cmp/eq r5,r7
+ mov r4,r0
+ bf 0f
+ cmp/eq r4,r6
+ bt 0f
+ or r6,r0
+ add r0,r0
+ or r5,r0
+ tst r0,r0
+ 0: */
+ .balign 4
+ .global GLOBAL(nedf2)
+ HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+ cmp/eq DBL0L,DBL1L
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bf LOCAL(ne)
+ cmp/eq DBL0H,DBL1H
+ not DBL0H,r0
+ bt LOCAL(check_nan)
+ mov DBL0H,r0
+ or DBL1H,r0
+ add r0,r0
+ rts
+ or DBL0L,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+LOCAL(ne):
+ rts
+ mov #1,r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ not DBL0H,r0
+ tst r1,r0
+ not r6,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
/* If the raw values compare greater, the result is true, unless
any of them is a nan (but infinity is fine), or both values are
+- zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL0H
+ not DBL1H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL1H,DBL0H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt DBL0H,r1)
+ add r0,r0
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ or DBL0L,r0
+ rts
+ or DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero. */
+LOCAL(cmp_low):
+ cmp/hi DBL1L,DBL0L
+ rts
+ movt r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL0L,DBL1L)
+ not DBL0H,r0
+ tst r1,r0
+ bt LOCAL(nan) /* return zero if DBL0 is NAN. */
+ cmp/hi DBL0H,DBL1H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL0L,DBL1L)
+ rts
+ movt r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else
+ SLI(cmp/gt DBL0H,r1)
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ rts
+ mov #0,r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+ .balign 4
+ .global GLOBAL(gedf2f)
+ HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+ /* If the raw values compare greater or equal, the result is
+ true, unless any of them is a nan, or both are the
+ same infinity. If both are -+zero, the result is true;
+ otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL1H
+ not DBL0H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL0H,DBL1H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,DBL1H)
+ add r0,r0
+ bt LOCAL(nan)
+ or DBL0L,r0
+ rts
+ or DBL1L,r0
+LOCAL(cmp_low):
+ cmp/hi DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+LOCAL(nan):
+ rts
+ movt r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ not DBL1H,r0
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL1L,DBL0L)
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi DBL1H,DBL0H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL1L,DBL0L)
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r3
+ mov r4,DBLRL
+ tst r3,r4
+ bt LOCAL(zero_denorm)
+ mov.l LOCAL(xe0000000),r2
+ rotr DBLRL
+ rotr DBLRL
+ rotr DBLRL
+ and r2,DBLRL
+ mov r4,DBLRH
+ not r4,r2
+ tst r3,r2
+ mov.l LOCAL(x38000000),r2
+ bf 0f
+ add r2,r2 ! infinity / NaN adjustment
+0: shll DBLRH
+ shlr2 DBLRH
+ shlr2 DBLRH
+ add DBLRH,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ bt LOCAL(zero)
+ mov.l LOCAL(x00ff0000),r3
+ mov.w LOCAL(x389),r2
+LOCAL(shift_byte):
+ tst r3,r4
+ shll8 r4
+ SL(bt, LOCAL(shift_byte),
+ add #-8,r2)
+LOCAL(shift_bit):
+ shll r4
+ SL(bf, LOCAL(shift_bit),
+ add #-1,r2)
+ mov #0,DBLRL
+ mov r4,DBLRH
+ mov.l @r15+,r4
+ shlr8 DBLRH
+ shlr2 DBLRH
+ shlr DBLRH
+ rotcr DBLRL
+ cmp/gt r4,DBLRH ! get sign
+ rotcr DBLRH
+ rotcr DBLRL
+ shll16 r2
+ shll8 r2
+ rts
+ add r2,DBLRH
+LOCAL(zero):
+ mov.l @r15+,DBLRH
+ rts
+ mov #0,DBLRL
+LOCAL(x389): .word 0x389
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xe0000000):
+ .long 0xe0000000
+LOCAL(x00ff0000):
+ .long 0x00ff0000
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3 ! exponent adjustment DF -> SF
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2 ! mask for out-of-range exponent bits
+ mov DBL0H,r0
+ mov.l DBL0L,@-r15
+ sub r3,r1
+ tst r2,r1
+ shll8 r0 !
+ shll2 r0 ! Isolate highpart fraction.
+ shll2 r0 !
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ shlr16 DBL0L
+ shlr8 DBL0L
+ shlr2 DBL0L
+ SL1(bt, LOCAL(add_frac),
+ shlr2 DBL0L)
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+ cmp/hs r3,r0
+LOCAL(denorm_noup_sh1):
+ bt LOCAL(inf)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+#ifdef DELAYED_BRANCHES
+ bt/s LOCAL(denorm_noup)
+#else
+ bt LOCAL(denorm_noup_sh1)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ shlr16 r1
+ exts.w r1,r1
+ shll2 r1
+ add r1,r1
+ shlr8 r1
+ exts.w r1,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shlr16 r3
+ shll2 r3
+ add r3,r3
+ shlr8 r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov.l @r15+,DBL0L
+ mov #0,r2
+ neg r1,r1
+LOCAL(denorm_loop):
+ shlr r0
+ rotcl r2
+ dt r1
+ bf LOCAL(denorm_loop)
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r2
+ mov #29,r3
+ mov r4,DBLRL
+ not r4,DBLRH
+ tst r2,r4
+ shld r3,DBLRL
+ bt LOCAL(zero_denorm)
+ mov #-3,r3
+ tst r2,DBLRH
+ mov r4,DBLRH
+ mov.l LOCAL(x38000000),r2
+ bt/s LOCAL(inf_nan)
+ shll DBLRH
+ shld r3,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+ .balign 4
+LOCAL(inf_nan):
+ shld r3,DBLRH
+ add r2,r2
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ extu.w r4,r2
+ bt LOCAL(zero)
+ cmp/eq r4,r2
+ extu.b r4,r1
+ bf/s LOCAL(three_bytes)
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r4,r1
+ mov #22,DBLRH
+ bt LOCAL(one_byte)
+ shlr8 r2
+ mov #14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov.w LOCAL(x0),DBLRL
+ sub r2,DBLRH
+LOCAL(norm_shift):
+ shld DBLRH,r4
+ mov.l @r15+,r2
+ shld r3,DBLRH
+ mov.l LOCAL(xb7ffffff),r3
+ add r4,DBLRH
+ cmp/pz r2
+ mov r2,r4
+ rotcr DBLRH
+ rts
+ sub r3,DBLRH
+LOCAL(three_bytes):
+ mov r4,r2
+ shlr16 r2
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov #6-32,DBLRH
+ sub r2,DBLRH
+ mov r4,DBLRL
+ shld DBLRH,DBLRL
+ bra LOCAL(norm_shift)
+ add #32,DBLRH
+LOCAL(zero):
+ rts /* DBLRL has already been zeroed above. */
+ mov.l @r15+,DBLRH
+LOCAL(x0):
+ .word 0
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xb7ffffff):
+ /* Flip sign back, do exponent adjustment, and remove leading one. */
+ .long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2
+ mov DBL0H,r0
+ sub r3,r1
+ mov.l DBL0L,@-r15
+ tst r2,r1
+ mov #12,r3
+ shld r3,r0 ! Isolate highpart fraction.
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ mov #-28,r2
+ bt/s LOCAL(add_frac)
+ shld r2,DBL0L
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+#if 0 // No point checking overflow -> infinity if we don't raise a signal.
+ cmp/hs r3,r0
+ bt LOCAL(inf)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+ bt/s LOCAL(denorm_noup)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ mov #-21,r2
+ shad r2,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shld r2,r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov r0,r2
+ shld r1,r0
+ mov.l @r15+,DBL0L
+ add #32,r1
+ shld r1,r2
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
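As a cross-check of the guard/sticky handling in truncdfsf2 above, here is a
Python sketch of round-to-nearest-even when narrowing a 53-bit fraction to 24
bits (the function names are mine, not part of the patch, and this models only
the normal-number rounding path, not the denormal / inf / NaN cases):

```python
import struct

def trunc_df_sf(d):
    # Reference behavior: round a Python float (double) to single
    # precision, as truncdfsf2 must for normal numbers.
    return struct.unpack('<f', struct.pack('<f', d))[0]

def round_frac(frac53):
    # Round a 53-bit fraction to 24 bits, round-to-nearest-even:
    # keep the top 24 bits, look at the first dropped (guard) bit,
    # and OR the remaining dropped bits into a sticky flag.
    keep = frac53 >> 29                 # top 24 bits survive
    guard = (frac53 >> 28) & 1          # first dropped bit
    sticky = frac53 & ((1 << 28) - 1)   # all lower dropped bits
    if guard and (sticky or (keep & 1)):
        keep += 1                       # round up; ties go to even
    return keep
```

The assembly compresses the 29 dropped bits into a few guard bits (see the
`x2fffffff` mask) so the same decision fits in 32-bit registers.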
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* L_add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md (revision 162269)
+++ gcc/config/sh/predicates.md (working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
(define_predicate "symbol_ref_operand"
(match_code "symbol_ref"))
+(define_special_predicate "soft_fp_comparison_operand"
+ (match_code "subreg,reg")
+{
+ switch (GET_MODE (op))
+ {
+ default:
+ return 0;
+ case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+ break;
+ }
+ return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+ (match_code "eq, unle, ge")
+{
+ switch (GET_CODE (op))
+ {
+ default:
+ return 0;
+ case EQ: mode = CC_FP_NEmode; break;
+ case UNLE: mode = CC_FP_GTmode; break;
+ case GE: mode = CC_FP_UNLTmode; break;
+ }
+ return register_operand (XEXP (op, 0), mode);
+})
+
;; Same as target_reg_operand, except that label_refs and symbol_refs
;; are accepted before reload.
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c (revision 162269)
+++ gcc/config/sh/sh.c (working copy)
@@ -284,6 +284,7 @@ static int sh_arg_partial_bytes (CUMULAT
tree, bool);
static bool sh_scalar_mode_supported_p (enum machine_mode);
static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
static void sh_encode_section_info (tree, rtx, int);
static int sh2a_function_vector_p (tree);
static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +552,9 @@ static const struct attribute_spec sh_at
/* Machine-specific symbol_ref flags. */
#define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
struct gcc_target targetm = TARGET_INITIALIZER;
/* Implement TARGET_HANDLE_OPTION. */
@@ -2180,6 +2184,72 @@ sh_emit_cheap_store_flag (enum machine_m
return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
}
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+ const char *name = NULL;
+ rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
+ int df = op_mode == DFmode;
+ enum machine_mode mode = VOIDmode; /* Shut up warning. */
+
+ switch (code)
+ {
+ case EQ:
+ if (!flag_finite_math_only)
+ {
+ name = df ? "__nedf2" : "__nesf2";
+ fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+ mode = CC_FP_NEmode;
+ break;
+ } /* Fall through. */
+ case UNEQ:
+ fun = gen_cmpuneq_sdf;
+ break;
+ case UNLE:
+ if (flag_finite_math_only && !df)
+ {
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gtdf2t" : "__gtsf2t";
+ fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+ mode = CC_FP_GTmode;
+ break;
+ case GE:
+ if (flag_finite_math_only && !df)
+ {
+ tmp = op0; op0 = op1; op1 = tmp;
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gedf2f" : "__gesf2f";
+ fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+ mode = CC_FP_UNLTmode;
+ break;
+ case UNORDERED:
+ fun = gen_cmpun_sdf;
+ break;
+ default: gcc_unreachable ();
+ }
+
+ if (!name)
+ return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+ tmp = gen_reg_rtx (mode);
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_STATIC);
+ emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+ emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+ last = emit_insn (fun (tmp, addr));
+ equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+ /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+ the computation. */
+ return gen_rtx_SET (VOIDmode,
+ gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
/* Called from the md file, set up the operands of a compare instruction. */
void
@@ -8662,6 +8732,49 @@ sh_fix_range (const char *const_str)
str = comma + 1;
}
}
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using EQUIV. */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, rtx equiv, rtx *operands)
+{
+ int next_reg = FIRST_PARM_REG, i;
+ rtx addr, last;
+
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_FREQUENT);
+ for (i = 1; i <= nargs; i++)
+ {
+ emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+ }
+ last = emit_insn ((*fun) (operands[0], addr));
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+}
+
+/* Expand an sfunc unary operation taking one MODE argument, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+ expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+ which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+ expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
/* Insert any deferred function attributes from earlier pragmas. */
static void
@@ -11593,11 +11706,10 @@ function_symbol (rtx target, const char
{
rtx sym;
- /* If this is not an ordinary function, the name usually comes from a
- string literal or an sprintf buffer. Make sure we use the same
+ /* The name usually comes from a string literal or an sprintf buffer.
+ Make sure we use the same
string consistently, so that cse will be able to unify address loads. */
- if (kind != FUNCTION_ORDINARY)
- name = IDENTIFIER_POINTER (get_identifier (name));
+ name = IDENTIFIER_POINTER (get_identifier (name));
sym = gen_rtx_SYMBOL_REF (Pmode, name);
SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
if (flag_pic)
@@ -11605,6 +11717,10 @@ function_symbol (rtx target, const char
{
case FUNCTION_ORDINARY:
break;
+ case SFUNC_FREQUENT:
+ if (!optimize || optimize_size)
+ break;
+ /* Fall through. */
case SFUNC_GOT:
{
rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11831,168 @@ sh_expand_t_scc (rtx operands[])
return 1;
}
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+ static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+ sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+ static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+ sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+ negative logic. */
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+ enum machine_mode mode = GET_MODE (operands[1]);
+ enum rtx_code comparison = GET_CODE (operands[0]);
+ int swap_operands = 0;
+ rtx op0, op1;
+ rtx lab = NULL_RTX;
+
+ if (TARGET_SH1_SOFTFP_MODE (mode))
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1; /* Fall through. */
+ case GT:
+ comparison = UNLE;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1;
+ comparison = UNLE;
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case EQ:
+ case UNEQ:
+ case GE:
+ case UNLE:
+ case UNORDERED:
+ break;
+ case LTGT:
+ comparison = UNEQ;
+ user++;
+ break;
+ case ORDERED:
+ comparison = UNORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ else /* SH2E .. SH4 Hardware floating point */
+ {
+ switch (comparison)
+ {
+ case LTGT:
+ if (!flag_finite_math_only)
+ break;
+ /* Fall through. */
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1;
+ comparison = GT; /* Fall through. */
+ case GT:
+ case EQ:
+ case ORDERED:
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case GE:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ user++;
+ break;
+ }
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ break;
+ }
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1; /* Fall through. */
+ case UNLE:
+ comparison = GT;
+ user++;
+ break;
+ case UNEQ:
+ if (flag_finite_math_only)
+ {
+ comparison = EQ;
+ break;
+ }
+ comparison = LTGT;
+ user++;
+ break;
+ case UNORDERED:
+ comparison = ORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ operands[1] = force_reg (mode, operands[1]);
+ operands[2] = force_reg (mode, operands[2]);
+ if (comparison == GE)
+ {
+ lab = gen_label_rtx ();
+ sh_emit_scc_to_t (GT, operands[1+swap_operands],
+ operands[2-swap_operands]);
+ emit_jump_insn (gen_branch_true (lab));
+ comparison = EQ;
+ }
+ }
+ op0 = operands[1+swap_operands];
+ op1 = operands[2-swap_operands];
+ if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+ emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+ else
+ sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (comparison, SImode,
+ op0, op1)),
+ mode);
+ if (lab)
+ emit_label (lab);
+ emit ((*user) (dest));
+}
+
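The TARGET_SH1_SOFTFP branch above reduces every float comparison to one of
EQ / UNLE / GE (plus UNEQ / UNORDERED), with operand swaps and result
negation. This Python sketch (table and helper names are mine, and only the
swap/negate subset is modeled) checks that the reduction preserves IEEE
semantics, including the NaN cases:

```python
import math

# Mirrors the soft-fp switch in sh_expand_float_condop:
# code -> (base comparison, swap operands?, negate result?)
REDUCE = {
    'NE':   ('EQ',   False, True),
    'LT':   ('UNLE', True,  True),
    'GT':   ('UNLE', False, True),
    'UNGT': ('GE',   True,  True),
    'UNLT': ('GE',   False, True),
    'UNGE': ('UNLE', True,  False),
    'LE':   ('GE',   True,  False),
}

BASE = {
    'EQ':   lambda a, b: a == b,
    'GE':   lambda a, b: a >= b,
    # UNLE = unordered-or-less-or-equal (true if either operand is NaN)
    'UNLE': lambda a, b: math.isnan(a) or math.isnan(b) or a <= b,
}

def evaluate(code, a, b):
    base, swap, neg = REDUCE.get(code, (code, False, False))
    if swap:
        a, b = b, a
    r = BASE[base](a, b)
    return (not r) if neg else r
```

This is why only __nesf2 / __gtsf2t / __gesf2f style libcalls are needed:
the remaining codes fall out by swapping and negating.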
/* INSN is an sfunc; return the rtx that describes the address used. */
static rtx
extract_sfunc_addr (rtx insn)
@@ -12266,6 +12544,19 @@ sh_secondary_reload (bool in_p, rtx x, r
return NO_REGS;
}
+int
+sh_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
#include "gt-sh.h"
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h (revision 162269)
+++ gcc/config/sh/sh.h (working copy)
@@ -183,6 +183,11 @@ do { \
#define TARGET_FPU_DOUBLE \
((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+ (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
/* Nonzero if an FPU is available. */
#define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
@@ -329,6 +334,38 @@ do { \
#define SUPPORT_ANY_SH5 \
(SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
+/* Check if we have support for optimized software floating point using
+ dynamic shifts - then some function calls clobber fewer registers. */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
/* Reset all target-selection flags. */
#define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
| MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
#define LIBGCC2_DOUBLE_TYPE_SIZE 64
#endif
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
/* 'char' is signed by default. */
#define DEFAULT_SIGNED_CHAR 1
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def (revision 162269)
+++ gcc/config/sh/sh-modes.def (working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
/* PDI mode is used to represent a function address in a target register. */
PARTIAL_INT_MODE (DI);
+/* For software floating point comparisons. */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
/* Vector modes. */
VECTOR_MODE (INT, QI, 2); /* V2QI */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h (revision 162269)
+++ gcc/config/sh/lib1funcs.h (working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
#endif /* !__LITTLE_ENDIAN__ */
#ifdef __sh1__
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here. */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
#else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
- branch##.s dest; in_slot, in_slot_arg2
+ branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
#endif /* !__sh1__ */
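For SH1, which lacks dmulu.l, the DMULUL/DMULUH macro pair above synthesizes a
32x32 -> 64 bit unsigned multiply from four 16x16 mulu.w products. A Python
sketch of the decomposition (my naming, not part of the patch):

```python
def dmulu(m1, m2):
    # Emulate SH1 DMULUL/DMULUH: split each 32-bit operand into
    # 16-bit halves and combine four partial products.
    MASK16 = 0xffff
    a_lo, a_hi = m1 & MASK16, m1 >> 16
    b_lo, b_hi = m2 & MASK16, m2 >> 16
    lo   = a_lo * b_lo    # mulu.w m1,m2
    mid1 = a_hi * b_lo    # swap.w m1 ; mulu.w r12,m2
    mid2 = a_lo * b_hi    # swap.w m2 ; mulu.w r13,m1
    hi   = a_hi * b_hi    # mulu.w r12,r13
    total = lo + ((mid1 + mid2) << 16) + (hi << 32)
    return (total >> 32) & 0xffffffff, total & 0xffffffff  # (mach, macl)
```

The assembly version interleaves the carry propagation (clrt/addc/movt/xtrct)
to do the same accumulation in 32-bit registers.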
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+ #define SHLL4(REG) \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR4(REG) \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL6(REG) \
+ shll2 REG; \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR6(REG) \
+ shlr2 REG; \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL12(REG) \
+ shll8 REG; \
+ SHLL4 (REG)
+
+ #define SHLR12(REG) \
+ shlr8 REG; \
+ SHLR4 (REG)
+
+ #define SHLR19(REG) \
+ shlr16 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLL23(REG) \
+ shll16 REG; \
+ shlr REG; \
+ shll8 REG
+
+ #define SHLR24(REG) \
+ shlr16 REG; \
+ shlr8 REG
+
+ #define SHLR21(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLL21(REG) \
+ shll16 REG; \
+ SHLL4 (REG); \
+ add REG,REG
+
+ #define SHLR11(REG) \
+ shlr8 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLR22(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ shlr8 REG
+
+ #define SHLR23(REG) \
+ shlr16 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLR20(REG) \
+ shlr16 REG; \
+ SHLR4 (REG)
+
+ #define SHLL20(REG) \
+ shll16 REG; \
+ SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
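The non-DYN_SHIFT macros above compose arbitrary shift counts from the fixed
shll/shlr 1/2/8/16 instructions; some, like SHLR21, overshoot and shift back.
A quick Python sanity check of that sequence (helper name is mine):

```python
def shlr21(reg):
    # Follow the SHLR21 macro step by step on a 32-bit register:
    # shlr16 ; shll2 ; add reg,reg (= shift left 1) ; shlr8
    # Net effect: right shift by 16 - 2 - 1 + 8 = 21.
    reg = (reg >> 16) & 0xffffffff
    reg = (reg << 2) & 0xffffffff
    reg = (reg + reg) & 0xffffffff
    reg = (reg >> 8) & 0xffffffff
    return reg
```

The left shifts cannot overflow here because only 16 bits survive the initial
shlr16, which is why the overshoot trick is safe.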
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
@@ -0,0 +1,360 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB 3
+#define L1SB 2
+#define L2SB 1
+#define L3SB 0
+#else
+#define L0SB 0
+#define L1SB 1
+#define L2SB 2
+#define L3SB 3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one. We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
+FUNC(GLOBAL(divsf3))
+ .global GLOBAL(divsf3)
+ .balign 4
+GLOBAL(divsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov #1,r2
+ mov r4,r6
+ shll8 r6
+ mov r5,r7
+ shll8 r7
+ rotr r2
+ tst r3,r4
+ or r2,r6
+ bt/s LOCAL(denorm_arg0)
+ or r2,r7
+ tst r3,r5
+ bt LOCAL(denorm_arg1)
+ shlr r6
+ mov.l LOCAL(x3f000000),r3 ! bias minus explicit leading 1
+ div0u
+LOCAL(denorm_done):
+ div1 r7,r6
+ mov.l r8,@-r15
+ bt 0f
+ div1 r7,r6
+0: mov.l r9,@-r15
+ div1 r7,r6
+ add r4,r3
+ div1 r7,r6
+ sub r5,r3 ! result sign/exponent minus 1 if no overflow/underflow
+ div1 r7,r6
+ or r3,r2
+ div1 r7,r6
+ mov.w LOCAL(xff00),r9
+ div1 r7,r6
+ mov.l r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+ div1 r7,r6
+ mov.w LOCAL(m23),r2
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ extu.b r6,r1
+ and r9,r6
+ swap.w r1,r1 ! first 8 bits of result fraction in bit 23..16
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L3SB,r15) ! 0xff iff dividend was infinity / nan
+ div1 r7,r6
+ mov r5,r0
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L2SB,r15) ! 0xff iff divisor was infinity / nan
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ mov.w LOCAL(m31),r2
+ div1 r7,r6
+ extu.b r6,r8 ! second 8 bits of result fraction in bit 7..0
+ and r9,r6
+ mov.l LOCAL(xff800000),r9
+ div1 r7,r6
+ xor r5,r0 ! msb := correct result sign
+ div1 r7,r6
+ xor r3,r0 ! xor with sign of result sign/exponent word
+ div1 r7,r6
+ shad r2,r0
+ div1 r7,r6
+ mov.b r0,@(L1SB,r15) ! 0xff iff exponent over/underflows
+ and r9,r3 ! isolate sign / exponent
+ mov.w LOCAL(xff01),r2
+ div1 r7,r6
+ swap.b r8,r0 ! second 8 bits of result fraction in bit 15..8
+ div1 r7,r6
+ or r1,r0 ! first 16 bits of result fraction in bit 23..8
+ div1 r7,r6
+ mov.w LOCAL(m1),r9
+ div1 r7,r6
+ mov.l @r15+,r8 ! load encoding of unusual exponent conditions
+ and r6,r2 ! rest | result lsb
+ mov #0,r1
+ bf 0f ! bit below lsb clear -> no rounding
+ cmp/hi r1,r2
+0: extu.b r6,r1
+ or r1,r0 ! 24 bit result fraction with explicit leading 1
+ addc r3,r0 ! add in exponent / sign
+ cmp/str r9,r8
+ ! (no stall *here* for SH4-100 / SH4-200)
+ bt/s LOCAL(inf_nan_denorm_zero)
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+ adjusted value in r3; r4/r5 are not changed. */
+ .balign 4
+LOCAL(denorm_arg0):
+ mov.w LOCAL(xff00),r1
+ sub r2,r6 ! 0x80000000 : remove implicit 1
+ tst r6,r6
+ sts.l pr,@-r15
+ bt LOCAL(div_zero)
+ bsr LOCAL(clz)
+ mov r6,r0
+ shld r0,r6
+ tst r3,r5
+ mov.l LOCAL(x3f800000),r3 ! bias - 1 + 1
+ mov #23,r1
+ shld r1,r0
+ bt/s LOCAL(denorm_arg1_2)
+ sub r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+LOCAL(denorm_arg1):
+ mov.l LOCAL(x3f000000),r3 ! bias - 1
+LOCAL(denorm_arg1_2):
+ sub r2,r7 ! 0x80000000 : remove implicit 1
+ mov.w LOCAL(xff00),r1
+ tst r7,r7
+ sts.l pr,@-r15
+ bt LOCAL(div_by_zero)
+ bsr LOCAL(clz)
+ mov r7,r0
+ shld r0,r7
+ add #-1,r0
+ mov #23,r1
+ shld r1,r0
+ add r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+ .balign 4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear; the apparent rest will then require adjusting to test if
+! the actual rest is nonzero.
+ mov r0,r2
+ not r8,r0
+ tst #0xff,r0
+ shlr8 r0
+ mov.l @r15+,r8
+ bt/s LOCAL(div_inf_or_nan)
+ tst #0xff,r0
+ mov r4,r0
+ bt LOCAL(div_by_inf_or_nan)
+ add r0,r0
+ mov r5,r1
+ add r1,r1
+ cmp/hi r1,r0
+ mov r6,r0
+ bt LOCAL(overflow)
+ sub r2,r0
+ exts.b r0,r0 ! -1 if rounding took place
+ shlr8 r6 ! isolate div1-mangled rest
+ addc r2,r0 ! generate carry if rounding took place
+ shlr8 r7
+ sub r3,r0 ! pre-rounding fraction
+ bt 0f ! going directly to denorm_sticky would cause mispredicts
+ tst r6,r6 ! rest can only be zero if lost bit was set
+0: add r7,r6 ! (T ? corrupt : reconstruct) actual rest
+ bt 0f
+ cmp/pl r6
+0: mov.w LOCAL(m24),r1
+ addc r0,r0 ! put in sticky bit
+ add #-1,r3
+ mov.l LOCAL(x40000000),r6
+ add r3,r3
+ mov r0,r2
+ shad r1,r3 ! exponent ; s32.0
+ !
+ shld r3,r0
+ add #30,r3
+ cmp/pl r3
+ shld r3,r2
+ bf LOCAL(zero_nan) ! return zero
+ rotl r2
+ cmp/hi r6,r2
+ mov #0,r7
+ addc r7,r0
+ div0s r4,r5
+ rts
+ rotcr r0
+
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+
+LOCAL(overflow):
+ mov.l LOCAL(xff000000),r0
+ div0s r4,r5
+ rts
+ rotcl r0
+
+LOCAL(div_inf_or_nan):
+ mov r4,r0
+ bra LOCAL(nan_if_t)
+ add r0,r0
+
+LOCAL(div_by_inf_or_nan):
+ mov.l LOCAL(xff000000),r1
+ mov #0,r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+ .balign 4
+LOCAL(div_zero):
+ mov r5,r1
+ add r1,r1
+ tst r1,r1 ! 0 / 0 -> nan
+ not r5,r1
+ bt LOCAL(nan)
+ add r3,r3
+ cmp/hi r3,r1 ! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+ mov #0,r0
+LOCAL(nan_if_t):
+ bf 0f
+LOCAL(nan):
+ mov #-1,r0
+0: div0s r4,r5 ! compute sign
+ rts
+ rotcr r0 ! insert sign
+
+LOCAL(div_by_zero):
+ mov.l LOCAL(xff000000),r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r0,r2
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-8,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r1,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __pic__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative loads even though they would fit as
+! immediates in the instruction, in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(m23): .word -23
+LOCAL(m24): .word -24
+LOCAL(m31): .word -31
+LOCAL(xff01): .word 0xff01
+ .balign 4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00): .word 0xff00
+LOCAL(m1): .word -1
+#else
+LOCAL(m1): .word -1
+LOCAL(xff00): .word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
@@ -0,0 +1,603 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x ; x in [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ We use a slightly modified algorithm here that checks if the lower
+ bits in z1 are sufficient to determine the outcome of rounding - in that
+ case a2 is not computed.
+ -z1 is computed in units of 1/128 ulp, with an error in the range
+ -0x3.e/128 .. +0 ulp.
+ Thus, after adding three, the result can be safely rounded for normal
+ numbers if any of the bits 5..2 is set, or if the highest guard bit
+ (bit 6 if y <1, otherwise bit 7) is set.
+ (Because of the way truncation works, we would be fine for an open
+   error interval of (-4/128..+1/128) ulp.)
+ For denormal numbers, the rounding point lies higher, but it would be
+ quite cumbersome to calculate where exactly; it is sufficient if any
+ of the bits 7..3 is set.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+ 82 for the path for rounding tie-breaking for normalized numbers
+ (including one branch mispredict).
+ Some cycles might be saved by more careful register allocation. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r12 ! y2*(a-1) ; u1.31
+ add yn,r12 ! z0 ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r11
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r12,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r11,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ cmp/pz r9 ! In corner cases this shift can lose ..
+ shll8 r9 ! .. the sign, so check it first.
+ mov.l LOCAL(x00200000),r11
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmulu.l r9,yn ! sign for r9 is in T
+ xor DBL0H,DBL1H ! calculate expected sign & bit20
+ mov.w LOCAL(d120),DBL0H ! to test bits 6..4
+ xor DBLRH,DBL1H
+ !
+ sts mach,DBL0L ! -z1 ; s-27.32
+ bt 0f
+ sub yn,DBL0L ! multiply adjust for -a1 negative; r3 dies here
+0:tst r10,DBL1H ! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt 0f
+ add DBL0L,DBL0L ! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add r12,r12 ! z0 ; u1.31 / u0.31
+0:add #6-64,DBL0L
+ and r3,DBLRH ! isolate sign / exponent
+ tst DBL0H,DBL0L
+ bf/s LOCAL(exact) ! make the hot path taken for best branch prediction
+ cmp/pz DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ add #64,DBL0L
+LOCAL(find_adjust): tst r10,DBL1H ! set T if a >= x
+ mov #-2,r10
+ addc r10,r10
+ mov DBL0L,DBLRL ! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad r10,DBLRL ! tentatively rounded z1 ; s-24.32
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L ! DBLRL signed, DBL1L unsigned
+ mov r8,r10
+ shll16 r8 ! r8 := lowpart of -a1 ; s-44.48
+ xtrct r9,r10 ! r10 := highpart of -a1 ; s-44.48
+ !
+ sts macl,r3
+ subc r3,r8
+ sts mach,r3
+ subc r3,r10
+ cmp/pz DBL1L
+ mul.l DBLRL,r2
+ bt 0f
+ sub DBLRL,r10 ! adjust for signed/unsigned multiply
+0: mov.l LOCAL(x7fe00000),DBLRL
+ mov #-26,r2
+ sts macl,r9
+ sub r9,r10 ! r10:r8 := -a2
+ add #-64+16,DBL0L ! the denorm code negates this adj. for exact results
+ shld r2,r10 ! convert sign into adjustment in the range 32..63
+ sub r10,DBL0L
+ cmp/pz DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm_inf) ! denorm, DBLRH has correct sign
+ mov #-7,DBL1H
+ cmp/pz DBL0L ! T is sign extension of z1
+ not DBL0L,DBLRL
+ subc r11,DBLRH ! calculate sign / exponent minus implicit 1 minus T
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shad DBL1H,DBLRL
+ mov.l @r15+,r9
+ mov #-11,DBL1H
+ mov r12,r8 ! z0 contributes to DBLRH and DBLRL
+ shld DBL1H,r12
+ mov #21,DBL1H
+ clrt
+ shld DBL1H,r8
+ addc r8,DBLRL
+ mov.l @r15+,r8
+ addc r12,DBLRH
+ rts
+ mov.l @r15+,r12
+
+! sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+ .balign 4
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r3
+ add r3,r3
+ div0s DBL1H,r3
+ mov #120,DBLRL
+ bt LOCAL(ret_inf_late)
+ add #64,DBL0L
+ tst DBLRL,DBL0L
+ mov #-21,DBLRL
+ bt LOCAL(find_adjust)
+ or r10,r8
+ tst r8,r8 ! check if find_adjust found an exact value.
+ shad DBLRL,r3
+ bf 0f
+ add #-16,DBL0L ! if yes, cancel adjustment
+0: mov #-8,DBLRL ! remove the three lowest (inexact) bits
+ and DBLRL,DBL0L
+ add #-2-11,r3 ! shift count for denorm generation
+ mov DBL0L,DBLRL
+ mov #28,r2
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shll2 DBLRL
+ mov.l @r15+,r9
+ shld r2,DBL0L
+ mov.l @r15+,r8
+ mov #-31,r2
+ cmp/ge r2,r3
+ shll2 DBLRL
+ bt/s 0f
+ add DBL0L,r12 ! fraction in r12:DBLRL ; u1.63
+ negc DBLRL,DBLRL ! T := DBLRL != 0
+ add #31,r3
+ mov r12,DBLRL
+ rotcl DBLRL ! put in sticky bit
+ movt r12
+ cmp/ge r2,r3
+ bt/s LOCAL(return_0_late)
+0: div0s DBL1H,DBLRH ! calculate sign
+ mov r12,DBLRH
+ shld r3,DBLRH
+ mov DBLRL,r2
+ shld r3,DBLRL
+ add #32,r3
+ add DBLRH,DBLRH
+ mov.l LOCAL(x80000000),DBL1H
+ shld r3,r12
+ rotcr DBLRH ! combine sign with highpart
+ add #-1,r3
+ shld r3,r2
+ mov #0,r3
+ rotl r2
+ cmp/hi DBL1H,r2
+ addc r12,DBLRL
+ mov.l @r15+,r12
+ rts
+ addc r3,DBLRH
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov DBLRH,DBL0H
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+LOCAL(return_0_late):
+ div0s DBLRH,DBL1H
+ mov.l @r15+,r12
+ mov #0,DBLRH
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #21,r9
+ xtrct r0,r8
+ add #-16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #-8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode even some words as pc-relative that would fit as immediate
+! in the instruction in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+LOCAL(d120): .word 120
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
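For readers following the algorithm comment at the top of divdf3.S, here is a hedged sketch in plain floating point (not the fixed-point arithmetic the assembly uses) of the reciprocal-iteration division scheme. The 64-entry correction table LOCAL(ytab) is omitted, so the seed y0 is coarser than in the real code, and the careful rounding/sticky-bit handling is left out entirely:

```python
def divide(a, x):
    """Approximate a / x for x in [1, 2) via reciprocal iteration.

    Mirrors the scheme in the divdf3.S header comment:
      y0 seed, two Newton steps, then a quotient plus remainder correction.
    """
    assert 1.0 <= x < 2.0
    y0 = 1.5 - x / 2.0               # coarse seed for 1/x (no table correction)
    y1 = y0 - (y0 * x - 1.0) * y0    # first Newton step: error ~ d^2
    y2 = y1 - (y1 * x - 1.0) * y1    # second Newton step: error ~ d^4
    z0 = y2 * a                      # first quotient estimate
    a1 = a - z0 * x                  # remainder
    z1 = y2 * a1                     # quotient correction
    return z0 + z1
```

Each Newton step roughly squares the relative error of the reciprocal estimate, which is why two steps plus one remainder correction suffice before the final rounding decision in the assembly.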
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
@@ -0,0 +1,89 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+ .global GLOBAL(floatunsisf)
+ .balign 4
+GLOBAL(floatunsisf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r4,r1
+ mov #24,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ mov r4,r0
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ tst r4,r4
+ bt LOCAL(ret0)
+ !
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r4
+ rotl r4
+ add #-31,r2
+ cmp/hi r1,r4
+ mov #0,r3
+ addc r3,r0
+LOCAL(noround):
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ nop
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
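The LOCAL(clz) helpers and the open-coded sequence in floatunsisf all narrow the operand to its top non-zero byte and finish with a 256-entry table lookup. A sketch of that technique, assuming clz_tab follows libgcc's __clz_tab layout, where entry i is the bit length of i (0 for i == 0):

```python
# clz_tab[i] = number of significant bits in i, matching libgcc's __clz_tab.
clz_tab = [0] + [i.bit_length() for i in range(1, 256)]

def clz32(v):
    """Count leading zeros of a non-zero 32-bit value, table-driven."""
    assert 0 < v < 1 << 32
    base = 0
    if v >= 1 << 16:      # top half non-zero: skip 16 bits
        v >>= 16
        base += 16
    if v >= 1 << 8:       # top byte of the remaining half non-zero: skip 8
        v >>= 8
        base += 8
    return 32 - (base + clz_tab[v])
```

The assembly folds the `32 -` and `base` adjustments into the shift-count constants (#21, #24, #-16, #-8), so no separate subtraction is visible there.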
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
@@ -0,0 +1,91 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+ .global GLOBAL(floatunsidf)
+ .balign 4
+GLOBAL(floatunsidf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(0xff00),r3 ! 0xff00 mask for the byte-narrowing step
+ cmp/eq r4,r1
+ mov #21,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r5
+ mov r4,DBLRL
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ tst r4,r4
+ mov r4,DBLRH
+ bt LOCAL(ret0)
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov r4,DBLRL
+ rts
+ mov r4,DBLRH
+
+LOCAL(0xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
@@ -0,0 +1,77 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunsdfsi)
+ FUNC(GLOBAL(fixunsdfsi))
+ .balign 4
+GLOBAL(fixunsdfsi):
+ mov.w LOCAL(x413),r1 ! bias + 20
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(ret0)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #11,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ bf 0f
+LOCAL(ret0): mov #0,r0 ! results in 0 return
+0: rts
+ shld DBL0H,r0
+
+LOCAL(retmax):
+ rts
+ mov #-1,r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
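A sketch of the truncating conversion the routine performs. The saturation choices here (0 for negatives and tiny values, 0xffffffff for values at or above 2**32 and for infinity) match the assembly's ret0/retmax paths, but this sketch does not reproduce the sign-dependent NaN quirk described in the comment above; as that comment notes, the result for NaNs is undefined anyway:

```python
import struct

def fixunsdfsi(d):
    """Truncate a double to uint32, saturating out-of-range inputs."""
    bits = struct.unpack('>Q', struct.pack('>d', d))[0]
    if bits >> 63:                          # sign bit set: negative input
        return 0
    exp = ((bits >> 52) & 0x7ff) - 1023     # unbiased exponent
    if exp < 0:
        return 0                            # |d| < 1 truncates to 0
    if exp > 31:
        return 0xffffffff                   # saturate (also covers inf)
    frac = (bits & ((1 << 52) - 1)) | (1 << 52)  # restore implicit 1
    return frac >> (52 - exp)
```

The assembly performs the same steps with shifts: it extracts the biased exponent with shll/shld, restores the implicit 1 via the addc on the masked fraction, and turns the exponent into a single shld count.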
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
@@ -0,0 +1,514 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+ get better average performance with a slightly altered algorithm.
+ Still, if you want a version for hard real time, this version here might
+ be a good starting point, since it has effectively no conditional
+ branches in the path that deals with normal numbers
+ (branches with zero offset are effectively conditional execution),
+ and thus it has a uniform execution time in this path. */
+
+/* y = 1/x ; x (- [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 70 cycles through main path for sh4-300 . Some cycles might be
+ saved by more careful register allocation.
+ 122 cycles for sh4-200. If execution time for sh4-200 is of concern,
+ a specially scheduled version makes sense. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done): ! This label must stay aligned.
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r11 ! y2*(a-1) ; u1.31
+ add yn,r11 ! z0 ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r12
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r11,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r12,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ mov.l LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8 r9
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmuls.l r9,yn ! r3 dead
+ mov DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor DBL0H,r3 ! calculate expected sign & bit20
+ div0s r3,DBLRH
+ xor DBLRH,r3
+ bt LOCAL(ret_denorm_inf)
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm)
+ sub r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst r10,r3 ! set T if a >= x
+ sts mach,r12! -z1 ; s-27.32
+ bt 0f
+ add r11,r11 ! z0 ; u1.31 / u0.31
+0: mov #6,r3
+ negc r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ shad r10,r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl r12 ! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add #20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and DBL0L,DBLRH ! isolate sign / exponent
+ shld r10,r9
+ mov r8,r3
+ shld r10,r8
+ sts macl,DBL0L
+ sts mach,DBLRL
+ add #-32,r10
+ shld r10,r3
+ mul.l r12,r2
+ bf 0f ! adjustment for signed/unsigned multiply
+ sub DBL1L,DBLRL ! DBL1L dead
+0: shar r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ sts macl,DBL1L
+ or r3,r9 ! r9:r8 := -a1 ; s-41.64/s-42.64
+ !
+ cmp/hi r8,DBL0L
+ add DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc DBL1L,r9
+ not r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll r9 ! T := a2 > 0
+ mov r11,r2
+ mov #21,r7
+ shld r7,r11
+ addc r11,DBLRL
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov #-11,r7
+ mov.l @r15+,r9
+ shld r7,r2
+ mov.l @r15+,r8
+ addc r2,DBLRH
+ rts
+ mov.l @r15+,r12
+
+LOCAL(ret_denorm):
+ tst r10,DBLRH
+ bra LOCAL(denorm_have_count)
+ movt DBLRH ! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r12
+ add r12,r12
+ cmp/pz r12
+ mov #-21,DBLRL
+ bt LOCAL(ret_inf_late)
+ shld DBLRL,DBLRH
+LOCAL(denorm_have_count):
+ add #-2,DBLRH
+/* FIXME */
+ bra LOCAL(return_0)
+ mov.l @r15+,r11
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ !
+ mov.l @r15+,r10
+ !
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-11,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative constants even though they would
+! fit as immediates in the instruction, in order to avoid pipeline
+! stalls on SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
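! The divdf3 code above seeds a reciprocal estimate from the LOCAL(ytab)
! byte table and refines it with fixed-point Newton-Raphson steps (the
! y0, y1, y2, z0, z1 quantities in the comments). A floating-point sketch
! of the same seed-then-refine idea, using a classic linear seed for
! illustration rather than the table the code actually uses:

```python
def recip(x, iters=4):
    """Approximate 1/x for x in [0.5, 1) by Newton-Raphson refinement."""
    # Classic minimax linear seed, accurate to about 1/17 relative error.
    y = 48.0 / 17.0 - (32.0 / 17.0) * x
    for _ in range(iters):
        # Each step roughly squares the relative error (doubles correct bits).
        y = y * (2.0 - x * y)
    return y
```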
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
@@ -0,0 +1,285 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+ .balign 4
+ .global GLOBAL(subsf3)
+ FUNC(GLOBAL(subsf3))
+ .global GLOBAL(addsf3)
+ FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+ cmp/pz r5
+ add r5,r5
+ rotcr r5
+ .balign 4
+GLOBAL(addsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov r4,r6
+ add r6,r6
+ mov r5,r7
+ add r7,r7
+ mov r4,r0
+ or r3,r0
+ cmp/hi r6,r7
+ mov r5,r1
+ bf/s LOCAL(r4_hs)
+ or r3,r1
+ cmp/eq r5,r1
+ bt LOCAL(ret_r5) /* sole Inf or NaN, return unchanged. */
+ shll8 r0 ! r4 fraction
+ shll8 r1 ! r5 fraction
+ mov r6,r3
+ mov #-24,r2
+ mov r7,r6
+ shld r2,r6 ! r5 exp
+ mov r0,r7
+ shld r2,r3 ! r4 exp
+ tst r6,r6
+ sub r6,r3 ! exp difference (negative or 0)
+ bt LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+ shld r3,r0 ! Get 31 upper bits, including 8 guard bits
+ mov.l LOCAL(xff000000),r2
+ add #31,r3
+ mov.l r5,@-r15 ! push result sign.
+ cmp/pl r3 ! r0 has no more than one bit set -> return arg 1
+ shld r3,r7 ! copy of lowest guard bit in r0 and lower guard bits
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r7 /* Is LSB in r0 clear, but any lower guard bit set? */
+ subc r0,r1
+ mov.l LOCAL(c__clz_tab),r7
+ tst r2,r1
+ mov #-24,r3
+ bf/s LOCAL(norm_r0)
+ mov r1,r0
+ extu.w r1,r1
+ bra LOCAL(norm_check2)
+ cmp/eq r0,r1
+LOCAL(ret_r5):
+ rts
+ mov r5,r0
+LOCAL(ret_stack):
+ rts
+ mov.l @r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+ consistent with normalized numbers. This also removes the spurious
+ leading one that was inserted before. */
+LOCAL(denorm_r4):
+ tst r3,r3
+ bf/s LOCAL(denorm_r4_done)
+ add r0,r0
+ bra LOCAL(denorm_r4_done)
+ add r1,r1
+LOCAL(denorm_r5):
+ tst r6,r6
+ add r1,r1
+ bf LOCAL(denorm_r5_done)
+ clrt
+ bra LOCAL(denorm_r5_done)
+ add r0,r0
+
+/* If the exponents differ by two or more, normalization is minimal, and
+ few guard bits are needed for an exact final result, so sticky guard
+ bit compression before the subtraction (or addition) works fine.
+ If the exponents differ by one, only one extra guard bit is generated,
+ and effectively no guard bit compression takes place. */
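! The sticky guard bit compression described above can be modeled simply:
! when the smaller operand is shifted right for alignment, all shifted-out
! bits are OR-ed into a single sticky bit, which preserves enough
! information to round correctly. A sketch, not the register-level
! encoding the code uses:

```python
def align_sticky(frac, shift):
    """Right-shift frac by `shift` bits, folding every lost bit into bit 0."""
    if shift <= 0:
        return frac << -shift
    kept = frac >> shift
    lost = frac & ((1 << shift) - 1)    # bits shifted out of the word
    return kept | (1 if lost else 0)    # compress them into one sticky bit
```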
+
+ .balign 4
+LOCAL(r4_hs):
+ cmp/eq r4,r0
+ mov #-24,r3
+ bt LOCAL(inf_nan_arg0)
+ shld r3,r7
+ shll8 r0
+ tst r7,r7
+ shll8 r1
+ mov.l LOCAL(xff000000),r2
+ bt/s LOCAL(denorm_r5)
+ shld r3,r6
+LOCAL(denorm_r5_done):
+ mov r1,r3
+ subc r6,r7
+ bf LOCAL(same_exp)
+ shld r7,r1 /* Get 31 upper bits. */
+ add #31,r7
+ mov.l r4,@-r15 ! push result sign.
+ cmp/pl r7
+ shld r7,r3
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r3 /* Is LSB in r1 clear, but any lower guard bit set? */
+ subc r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+ tst r2,r0
+ mov #-24,r3
+ bf LOCAL(norm_r0)
+ extu.w r0,r1
+ cmp/eq r0,r1
+LOCAL(norm_check2):
+ mov #-8,r3
+ bt LOCAL(norm_r0)
+ mov #-16,r3
+LOCAL(norm_r0):
+ mov r0,r1
+ shld r3,r0
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r7
+ add #25,r3
+ add #-9+1,r6
+ mov r1,r0
+ sub r7,r3
+ mov.l LOCAL(xbfffffff),r7
+ sub r3,r6 /* generate exp-1 */
+ mov.w LOCAL(d24),r2
+ cmp/pz r6 /* check exp > 0 */
+ shld r3,r0 /* Leading 1 becomes +1 exp adjustment. */
+ bf LOCAL(zero_denorm)
+LOCAL(denorm_done):
+ add #30,r3
+ shld r3,r1
+ mov.w LOCAL(m1),r3
+ tst r7,r1 ! clear T if rounding up
+ shld r2,r6
+ subc r3,r0 ! round - overflow will boost exp adjustment to 2.
+ mov.l @r15+,r2
+ add r6,r0 ! overflow will generate inf
+ cmp/ge r2,r3 ! get sign into T
+ rts
+ rotcr r0
+LOCAL(ret_r4):
+ rts
+ mov r4,r0
+
+/* At worst, we are shifting the number back into the place where an
+   incoming denormal was.  Thus, the shifts won't get out of range.  They
+   still might generate a zero fraction, but that's OK; that makes it 0. */
+LOCAL(zero_denorm):
+ add r6,r3
+ mov r1,r0
+ mov #0,r6 /* leading one will become free (except for rounding) */
+ bra LOCAL(denorm_done)
+ shld r3,r0
+
+/* Handle abs(r4) >= abs(r5) with same exponents specially so we don't
+   need to check for a zero fraction in the main path. */
+LOCAL(same_exp):
+ div0s r4,r5
+ mov.l r4,@-r15
+ bf LOCAL(add)
+ cmp/eq r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+ bf/s LOCAL(norm_check)
+ sub r1,r0
+ rts ! zero difference -> return +zero
+ mov.l @r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+ addc r1,r0
+ mov.w LOCAL(x2ff),r7
+ shll8 r6
+ bf/s LOCAL(no_carry)
+ shll16 r6
+ tst r7,r0
+ shlr8 r0
+ mov.l @r15+,r3 ! discard saved sign
+ subc r2,r0
+ sett
+ addc r6,r0
+ cmp/hs r2,r0
+ bt/s LOCAL(inf)
+ div0s r7,r4 /* Copy sign. */
+ rts
+ rotcr r0
+LOCAL(inf):
+ mov r6,r0
+ rts
+ rotcr r0
+LOCAL(no_carry):
+ mov.w LOCAL(m1),r3
+ tst r6,r6
+ bt LOCAL(denorm_add)
+ add r0,r0
+ tst r7,r0 ! check if lower guard bit set or round to even
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ subc r3,r0 ! round ; overflow -> exp++
+ cmp/ge r4,r3 /* Copy sign. */
+ add r6,r0 ! overflow -> inf
+ rts
+ rotcr r0
+
+LOCAL(denorm_add):
+ cmp/ge r4,r3 /* Copy sign. */
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ rts
+ rotcr r0
+
+LOCAL(inf_nan_arg0):
+ cmp/eq r5,r1
+ bf LOCAL(ret_r4)
+ div0s r4,r5 /* Both are inf or NaN, check signs. */
+ bt LOCAL(ret_nan) /* inf - inf, or NaN. */
+ mov r4,r0 ! same sign; return NaN if either is NaN.
+ rts
+ or r5,r0
+LOCAL(ret_nan):
+ rts
+ mov #-1,r0
+
+LOCAL(d24):
+ .word 24
+LOCAL(x2ff):
+ .word 0x2ff
+LOCAL(m1):
+ .word -1
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(xbfffffff):
+ .long 0xbfffffff
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(xfe000000):
+ .long 0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+ ENDFUNC(GLOBAL(addsf3))
+ ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
@@ -0,0 +1,582 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are typically added in 37 cycles, worst case
+! 43 cycles, unless there is an overflow, in which case the addition can
+! take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+ differ by more than one, the output does not need to be adjusted
+ by more than one bit position. Hence, it makes sense to ensure that
+ the shifts by 0 & 1 are handled quickly to reduce average and worst
+ case times. */
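! The property stated above can be checked exhaustively with a small
! integer model: with normalized significands and an exponent difference
! of at least two, the difference never needs more than a one-bit
! normalization shift. The 8-bit significand width below is purely for
! illustration:

```python
def norm_shift_after_sub(frac_a, frac_b, exp_diff):
    """Left shift needed to renormalize frac_a * 2**exp_diff - frac_b."""
    a = frac_a << exp_diff          # align to the larger exponent
    d = a - frac_b
    top = a.bit_length() - 1        # bit position of the normalized leading 1
    shift = 0
    while d and d.bit_length() - 1 < top:
        d <<= 1
        shift += 1
    return shift

# Exhaustive check over all normalized 8-bit significands:
assert all(norm_shift_after_sub(a, b, 2) <= 1
           for a in range(0x80, 0x100) for b in range(0x80, 0x100))
```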
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+ .global GLOBAL(adddf3)
+ .global GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+ bt LOCAL(inf_nan_arg0)
+ tst r0,r2
+ bt/s LOCAL(denorm_both)
+ shlr r1
+ mov.l LOCAL(x00100000),r3
+ bra LOCAL(denorm_arg1_done)
+ sub r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
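! The denormal fast path above exploits a property of the IEEE-754
! encoding: for two same-sign denormals, adding the raw bit patterns is
! exactly the correct addition, even when the sum carries into the
! smallest normal exponent. A quick demonstration in Python:

```python
import struct

def bits(x):
    """Raw 64-bit pattern of a double."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b):
    """Double with the given 64-bit pattern."""
    return struct.unpack('<d', struct.pack('<Q', b))[0]

# Largest positive double denormal plus the smallest: the bit-pattern sum
# equals the IEEE sum, carrying cleanly into the smallest normal number.
a = from_bits(0x000fffffffffffff)
b = from_bits(0x0000000000000001)
assert bits(a + b) == bits(a) + bits(b) == 0x0010000000000000
```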
+LOCAL(denorm_both):
+ div0s DBL0H,DBL1H
+ mov.l LOCAL(x800fffff),r9
+ bt/s LOCAL(denorm_sub)
+ and r1,DBL1H
+ and r9,DBL0H
+ mov.l @r15+,r9
+ mov DBL0L,DBLRL
+ mov DBL0H,DBLRH
+ addc DBL1L,DBLRL
+ mov.l @r15+,r8
+ rts
+ addc DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H
+ bra LOCAL(sub_same_exp)
+ addc r1,r2 ! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+ mov DBL0L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+ mov.l LOCAL(x800fffff),DBLRH
+ mov DBL0L,DBLRL
+ mov r2,r3
+LOCAL(ret_arg):
+ mov.l @r15+,r9
+ and r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ or r3,DBLRH
+
+ .balign 4
+GLOBAL(subdf3):
+ cmp/pz DBL1H
+ add DBL1H,DBL1H
+ rotcr DBL1H
+ nop
+
+GLOBAL(adddf3):
+ mov.l LOCAL(x7ff00000),r0
+ mov DBL0H,r2
+ mov.l LOCAL(x001fffff),r1
+ mov DBL1H,r3
+ mov.l r8,@-r15
+ and r0,r2
+ mov.l r9,@-r15
+ and r0,r3
+ cmp/hi r2,r3
+ or r0,DBL0H
+ or r0,DBL1H
+ bt LOCAL(arg1_gt)
+ tst r0,r3
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg1)
+ cmp/hs r0,r2
+ bt LOCAL(inf_nan_arg0)
+ sub r2,r3
+LOCAL(denorm_arg1_done): ! r2 is tentative result exponent
+ shad r9,r3
+ mov.w LOCAL(m32),r9
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H ! arg0 fraction
+ mov DBL1H,r0 ! the 'other' sign
+ and r1,DBL1H ! arg1 fraction
+ cmp/ge r9,r3
+ mov DBL1H,r1
+ bf/s LOCAL(large_shift_arg1)
+ shld r3,DBL1H
+LOCAL(small_shift_arg1):
+ mov DBL1L,r9
+ shld r3,DBL1L
+ tst r3,r3
+ add #32,r3
+ bt/s LOCAL(same_exp)
+ div0s r8,r0 ! compare signs
+ shld r3,r1
+
+ or r1,DBL1L
+ bf/s LOCAL(add)
+ shld r3,r9
+ clrt
+ negc r9,r9
+ mov.l LOCAL(x001f0000),r3
+LOCAL(sub_high):
+ mov DBL0L,DBLRL
+ subc DBL1L,DBLRL
+ mov DBL0H,DBLRH
+ bra LOCAL(subtract_done)
+ subc DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+ mov.w LOCAL(d0),r9
+ add #64,r3
+ cmp/pl r3
+ shld r3,r1
+ bf LOCAL(ret_arg0)
+ cmp/hi r9,DBL1L
+ mov DBL1H,DBL1L
+ mov r9,DBL1H
+ addc r1,r9
+
+ div0s r8,r0 ! compare signs
+
+ bf LOCAL(add)
+ clrt
+ mov.l LOCAL(x001f0000),r3
+ bra LOCAL(sub_high)
+ negc r9,r9
+
+LOCAL(add_clr_r9):
+ mov #0,r9
+LOCAL(add):
+ mov.l LOCAL(x00200000),r3
+ addc DBL1L,DBL0L
+ addc DBL1H,DBL0H
+ mov.l LOCAL(x80000000),r1
+ tst r3,DBL0H
+ mov.l LOCAL(x7fffffff),r3
+ mov DBL0L,r0
+ bt/s LOCAL(no_carry)
+ and r1,r8
+ tst r9,r9
+ bf LOCAL(add_one)
+ tst #2,r0
+LOCAL(add_one):
+ subc r9,r9
+ sett
+ mov r0,DBLRL
+ addc r9,DBLRL
+ mov DBL0H,DBLRH
+ addc r9,DBLRH
+ shlr DBLRH
+ mov.l LOCAL(x7ff00000),r3
+ add r2,DBLRH
+ mov.l @r15+,r9
+ rotcr DBLRL
+ cmp/hi r3,DBLRH
+LOCAL(add_done):
+ bt LOCAL(inf)
+LOCAL(or_sign):
+ or r8,DBLRH
+ rts
+ mov.l @r15+,r8
+
+LOCAL(inf):
+ bra LOCAL(or_sign)
+ mov r3,DBLRH
+
+LOCAL(pos_difference_0):
+ tst r3,DBL0H
+ mov DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ mov DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(same_exp):
+ bf LOCAL(add_clr_r9)
+ clrt
+LOCAL(sub_same_exp):
+ subc DBL1L,DBL0L
+ mov.l LOCAL(x001f0000),r3
+ subc DBL1H,DBL0H
+ mov.w LOCAL(d0),r9
+ bf LOCAL(pos_difference_0)
+ clrt
+ negc DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ negc DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ tst r3,DBLRH
+ not r8,r8
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(large_shift_arg0):
+ add #64,r2
+
+ mov #0,r9
+ cmp/pl r2
+ shld r2,r1
+ bf LOCAL(ret_arg1_exp_r3)
+ cmp/hi r9,DBL0L
+ mov DBL0H,DBL0L
+ mov r9,DBL0H
+ addc r1,r9
+ div0s r8,r0 ! compare signs
+ mov r3,r2 ! tentative result exponent
+ bf LOCAL(add)
+ clrt
+ negc r9,r9
+ bra LOCAL(subtract_arg0_arg1_done)
+ mov DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+ tst r0,r2
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg0)
+ cmp/hs r0,r3
+ bt LOCAL(inf_nan_arg1)
+ sub r3,r2
+LOCAL(denorm_arg0_done):
+ shad r9,r2
+ mov.w LOCAL(m32),r9
+ mov DBL1H,r8 ! tentative result sign
+ and r1,DBL1H
+ mov DBL0H,r0 ! the 'other' sign
+ and r1,DBL0H
+ cmp/ge r9,r2
+ mov DBL0H,r1
+ shld r2,DBL0H
+ bf LOCAL(large_shift_arg0)
+ mov DBL0L,r9
+ shld r2,DBL0L
+ add #32,r2
+ mov.l r3,@-r15
+ shld r2,r1
+ mov r2,r3
+ div0s r8,r0 ! compare signs
+ mov.l @r15+,r2 ! tentative result exponent
+ shld r3,r9
+ bf/s LOCAL(add)
+ or r1,DBL0L
+ clrt
+ negc r9,r9
+ mov DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+ subc DBL0L,DBLRL
+ mov DBL1H,DBLRH
+ mov.l LOCAL(x001f0000),r3
+ subc DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive. */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient. This not only
+ speeds up this case, but also alleviates the need for considering
+ lower bits from r9 or rounding in the other code.
+ Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+ can assume that DBLRH fits into 16 bits. */
+ tst r3,DBLRH
+ mov.l LOCAL(x80000000),r3
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and r3,r8
+ mov.l LOCAL(x7fffffff),r3
+LOCAL(norm_loop): ! Well, this used to be a loop...
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll r9
+ rotcl DBLRL
+
+ rotcl DBLRH
+
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll DBLRL
+ rotcl DBLRH
+ mov.l @r15+,r9
+ cmp/gt r2,DBL0H
+ sub DBL0H,r2
+LOCAL(norm_loop_1):
+ bt LOCAL(denorm0_n)
+ tst DBL0H,DBLRH
+ bf LOCAL(norm_pack)
+ shll DBLRL
+ rotcl DBLRH ! clears T
+ bra LOCAL(norm_loop_1)
+ subc DBL0H,r2
+
+LOCAL(no_carry):
+ shlr r0
+ mov.l LOCAL(x000fffff),DBLRH
+ addc r3,r9
+ mov.w LOCAL(d0),DBL1H
+ mov DBL0L,DBLRL
+ and DBL0H,DBLRH ! mask out implicit 1
+ mov.l LOCAL(x7ff00000),r3
+ addc DBL1H,DBLRL
+ addc r2,DBLRH
+ mov.l @r15+,r9
+ add DBL1H,DBLRH ! fraction overflow -> exp increase
+ bra LOCAL(add_done)
+ cmp/hi r3,DBLRH
+
+LOCAL(denorm_arg0):
+ bt LOCAL(inf_nan_arg1)
+ mov.l LOCAL(x00100000),r2
+ shlr r1
+ bra LOCAL(denorm_arg0_done)
+ sub r3,r2
+
+LOCAL(inf_nan_arg1):
+ mov DBL1L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+ mov.l LOCAL(x800fffff),DBLRH
+ bra LOCAL(ret_arg)
+ mov DBL1L,DBLRL
+
+#ifdef __pic__
+ .balign 8
+#endif
+LOCAL(m32):
+ .word -32
+LOCAL(d0):
+ .word 0
+#ifndef __pic__
+ .balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 has at
+! most one bit set.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of significant bits in DBLRH.
+LOCAL(long_norm):
+ tst DBLRH,DBLRH
+ mov.w LOCAL(xff),DBL0L
+ mov #21,r3
+ bf LOCAL(long_norm_highset)
+ mov.l LOCAL(x02100000),DBL1L ! shift 32, implicit 1
+ tst DBLRL,DBLRL
+ extu.w DBLRL,DBL0H
+ bt LOCAL(zero_or_ulp)
+ mov DBLRL,DBLRH
+ cmp/hi DBL0H,DBLRL
+ bf 0f
+ mov.l LOCAL(x01100000),DBL1L ! shift 16, implicit 1
+ clrt
+ shlr16 DBLRH
+ xtrct DBLRL,r9
+ mov DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0: mov r9,DBLRL ! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+ subc DBL1L,r2
+ bt LOCAL(denorm1_b)
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+ mov r0,r9
+ mova LOCAL(c__clz_tab),r0
+ add DBL1H,r0
+#else
+ mov r0,r9
+LOCAL(long_norm_lookup):
+ mov.l LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+ cmp/hi DBL0L,DBL0H
+ bf 0f
+ shlr8 DBL0H
+0: mov.b @(r0,DBL0H),r0
+ bf 0f
+ add #-8,r3
+0: mov.w LOCAL(d20),DBL0L
+ mov #-20,DBL0H
+ clrt
+ sub r0,r3
+ mov r9,r0
+ mov r3,DBL1H
+ shld DBL0L,DBL1H
+ subc DBL1H,r2
+ !
+ bf LOCAL(no_denorm)
+ shad DBL0H,r2
+ bra LOCAL(denorm1_done)
+ add r2,r3
+
+LOCAL(norm_round):
+ cmp/pz r2
+ mov #0,DBL1H
+ bf LOCAL(denorm0_1)
+ or r8,r2
+ mov DBLRL,DBL1L
+ shlr DBL1L
+ addc r3,r9
+ mov.l @r15+,r9
+ addc DBL1H,DBLRL ! round to even
+ mov.l @r15+,r8
+ rts
+ addc r2,DBLRH
+
+LOCAL(norm_pack):
+ add r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ add r2,DBLRH
+
+LOCAL(denorm0_1):
+ mov.l @r15+,r9
+ mov r8,DBL0L
+ mov.l @r15+,r8
+LOCAL(denorm0_shift):
+ shlr DBLRH
+ rotcr DBLRL
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+ mov r8,DBL0L
+ addc DBL0H,r2
+ mov.l @r15+,r8
+ bf LOCAL(denorm0_shift)
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(no_denorm):
+ add r2,r8 ! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+ shld r3,DBLRH
+ mov DBLRL,DBL0L
+ shld r3,DBLRL
+
+ add r8,DBLRH ! add in sign and (exponent - 1)
+ mov.l @r15+,r9
+ add #-32,r3
+ mov.l @r15+,r8
+ shld r3,DBL0L
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+ mov.l LOCAL(x00200000),DBL1L ! shift 1, implicit 1
+ shll r9
+ rotcl DBLRL
+ mov DBLRH,DBL0H
+ rotcl DBLRH ! clears T
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+#else
+ mov r0,r9
+#endif /* __pic__ */
+ subc DBL1L,r2
+ add #-1,r3
+ bf LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+ shlr DBLRH
+ rotcr DBLRL
+ mov.l @r15+,r9
+ or r8,DBLRH
+
+ rts
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(denorm1_b):
+ mov #-20,DBL0L
+ shad DBL0L,r2
+ mov DBLRH,DBL0L
+ shld r2,DBLRH
+ shld r2,DBLRL
+ or r8,DBLRH
+ mov.l @r15+,r9
+ add #32,r2
+ mov.l @r15+,r8
+ shld r2,DBL0L
+ rts
+ or DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+ tst r9,r9
+ bf LOCAL(long_norm_ulp_done)
+ ! return +0.0
+LOCAL(pop_r8_r9):
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+LOCAL(d20):
+ .word 20
+LOCAL(xff):
+ .word 0xff
+ .balign 4
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x80000000):
+ .long 0x80000000
+LOCAL(x000fffff):
+ .long 0x000fffff
+LOCAL(x800fffff):
+ .long 0x800fffff
+LOCAL(x001f0000):
+ .long 0x001f0000
+LOCAL(x00200000):
+ .long 0x00200000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x02100000):
+ .long 0x02100000
+LOCAL(x01100000):
+ .long 0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
@@ -0,0 +1,241 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+ .balign 4
+ .global GLOBAL(mulsf3)
+ FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+ mov.l LOCAL(x7f800000),r1
+ not r4,r2
+ mov r4,r3
+ not r5,r0
+ tst r1,r2
+ or r1,r3
+ bt/s LOCAL(inf_nan_arg0)
+ tst r1,r0
+ bt LOCAL(inf_nan_arg1)
+ tst r1,r5
+ mov r1,r2
+ shll8 r3
+ or r5,r1
+ bt/s LOCAL(zero_denorm_arg1)
+ shll8 r1
+ tst r2,r4
+ bt LOCAL(zero_denorm_arg0)
+ dmulu.l r3,r1
+ mov r4,r0
+ and r2,r0
+LOCAL(arg_norm):
+ and r5,r2
+ mov.l LOCAL(x3f800000),r3
+ sts mach,r1
+ sub r3,r0
+ sts macl,r3
+ add r2,r0
+ cmp/pz r1
+ mov.w LOCAL(x100),r2
+ bf/s LOCAL(norm_frac)
+ tst r3,r3
+ shll2 r1 /* Shift one up, replace leading 1 with 0. */
+ shlr r1
+ tst r3,r3
+LOCAL(norm_frac):
+ mov.w LOCAL(mx80),r3
+ bf LOCAL(round_frac)
+ tst r2,r1
+LOCAL(round_frac):
+ mov.l LOCAL(xff000000),r2
+ subc r3,r1 /* Even overflow gives right result: exp++, frac=0. */
+ shlr8 r1
+ add r1,r0
+ shll r0
+ bt LOCAL(ill_exp)
+ tst r2,r0
+ bt LOCAL(denorm0)
+ cmp/hs r2,r0
+ bt LOCAL(inf)
+LOCAL(insert_sign):
+ div0s r4,r5
+ rts
+ rotcr r0
+LOCAL(denorm0):
+ sub r2,r0
+ bra LOCAL(insert_sign)
+ shlr r0
+LOCAL(zero_denorm_arg1):
+ mov.l LOCAL(x60000000),r2 /* Check exp0 >= -64 */
+ add r1,r1
+ tst r1,r1 /* arg1 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 1 is zero ==> return 0 */
+ tst r4,r2
+ bt LOCAL(insert_sign) /* exp0 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+ mov r3,r2
+ mov r1,r3
+ bra LOCAL(arg_normalize)
+ mov r2,r1
+LOCAL(zero_denorm_arg0):
+ mov.l LOCAL(x60000000),r2 /* Check exp1 >= -64 */
+ add r3,r3
+ tst r3,r3 /* arg0 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 0 is zero ==> return 0 */
+ tst r5,r2
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+ mov.l r7,@-r15
+ extu.w r3,r7
+ cmp/eq r3,r7
+ mov.l LOCAL(xff000000),r7
+ mov #-8,r2
+ bt 0f
+ tst r7,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ mov r3,r7
+ shld r2,r7
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r0
+ add #32,r2
+ mov r2,r7
+ mov #23,r2
+ sub r0,r7
+ mov.l LOCAL(x7f800000),r0
+ shld r7,r3
+ shld r2,r7
+ mov r0,r2
+ and r4,r0
+ sub r7,r0
+ mov.l @r15+,r7
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+ cache thrashing. */
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(xff000000),r2
+ mov r4,r0
+LOCAL(arg_normalize):
+ tst r2,r3
+ bf LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+ tst r2,r3
+ add r2,r0
+ shll8 r3
+ bt LOCAL(arg_byte_loop)
+ add r4,r0
+LOCAL(arg_bit_norm):
+ mov.l LOCAL(x7f800000),r2
+ rotl r3
+LOCAL(arg_bit_loop):
+ add r2,r0
+ bf/s LOCAL(arg_bit_loop)
+ rotl r3
+ rotr r3
+ rotr r3
+ sub r2,r0
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#endif /* 0 */
+LOCAL(inf):
+ bra LOCAL(insert_sign)
+ mov r2,r0
+LOCAL(inf_nan_arg0):
+ bt LOCAL(inf_nan_both)
+ add r0,r0
+ cmp/eq #-1,r0 /* arg1 zero? -> NAN */
+ bt LOCAL(insert_sign)
+ mov r4,r0
+LOCAL(inf_insert_sign):
+ bra LOCAL(insert_sign)
+ add r0,r0
+LOCAL(inf_nan_both):
+ mov r4,r0
+ bra LOCAL(inf_insert_sign)
+ or r5,r0
+LOCAL(inf_nan_arg1):
+ mov r2,r0
+ add r0,r0
+ cmp/eq #-1,r0 /* arg0 zero? */
+ bt LOCAL(insert_sign)
+ bra LOCAL(inf_insert_sign)
+ mov r5,r0
+LOCAL(ill_exp):
+ cmp/pz r0
+ mov #-24,r3
+ bt LOCAL(inf)
+ add r1,r1
+ mov r0,r2
+ sub r1,r2 ! remove fraction to get back pre-rounding exponent.
+ sts mach,r0
+ sts macl,r1
+ shad r3,r2
+ mov r0,r3
+ shld r2,r0
+ add #32,r2
+ cmp/pz r2
+ shld r2,r3
+ bf LOCAL(zero)
+ or r1,r3
+ mov #-1,r1
+ tst r3,r3
+ mov.w LOCAL(x100),r3
+ bf/s LOCAL(denorm_round_up)
+ mov #-0x80,r1
+ tst r3,r0
+LOCAL(denorm_round_up):
+ mov #-7,r3
+ subc r1,r0
+ bra LOCAL(insert_sign)
+ shld r3,r0
+LOCAL(zero):
+ bra LOCAL(insert_sign)
+ mov #0,r0
+LOCAL(x100):
+ .word 0x100
+LOCAL(mx80):
+ .word -0x80
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x3f800000):
+ .long 0x3f800000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(mulsf3))
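! The mulsf3 routine above shifts both significands to the top of a
! 32-bit word (shll8 after OR-ing in the implicit 1) so that dmulu.l
! leaves the result significand plus guard bits in the upper product
! word (mach). A Python model of that significand multiply; illustrative
! only, without the routine's full rounding path:

```python
def mul_sig(fa, fb):
    """Multiply two 23-bit fractions; return (24-bit significand, exp_adj)."""
    a = (fa | 0x800000) << 8        # make the implicit 1 explicit, like shll8
    b = (fb | 0x800000) << 8
    hi = (a * b) >> 32              # the 'mach' half of dmulu.l
    if hi & 0x80000000:             # product in [2, 4): exponent bumps by 1
        return hi >> 8, 1
    return (hi << 1) >> 8, 0        # product in [1, 2): drop the leading 0
```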
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
@@ -0,0 +1,101 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+ .global GLOBAL(floatsisf)
+ .balign 4
+GLOBAL(floatsisf):
+ cmp/pz r4
+ mov r4,r5
+ bt 0f
+ neg r4,r5
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r5,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r5,r1
+ mov #24,r2
+ bt 0f
+ mov r5,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ bt 0f
+ mov.l LOCAL(xca800000),r3 ! sign + bias + 23 - implicit 1
+0: mov r5,r0
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r5
+ add #-31,r2
+ rotl r5
+ cmp/hi r1,r5
+ mov #0,r3
+ addc r3,r0
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+ .balign 8
+LOCAL(noround):
+ mov #23,r1
+ tst r4,r4
+ shld r1,r2
+ bt LOCAL(ret0)
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
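! The routine above combines a table-driven leading-zero count with
! round-to-nearest-even when the magnitude has more than 24 significant
! bits. A reference model in Python; a sketch of the semantics, not of
! the register scheduling:

```python
def floatsisf(n):
    """int32 -> IEEE-754 single bit pattern, round to nearest even."""
    if n == 0:
        return 0
    sign = 0x80000000 if n < 0 else 0
    mag = -n if n < 0 else n
    nbits = mag.bit_length()
    exp = 127 + nbits - 1
    if nbits <= 24:
        frac = mag << (24 - nbits)          # exact, no rounding needed
    else:
        shift = nbits - 24
        frac = mag >> shift
        rem = mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and frac & 1):
            frac += 1                       # round up / round to even
            if frac >> 24:                  # rounding overflowed significand
                frac >>= 1
                exp += 1
    return sign | (exp << 23) | (frac & 0x7fffff)
```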
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
@@ -0,0 +1,481 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+ .global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+ mov.l r8,@-r15
+ sub r3,DBL0H ! isolate high fraction
+ mov.l @(4,r15),r8 ! original DBL0H (with sign & exp)
+ sub r3,r1 ! 0x7ff00000
+ mov.l LOCAL(x60000000),r3
+ shll16 r2 ! 0xffff0000
+ ! no stall here for sh4-200
+ !
+ tst r1,r8
+ mov.l r0,@-r15
+ bf LOCAL(inf_nan_a)
+ tst r1,r0 ! test for DBL1 inf, nan or small
+ bt LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+ tst DBL0H,DBL0H
+ bf LOCAL(normalize_arg53)
+ tst DBL0L,DBL0L
+ bt LOCAL(a_zero)
+ tst r2,DBL0L
+ mov DBL0L,DBL0H
+ bt LOCAL(normalize_arg16)
+ shlr16 DBL0H
+ mov.w LOCAL(m15),r2 ! 1-16
+ bra LOCAL(normalize_arg48)
+ shll16 DBL0L
+
+LOCAL(normalize_arg53):
+ tst r2,DBL0H
+ mov #1,r2
+ bt LOCAL(normalize_arg48)
+ mov DBL0H,r1
+ shlr16 r1
+ bra LOCAL(normalize_DBL0H)
+ mov #21-16,r3
+
+LOCAL(normalize_arg16):
+ mov.w LOCAL(m31),r2 ! 1-32
+ mov #0,DBL0L
+LOCAL(normalize_arg48):
+ mov DBL0H,r1
+ mov #21,r3
+LOCAL(normalize_DBL0H):
+ extu.b r1,r8
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r8,r1
+ !
+ bt 0f
+ shlr8 r1
+0:
+#ifdef __pic__
+ add r0,r1
+
+ mova LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+ mov.b @(r0,r1),r8
+ mov DBL0L,r1
+ mov.l @r15+,r0
+ bt 0f
+ add #-8,r3
+0: clrt
+ sub r8,r3
+ mov.w LOCAL(d20),r8
+ shld r3,DBL0H
+ shld r3,DBL0L
+ sub r3,r2
+ add #-32,r3
+ shld r3,r1
+ mov.l LOCAL(x00100000),r3
+ or r1,DBL0H
+ shld r8,r2
+ mov.l @r15+,r8
+ add r2,DBL1H
+ mov.l LOCAL(x001fffff),r2
+ dmulu.l DBL0L,DBL1L
+ bra LOCAL(arg_denorm_done)
+ or r3,r0 ! set implicit 1 bit
+
+LOCAL(a_zero):
+ mov.l @(4,r15),r8
+ add #8,r15
+LOCAL(zero):
+ mov #0,DBLRH
+ bra LOCAL(pop_ret)
+ mov #0,DBLRL
+
+! both inf / nan -> result is NaN if at least one is a NaN, else inf.
+! DBL0 inf/nan, DBL1 zero -> result is NaN
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+ mov r8,DBLRH
+ mov.l @(4,r15),r8
+ add #8,r15
+ tst r1,r0 ! arg1 inf/nan ?
+ mov DBL0L,DBLRL
+ bt LOCAL(both_inf_nan)
+ tst DBL1L,DBL1L
+ mov DBL1H,r1
+ bf LOCAL(pop_ret)
+ add r1,r1
+ tst r1,r1
+ !
+ bf LOCAL(pop_ret)
+LOCAL(nan):
+ mov #-1,DBLRL
+ bra LOCAL(pop_ret)
+ mov #-1,DBLRH
+
+LOCAL(both_inf_nan):
+ or DBL1L,DBLRL
+ bra LOCAL(pop_ret)
+ or DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+ tst r1,r0
+ mov.l @(4,r15),r8
+ or DBL0L,DBL0H
+ bf/s LOCAL(zero)
+ add #8,r15
+ tst DBL0H,DBL0H
+ bt LOCAL(nan)
+LOCAL(inf_nan_b):
+ mov DBL1L,DBLRL
+ mov DBL1H,DBLRH
+LOCAL(pop_ret):
+ mov.l @r15+,DBL0H
+ add DBLRH,DBLRH
+
+
+ div0s DBL0H,DBL1H
+
+ rts
+ rotcr DBLRH
+
+ .balign 4
+/* Argument a has already been tested for being zero or denorm.
+ On the other side, we have to swap a and b so that we can share the
+ normalization code.
+ a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+ b: sign/exponent: DBL1H fraction: r0:DBL1L */
+LOCAL(inf_nan_denorm_or_zero_b):
+ sub r3,r1 ! 0x7ff00000
+ mov.l @r15,r2 ! get original DBL0H
+ tst r1,DBL1H
+ sub r3,r0 ! isolate high fraction
+ bf LOCAL(inf_nan_b)
+ mov.l DBL1H,@r15
+ mov r0,DBL0H
+ mov.l r8,@-r15
+ mov r2,DBL1H
+ mov.l LOCAL(0xffff0000),r2
+ mov.l r1,@-r15
+ mov DBL1L,r1
+ mov DBL0L,DBL1L
+ bra LOCAL(normalize_arg)
+ mov r1,DBL0L
+
+LOCAL(d20):
+ .word 20
+LOCAL(m15):
+ .word -15
+LOCAL(m31):
+ .word -31
+LOCAL(xff):
+ .word 0xff
+
+ .balign 4
+LOCAL(0xffff0000): .long 0xffff0000
+
+ ! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+ .balign 4
+GLOBAL(muldf3):
+ mov.l LOCAL(xfff00000),r3
+ mov DBL1H,r0
+ dmulu.l DBL0L,DBL1L
+ mov.l LOCAL(x7fe00000),r1
+ sub r3,r0
+ mov.l DBL0H,@-r15
+ sub r3,DBL0H
+ tst r1,DBL0H
+ or r3,DBL0H
+ mov.l LOCAL(x001fffff),r2
+ bt LOCAL(inf_nan_denorm_or_zero_a)
+ tst r1,r0
+ or r3,r0 ! r0:DBL1L := b fraction ; u12.52
+ bt LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+ and r2,r0 ! r0:DBL1L := b fraction ; u12.52
+ sts macl,r3
+ sts mach,r1
+ dmulu.l DBL0L,r0
+ and r2,DBL0H ! DBL0H:DBL0L := a fraction ; u12.52
+ mov.l r8,@-r15
+ mov #0,DBL0L
+ mov.l r9,@-r15
+ sts macl,r2
+ sts mach,r8
+ dmulu.l DBL0H,DBL1L
+ addc r1,r2
+
+ addc DBL0L,r8 ! add T; clears T
+
+ sts macl,r1
+ sts mach,DBL1L
+ dmulu.l DBL0H,r0
+ addc r1,r2
+ mov.l LOCAL(x7ff00000),DBL0H
+ addc DBL1L,r8 ! clears T
+ mov.l @(8,r15),DBL1L ! a sign/exp w/fraction
+ sts macl,DBLRL
+ sts mach,DBLRH
+ and DBL0H,DBL1L ! a exponent
+ mov.w LOCAL(x200),r9
+ addc r8,DBLRL
+ mov.l LOCAL(x3ff00000),r8 ! bias
+ addc DBL0L,DBLRH ! add T
+ cmp/hi DBL0L,r3 ! 32 guard bits -> sticky: T := r3 != 0
+ movt r3
+ tst r9,DBLRH ! T := fraction < 2
+ or r3,r2 ! DBLRH:DBLRL:r2 := result fraction; u24.72
+ bt/s LOCAL(shll12)
+ sub r8,DBL1L
+ mov.l LOCAL(x002fffff),r8
+ and DBL1H,DBL0H ! b exponent
+ mov.l LOCAL(x00100000),r9
+ add DBL0H,DBL1L ! result exponent - 1
+ tst r8,r2
+ mov.w LOCAL(m20),r8
+ subc DBL0L,r9
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d11),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m21),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l @r15+,DBL0H
+ addc r3,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ add DBL1L,DBLRH ! implicit 1 adjusts exponent
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_11)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_11)
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+
+LOCAL(shll12):
+ mov.l LOCAL(x0017ffff),r8
+ extu.b DBLRH,DBLRH ! remove implicit 1.
+ mov.l LOCAL(x00080000),r9
+ and DBL1H,DBL0H ! b exponent
+ add DBL0H,DBL1L ! result exponent
+ tst r8,r2 ! rounding adjust for lower guard ...
+ mov.w LOCAL(m19),r8
+ subc DBL0L,r9 ! ... bits and round to even; clear T
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d12),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m20),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ addc r3,DBLRH
+ mov.l @r15+,DBL0H
+ add DBL1L,DBLRH
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_12)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+LOCAL(overflow):
+ mov r3,DBLRH
+ mov #0,DBLRL
+ bra LOCAL(insert_sign)
+ mov.l @r15+,r8
+
+LOCAL(denorm_exp0_11):
+ mov.l r8,@-r15
+ mov #-21,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+ mov DBL1H,DBL1L
+ and r3,DBL0L ! 0x7fe00000
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ mov #-20,DBL0L
+ bf LOCAL(overflow)
+ mov #-21,r8
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov.l r9,@-r15
+ shad DBL0L,DBL1L ! exponent ; s32
+ bra LOCAL(denorm)
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+ mov.l r8,@-r15
+ mov #-20,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+ .balign 4 ! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+ and r3,DBL0L ! 0x7fe00000
+ mov DBL1H,DBL1L
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ bf LOCAL(overflow)
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov #-20,r8
+ shad r8,DBL1L ! exponent ; s32
+ mov.l r9,@-r15
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+ not r3,r9 ! 0x001fffff
+ mov.l r10,@-r15
+ mov r2,r10
+ shld r8,r10 ! 11 or 12 lower bit valid
+ and r9,DBLRH ! Mask away vestiges of exponent.
+ add #32,r8
+ sub r3,DBLRH ! Make leading 1 explicit.
+ shld r8,r2 ! r10:r2 := unrounded result lowpart
+ shlr DBLRH ! compensate for doubling at end of normal code
+ sub DBLRL,r10 ! reconstruct effect of previous rounding
+ exts.b r10,r9
+ shad r3,r10 ! sign extension
+ mov #0,r3
+ clrt
+ addc r9,DBLRL ! Undo previous rounding.
+ mov.w LOCAL(m32),r9
+ addc r10,DBLRH
+ cmp/hi r3,r2
+ rotcl DBLRL ! fit in the rest of r2 as a sticky bit.
+ mov.l @r15+,r10
+ rotcl DBLRH
+ cmp/ge r9,DBL1L
+ bt LOCAL(small_norm_shift)
+ cmp/hi r3,DBLRL
+ add #32,DBL1L
+ movt DBLRL
+ cmp/gt r9,DBL1L
+ or DBLRH,DBLRL
+ bt/s LOCAL(small_norm_shift)
+ mov r3,DBLRH
+ mov r3,DBLRL ! exponent too negative to shift - return zero
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+ .balign 4
+LOCAL(small_norm_shift):
+ mov DBLRL,r2 ! stash away guard bits
+ shld DBL1L,DBLRL
+ mov DBLRH,DBL0L
+ shld DBL1L,DBLRH
+ mov.l LOCAL(x7fffffff),r9
+ add #32,DBL1L
+ shld DBL1L,r2
+ shld DBL1L,DBL0L
+ or DBL0L,DBLRL
+ shlr DBL0L
+ addc r2,r9
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ addc r3,DBLRL
+ addc r3,DBLRH
+ div0s DBL0H,DBL1H
+ add DBLRH,DBLRH
+ rts
+ rotcr DBLRH
+
+
+LOCAL(x200):
+ .word 0x200
+LOCAL(m19):
+ .word -19
+LOCAL(m20):
+ .word -20
+LOCAL(m21):
+ .word -21
+LOCAL(m32):
+ .word -32
+LOCAL(d11):
+ .word 11
+LOCAL(d12):
+ .word 12
+ .balign 4
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+ .long 0xfff00000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x7fe00000):
+ .long 0x7fe00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x3ff00000):
+ .long 0x3ff00000
+LOCAL(x002fffff):
+ .long 0x002fffff
+LOCAL(xffe00000):
+ .long 0xffe00000
+LOCAL(x0017ffff):
+ .long 0x0017ffff
+LOCAL(x00080000):
+ .long 0x00080000
+ENDFUNC(GLOBAL(muldf3))
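For readers following the four dmulu.l instructions in muldf3: they form the full product of the two 52/53-bit significands from 32x32->64 partial products, accumulated with addc. A C sketch of that scheme (names mine, widths illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the significand multiply: four 32x32->64 partial products
   (corresponding to the four dmulu.l instructions) combined into the
   full 128-bit product of two 64-bit operands.  */
static void mul_frac (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);
  uint64_t ll = (uint64_t) al * bl;	/* like dmulu.l DBL0L,DBL1L */
  uint64_t lh = (uint64_t) al * bh;	/* like dmulu.l DBL0L,r0    */
  uint64_t hl = (uint64_t) ah * bl;	/* like dmulu.l DBL0H,DBL1L */
  uint64_t hh = (uint64_t) ah * bh;	/* like dmulu.l DBL0H,r0    */
  /* Accumulate the middle column with carries, as the addc chain does.  */
  uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  *lo = (mid << 32) | (ll & 0xffffffffu);
  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```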
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
@@ -0,0 +1,98 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+ .global GLOBAL(floatsidf)
+ .balign 4
+GLOBAL(floatsidf):
+ tst r4,r4
+ mov r4,r1
+ bt LOCAL(ret0)
+ cmp/pz r4
+ bt 0f
+ neg r4,r1
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r1,r5
+ mov.w LOCAL(xff00),r3
+ cmp/eq r1,r5
+ mov #21,r2
+ bt 0f
+ mov r1,r5
+ shlr16 r5
+ add #-16,r2
+0: tst r3,r5 ! 0xff00
+ bt 0f
+ shlr8 r5
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r5
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r5),r5
+ cmp/pz r4
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ bt 0f
+ mov.l LOCAL(xc1200000),r3 ! sign + bias + 20 - implicit 1
+0: mov r1,r0 ! DBLRL & DBLRH
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov #0,DBLRL
+ rts
+ mov #0,DBLRH
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
@@ -0,0 +1,110 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NaNs: for a cleared sign bit, you
+ ! get INT_MAX; for a set sign bit, you get INT_MIN.
+ ! However, since the result for NaNs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixdfsi)
+ FUNC(GLOBAL(fixdfsi))
+ .balign 4
+GLOBAL(fixdfsi):
+ mov.w LOCAL(x413),r1
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(neg)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ bf 0f ! SH4-200 will start this insn in a new cycle
+ mov #-31,DBL0H ! results in 0 return
+0: add #1,r0
+ rts
+ shld DBL0H,r0
+
+ .balign 4
+LOCAL(neg):
+ cmp/pl DBL0H
+ and r3,r0
+ bf/s LOCAL(ignore_low_neg)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmin)
+ shld DBL0H,DBL0L
+ or DBL0L,r0 ! SH4-200 will start this insn in a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ignore_low_neg):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ shld DBL0H,r0
+ bf 0f
+ mov #0,r0 ! results in 0 return
+0: rts
+ neg r0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
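An illustrative C equivalent of what fixdfsi computes (again not part of the patch): extract the exponent, reattach the implicit 1, and shift the significand into integer position, truncating toward zero; out-of-range inputs saturate as in retmax / retmin:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: double (as raw bits) to signed 32-bit int,
   truncating toward zero; saturates for out-of-range, Inf and NaN.  */
static int32_t fixdfsi_bits (uint64_t d)
{
  int neg = (int) (d >> 63);
  int e = (int) ((d >> 52) & 0x7ff) - 1023;	/* unbiased exponent */
  if (e < 0)
    return 0;					/* |x| < 1 truncates to 0 */
  if (e > 30)					/* too large, Inf or NaN */
    return neg ? INT32_MIN : INT32_MAX;
  uint64_t m = (d & 0x000fffffffffffffull) | 0x0010000000000000ull;
  uint32_t v = (uint32_t) (m >> (52 - e));	/* drop fractional bits */
  return neg ? -(int32_t) v : (int32_t) v;
}
```

Note that saturating at e > 30 also yields the right value for -2^31 exactly, since the negative clamp is INT32_MIN.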
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S (revision 0)
@@ -0,0 +1,593 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5 and the divisor in regs
+!r6 and r7; the quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divdf3)
+ FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov r4,r2
+ mov.l .L_inf,r1
+
+ and r1,r2
+ mov.l r8,@-r15
+
+ cmp/eq r1,r2
+ mov r6,r8
+
+ bt .L_a_inv
+ and r1,r8
+
+ cmp/eq r1,r8
+ mov.l .L_high_mant,r3
+
+ bf .L_chk_zero
+ and r6,r3
+
+ mov.l .L_mask_sign,r8
+ cmp/pl r7
+
+ mov r8,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ and r4,r8
+ cmp/pl r3
+
+ and r6,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ xor r8,r0 !op1=normal no,op2=Inf, return Zero
+ mov #0,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_ret_b:
+ mov r7,r1
+ mov r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_a_inv:
+ !chk if op1 is Inf or NaN
+ mov.l .L_high_mant,r2
+ cmp/pl r5
+
+ and r4,r2
+ bt .L_ret_a
+
+ and r1,r8 !r1 contains infinity
+ cmp/pl r2
+
+ bt .L_ret_a
+ cmp/eq r1,r8
+
+ mov r1,DBLRH
+ add DBLRH,DBLRH
+ bf 0f
+ mov #-1,DBLRH ! Inf/Inf, return NaN.
+0: div0s r4,r6
+ mov.l @r15+,r8
+ rts
+ rotcr DBLRH
+
+.L_ret_a:
+ !return op1
+ mov r5,r1
+ mov r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_zero:
+ !chk if op1=0
+ mov.l .L_mask_sign,r0
+ mov r4,r3
+
+ and r0,r3
+ shll r4
+
+ and r6,r0
+ shlr r4
+
+ xor r3,r0
+ shll r6
+
+ shlr r6
+ tst r4,r4
+
+
+ bf .L_op1_not_zero
+ tst r5,r5
+
+ bf .L_op1_not_zero
+ tst r7,r7
+
+ mov.l @r15+,r8
+ bf .L_ret_zero
+
+ tst r6,r6
+ bf .L_ret_zero
+
+ rts
+ mov #-1,DBLRH !op1=op2=0, return NaN
+
+.L_ret_zero:
+ !return zero
+ mov r0,r1
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #0,r0
+#else
+ mov #0,r1 !op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r3
+
+ rotcl r6
+ tst r3,r6
+
+ add #-1,r8
+ bt .L_norm_b
+
+ bra .L_divide
+ add #1,r8
+
+.L_op1_not_zero:
+ !op1!=0, chk if op2=0
+ tst r7,r7
+ mov r1,r3
+
+ mov #0,r1
+ bf .L_normal_nos
+
+ tst r6,r6
+ bf .L_normal_nos
+
+ mov.l @r15+,r8
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ tst r2,r2
+ mov #-20,r1
+
+! The subsequent branch uses the T bit set by the compare above.
+! The intervening shift does not alter T, and the SHLR20 macro
+! is written so that it does not clobber T either.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2
+#else
+ SHLR20 (r2)
+#endif
+ bt .L_norm_a !normalize dividend
+
+.L_chk_b:
+ mov.l r9,@-r15
+ tst r8,r8
+
+ mov.l .L_high_mant,r9
+
+! The subsequent branch uses the T bit set by the compare above.
+! The intervening shift does not alter T, and the SHLR20 macro
+! is written so that it does not clobber T either.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r8
+#else
+ SHLR20 (r8)
+#endif
+ ! T set -> normalize divisor
+ SL(bt, .L_norm_b,
+ and r9,r4)
+
+.L_divide:
+ mov.l .L_2047,r1
+ sub r8,r2
+
+ mov.l .L_1023,r8
+ and r9,r6
+
+ !resultant exponent
+ add r8,r2
+ !chk the exponent for overflow
+ cmp/ge r1,r2
+
+ mov.l .L_imp_bit,r1
+ bt .L_overflow
+
+ mov #0,r8
+ or r1,r4
+
+ or r1,r6
+ mov #-24,r3
+
+ !chk if the divisor is 1 (mantissa only)
+ cmp/eq r8,r7
+ bf .L_div2
+
+ cmp/eq r6,r1
+ bt .L_den_one
+
+.L_div2:
+ !divide the mantissas
+ shll8 r4
+ mov r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ shll8 r6
+
+ or r9,r4
+ shll8 r5
+
+ mov r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ mov r8,r3
+ shll8 r7
+
+ or r9,r6
+ cmp/gt r4,r6
+
+ mov r3,r9
+ bt .L_shift
+
+ cmp/eq r4,r6
+ bf .L_loop
+
+ cmp/gt r5,r7
+ bf .L_loop
+
+.L_shift:
+ add #-1,r2
+ shll r5
+ rotcl r4
+
+.L_loop:
+ !actual division loop
+ cmp/gt r6,r4
+ bt .L_subtract
+
+ cmp/eq r6,r4
+ bf .L_skip
+
+ cmp/ge r7,r5
+ bf .L_skip
+
+.L_subtract:
+ clrt
+ subc r7,r5
+
+ or r1,r8
+ subc r6,r4
+
+.L_skip:
+ shlr r1
+ shll r5
+
+ rotcl r4
+ cmp/eq r1,r3
+
+ bf .L_loop
+ mov.l .L_imp_bit,r1
+
+ !chk if the division was for the higher word of the quotient
+ tst r1,r9
+ bf .L_chk_exp
+
+ mov r8,r9
+ mov.l .L_mask_sign,r1
+
+ !divide for the lower word of the quotient
+ bra .L_loop
+ mov r3,r8
+
+.L_chk_exp:
+ !chk if the result needs to be denormalized
+ cmp/gt r2,r3
+ bf .L_round
+ mov #-53,r7
+
+.L_underflow:
+ !denormalize the result
+ add #1,r2
+ cmp/gt r2,r7
+
+ or r4,r5 !remainder
+ add #-2,r2
+
+ mov #32,r4
+ bt .L_return_zero
+
+ add r2,r4
+ cmp/ge r3,r4
+
+ mov r2,r7
+ mov r3,r1
+
+ mov #-54,r2
+ bt .L_denorm
+ mov #-32,r7
+
+.L_denorm:
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r7
+
+ shlr r9
+ rotcr r8
+
+ cmp/eq r3,r7
+ bf .L_denorm
+
+ mov r4,r7
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r3,r6
+
+ cmp/gt r7,r3
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r6
+
+ mov r3,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r3,r9
+ or r9,r0
+
+ cmp/eq r5,r3
+ bf .L_return
+
+ cmp/eq r3,r6
+ mov.l .L_mask_sign,r7
+
+ bf .L_return
+ cmp/eq r7,r1
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r3
+
+ rotcl r4
+ tst r3,r4
+
+ add #-1,r2
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r2
+
+.L_overflow:
+ !overflow, return inf
+ mov.l .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+ or r2,r1
+ mov #0,r0
+#else
+ or r2,r0
+ mov #0,r1
+#endif
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+.L_den_one:
+ !denominator=1, result=numerator
+ mov r4,r9
+ mov #-53,r7
+
+ cmp/ge r2,r8
+ mov r8,r4
+
+ mov r5,r8
+ mov r4,r3
+
+ !chk the exponent for underflow
+ SL(bt, .L_underflow,
+ mov r4,r5)
+
+ mov.l .L_high_mant,r7
+ bra .L_pack
+ mov #20,r6
+
+.L_return_zero:
+ !return zero
+ mov r3,r1
+ mov.l @r15+,r9
+
+ rts
+ mov.l @r15+,r8
+
+.L_round:
+ !apply rounding
+ cmp/eq r4,r6
+ bt .L_lower
+
+ clrt
+ subc r6,r4
+
+ bra .L_rounding
+ mov r4,r6
+
+.L_lower:
+ clrt
+ subc r7,r5
+ mov r5,r6
+
+.L_rounding:
+ !apply rounding
+ mov.l .L_invert,r1
+ mov r3,r4
+
+ movt r3
+ clrt
+
+ not r3,r3
+ and r1,r3
+
+ addc r3,r8
+ mov.l .L_high_mant,r7
+
+ addc r4,r9
+ cmp/eq r4,r6
+
+ mov.l .L_comp_1,r3
+ SL (bf, .L_pack,
+ mov #20,r6)
+ and r3,r8
+
+.L_pack:
+ !pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ and r7,r9
+
+ or r2,r0
+ mov r8,r1
+
+ or r9,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_sign:
+ .long 0x80000000
+.L_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_1023:
+ .long 1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_comp_1:
+ .long 0xfffffffe
+.L_invert:
+ .long 0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
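The heart of divdf3 is the restoring shift-subtract loop at .L_loop: one quotient bit per iteration, subtracting the divisor whenever the running remainder is large enough. A C sketch of the same scheme on single 64-bit values (the assembly works on 64-bit register pairs; widths here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a restoring shift-subtract division loop.  a and b are
   assumed to be 53-bit significands with the implicit 1 at bit 52,
   so the remainder never overflows 64 bits.  */
static uint64_t div_frac (uint64_t a, uint64_t b)
{
  uint64_t rem = a, q = 0;
  for (int i = 0; i < 55; i++)
    {
      q <<= 1;
      if (rem >= b)
	{
	  rem -= b;		/* subtract divisor, record a 1 bit */
	  q |= 1;
	}
      rem <<= 1;		/* bring down the next (zero) bit */
    }
  return q;			/* floor (a * 2^54 / b) */
}
```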
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
@@ -0,0 +1,132 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsisf)
+ FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+ tst r4,r4
+ mov #23,r6
+
+ mov.l .L_set_24_bits,r7
+ SL(bt, .L_return,
+ not r7,r3)
+
+ ! Decide the direction for shifting
+ mov.l .L_set_24_bit,r5
+ cmp/hi r7,r4
+
+ not r5,r2
+ SL(bt, .L_shift_right,
+ mov #0,r7)
+
+ tst r5,r4
+
+ mov #0,r0
+ bf .L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+ shll r4
+ tst r5,r4
+
+ add #-1,r6
+ bt .L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+ mov #23,r3
+ add #127,r6
+
+ ! Align the exponent
+ and r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+ shlr r4
+ rotcr r7
+
+ tst r4,r3
+ add #1,r6
+
+ bf .L_shift_right
+
+ tst r7,r7
+ bt .L_sh_rt_1
+
+ shll r7
+ movt r1
+
+ add r1,r4
+
+ tst r7,r7
+ bf .L_sh_rt_1
+
+ ! Halfway between two numbers.
+ ! Round to even: force the LSB to 0
+ shlr r4
+ shll r4
+
+.L_sh_rt_1:
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shift_right
+ bt .L_pack_sf
+
+.L_return:
+ rts
+ mov r4,r0
+
+ .align 2
+.L_set_24_bit:
+ .long 0x00800000
+
+.L_set_24_bits:
+ .long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
@@ -0,0 +1,176 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunsdfsi)
+ FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ tst r6,r6
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_high_mant,r1
+
+ SL(bt, .L_epil,
+ and r4,r1) ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1054,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1054,return maxint
+ sub r2,r7 !r7 contains the number of shifts
+
+ mov.l .L_21bit,r2
+ bt .L_ret_max
+
+ or r2,r1
+ mov r7,r3
+
+ shll8 r1
+ neg r7,r7
+
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ SL(bt, .L_lower_mant,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ rts
+ mov r1,r0
+
+.L_lower_mant:
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ mov r1,r0
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+
+ rts
+ nop
+
+ .align 2
+
+.L_maxint:
+ .long 0xffffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1054:
+ .long 1054
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
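And the unsigned counterpart, as an illustrative C model (not patch code): negative inputs and values below 1 give 0, values of 2^32 or more give maxint as in .L_ret_max. NaN handling is simplified here, since results for NaN are undefined anyway:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: double (as raw bits) to unsigned 32-bit int,
   truncating toward zero.  NaN handling is deliberately simplified.  */
static uint32_t fixunsdfsi_bits (uint64_t d)
{
  if (d >> 63)
    return 0;					/* sign bit set -> 0 */
  int e = (int) ((d >> 52) & 0x7ff) - 1023;	/* unbiased exponent */
  if (e < 0)
    return 0;					/* value < 1 */
  if (e > 31)
    return 0xffffffffu;				/* >= 2^32 or non-finite */
  uint64_t m = (d & 0x000fffffffffffffull) | 0x0010000000000000ull;
  return (uint32_t) (m >> (52 - e));		/* truncate */
}
```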
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S (revision 0)
@@ -0,0 +1,786 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double precision numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subdf3)
+ FUNC (GLOBAL (subdf3))
+ .global GLOBAL (adddf3)
+ FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+ mov.l .L_sign,r2
+ bra .L_adddf3_1
+ xor r2,r6
+
+GLOBAL (adddf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+
+.L_adddf3_1:
+ mov.l r8,@-r15
+ mov r4,r1
+
+ mov.l .L_inf,r2
+ mov r6,r3
+
+ mov.l r9,@-r15
+ and r2,r1 !Exponent of op1 in r1
+
+ mov.l r10,@-r15
+ and r2,r3 !Exponent of op2 in r3
+
+ ! Check for Nan or Infinity
+ mov.l .L_sign,r9
+ cmp/eq r2,r1
+
+ mov r9,r10
+ bt .L_thread_inv_exp_op1
+
+ mov r9,r0
+ cmp/eq r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+ and r4,r9 !r9 has sign bit for op1
+ bt .L_ret_op2
+
+ ! Check for -ve zero
+ cmp/eq r4,r0
+ and r6,r10 !r10 has sign bit for op2
+
+ bt .L_op1_nzero
+
+ cmp/eq r6,r0
+ bt .L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+ tst r4,r4
+ bt .L_op1_zero
+
+ ! op1 is not zero, check op2 for zero
+ tst r6,r6
+ bt .L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 have the signs
+.L_add:
+ mov.l .L_high_mant,r8
+ mov #-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r1 ! r1 now has exponent for op1 in its lower bits
+#else
+ SHLR20 (r1)
+#endif
+ and r8,r6 ! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3 ! r3 has exponent for op2 in its lower bits
+#else
+ SHLR20 (r3)
+#endif
+ and r8,r4 ! Higher bits of mantissa of op1
+
+ mov.l .L_21bit,r8
+
+ tst r1,r1
+ bt .L_norm_op1
+
+ ! Set the 21st bit.
+ or r8,r4
+ tst r3,r3
+
+ bt .L_norm_op2
+ or r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+ tst r9,r9
+ bf .L_neg_op1
+
+ tst r10,r10
+ bf .L_neg_op2
+
+.L_add_1:
+ cmp/ge r1,r3
+
+ mov r1,r0
+ bt .L_op2_exp_greater
+
+ sub r3,r0
+ ! If exponent difference is greater than 54, the resultant exponent
+ ! won't be changed. Return op1 straight away.
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op1
+
+ mov r1,r3
+ clrt
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ ! Shift left the first operand and apply rest of shifts to second operand.
+ mov #0,r2
+ shll r5
+
+ rotcl r4
+
+ add #-1,r3
+ dt r0
+
+ bt .L_add_mant
+ dt r0
+
+ bt LOCAL(got_guard)
+ dt r0
+
+ bt LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+ shar r6
+ or r7,r2 ! sticky bit
+
+ rotcr r7
+ dt r0
+
+ bf .L_shfrac_op2
+
+ shlr r2
+
+ subc r2,r2 ! spread sticky bit across r2
+LOCAL(got_sticky):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+LOCAL(got_guard):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+
+
+! Add the positive mantissas and check the MSB of the result for
+! overflow. In case of overflow, negate the result.
+.L_add_mant:
+ clrt
+ addc r7,r5
+
+ mov #0,r10 ! Assume resultant to be positive
+ addc r6,r4
+
+ cmp/pz r4
+
+ bt .L_mant_ptv
+ negc r2,r2
+
+ negc r5,r5
+
+ mov.l .L_sign,r10 ! The assumption was wrong, result is negative
+ negc r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+ mov.l .L_23bit,r0
+
+ tst r4,r0
+ bt .L_mant_ptv_0
+
+ shlr r4
+ rotcr r5
+
+ add #1,r3
+ bra .L_mant_ptv_1
+ rotcr r2
+
+.L_mant_ptv_0:
+ mov.l .L_22bit,r0
+ tst r4,r0
+
+ bt .L_norm_mant
+
+.L_mant_ptv_1:
+	! The 22nd bit of the resultant mantissa is set. Shift the mantissa
+	! right and add 1 to the exponent.
+ add #1,r3
+ shlr r4
+ rotcr r5
+ ! The mantissa is already normalized. We don't need to
+ ! spend any effort. Branch to epilogue.
+ bra .L_epil
+ rotcr r2
+
+! Normalize operands
+.L_norm_op1:
+ shll r5
+
+ rotcl r4
+ add #-1,r1
+
+ tst r4,r8
+ bt .L_norm_op1
+
+ tst r3,r3
+ SL(bf, .L_neg_mant,
+ add #1,r1)
+
+.L_norm_op2:
+ shll r7
+
+ rotcl r6
+ add #-1,r3
+
+ tst r6,r8
+ bt .L_norm_op2
+
+ bra .L_neg_mant
+ add #1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+ clrt
+ negc r5,r5
+
+ negc r4,r4
+ tst r10,r10
+
+ bt .L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+ clrt
+ negc r7,r7
+
+ bra .L_add_1
+ negc r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+ bra .L_inv_exp_op1
+ nop
+
+.L_ret_op2:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_op1_nzero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check op2 for negative zero
+ cmp/eq r6,r0
+ bf .L_non_zero ! both op1 and op2 are not -0
+
+.L_op2_nzero:
+ tst r7,r7
+ bf .L_non_zero
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is -0, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check high bit of op2
+ tst r6,r6
+ bf .L_add ! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+ tst r7,r7
+ bf .L_add
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is zero, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! exp (op1) is smaller than or equal to exp (op2).
+! The same logic as in .L_add applies here; see there for comments.
+.L_op2_exp_greater:
+ mov r3,r0
+ sub r1,r0
+
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op2
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ mov #0,r2
+ shll r7
+ rotcl r6
+ add #-1,r0
+ add #-1,r3
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+.L_shfrac_op1:
+ add #-1,r0
+ shar r4
+
+ rotcr r5
+ rotcr r2
+
+ cmp/eq #0,r0
+ bf .L_shfrac_op1
+
+ bra .L_add_mant
+ nop
+
+! Return the value in op1
+.L_ret_op1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+ mov.l .L_high_mant,r7
+ mov r4,r0
+
+ tst r9,r9
+ bt .L_pack_op1_1
+
+ clrt
+ negc r5,r5
+ negc r0,r0
+
+.L_pack_op1_1:
+ and r7,r0
+ mov r1,r3
+
+ mov #20,r2
+ mov r5,r1
+
+ mov.l @r15+,r10
+ or r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+!r3 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+ mov.l .L_high_mant,r9
+ mov r6,r0
+
+ tst r10,r10
+ bt .L_pack_op2_1
+
+ clrt
+ negc r7,r7
+ negc r0,r0
+
+.L_pack_op2_1:
+ and r9,r0
+ mov r7,r1
+
+ mov #20,r2
+ or r10,r0
+
+ mov.l @r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! Normalize the mantissa by setting the 21st bit of its high part
+.L_norm_mant:
+ mov.l .L_21bit,r0
+
+ tst r4,r0
+ bf .L_epil
+
+ tst r4,r4
+ bf .L_shift_till_1
+
+ tst r5,r5
+ bf .L_shift_till_1
+
+ ! Mantissa is zero, return 0
+ mov.l @r15+,r10
+ mov #0,r0
+
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+
+ rts
+ mov #0,r1
+
+! A loop to set the 21st bit in the high part of the resultant mantissa.
+! It is already ensured that a 1 bit is present in the mantissa.
+.L_shift_till_1:
+ clrt
+ shll r5
+
+ rotcl r4
+ add #-1,r3
+
+ tst r4,r0
+ bt .L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+ cmp/pl r3
+
+ bf .L_denorm
+ mov.l LOCAL(x7fffffff),r0
+
+ mov r5,r1
+ shlr r1
+
+ mov #0,r1
+ addc r0,r2
+
+! Check extra MSB here
+ mov.l .L_22bit,r9
+ addc r1,r5 ! round to even
+
+ addc r1,r4
+ tst r9,r4
+
+ bf .L_epil_1
+
+.L_epil_0:
+ mov.l .L_21bit,r1
+
+ not r1,r1
+ and r1,r4
+
+ mov r4,r0
+ or r10,r0
+
+ mov.l @r15+,r10
+ mov #20,r2
+
+ mov.l @r15+,r9
+ mov r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_epil_1:
+ shlr r4
+ add #1,r3
+ bra .L_epil_0
+ rotcr r5
+
+.L_denorm:
+ add #-1,r3
+.L_denorm_1:
+ tst r3,r3
+ bt .L_denorm_2
+
+ shlr r4
+ rotcr r5
+
+ movt r1
+ bra .L_denorm_1
+ add #1,r3
+
+.L_denorm_2:
+ clrt
+ mov #0,r2
+ addc r1,r5
+
+ addc r2,r4
+ mov r4,r0
+
+ or r10,r0
+ mov.l @r15+,r10
+
+ mov r5,r1
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+ mov.l .L_sign,r0
+ tst r6,r0
+
+ bt .L_ret_op2_1
+
+ ! op2 is negative infinity. Inf - Inf is being performed
+ mov.l .L_inf,r0
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ rts
+ mov #-1,DBLRH ! return NaN.
+
+.L_ret_op1_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_ret_op2_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ mov.l @r15+,r9
+ div0s r4,r6 ! different signs -> NaN
+ mov r4,DBLRH
+ or r6,DBLRH
+ mov.l @r15+,r8
+ SL(bf, 0f,
+ mov r5,DBLRL)
+ mov #-1,DBLRH ! return NaN.
+0: rts
+ or r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+ ! Check if a is Nan
+ cmp/pl r5
+ bt .L_ret_op1_1
+
+ mov.l .L_high_mant,r0
+ and r4,r0
+
+ cmp/pl r0
+ bt .L_ret_op1_1
+
+ ! op1 is not Nan. It is infinity. Check the sign of it.
+ ! If op2 is Nan, return op2
+ cmp/pz r4
+
+ bf .L_op1_ninf
+
+ ! op2 is +ve infinity here
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ ! r2 is free now
+ mov.l .L_high_mant,r0
+ tst r6,r0 ! op2 also has invalid exponent
+
+ bf .L_ret_op2_1 ! branch if op2 is NaN
+
+ tst r7,r7
+ bt .L_op1_pinf_op2_inf ! op2 is Infinity, and op1 is +Infinity
+ !op2 is not infinity, It is Nan
+ bf .L_ret_op2_1
+
+ .align 2
+.L_high_mant:
+ .long 0x000FFFFF
+
+.L_21bits:
+ .long 0x001FFFFF
+
+.L_22bit:
+ .long 0x00200000
+
+.L_23bit:
+ .long 0x00400000
+
+.L_21bit:
+ .long 0x00100000
+
+.L_sign:
+ .long 0x80000000
+
+.L_inf:
+ .long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
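The guard/sticky handling of the .L_shfrac_op2 alignment loop above can be modelled in C. A minimal sketch with our own names, assuming the shift count is between 1 and 63:

```c
#include <assert.h>
#include <stdint.h>

/* Align a mantissa right by n bits, as .L_shfrac_op2 does one bit at a
   time: the last bit shifted out becomes the guard bit, and the OR of
   everything below it becomes the sticky bit, so round-to-nearest-even
   can still be decided after the shift.  */
static uint64_t align_mant (uint64_t m, int n, unsigned *guard, unsigned *sticky)
{
  *guard  = (unsigned)((m >> (n - 1)) & 1);
  *sticky = (m & ((1ULL << (n - 1)) - 1)) != 0;
  return m >> n;
}
```

Rounding then increments the aligned result when the guard bit is set and either the sticky bit is set or the low bit of the shifted mantissa is 1 (ties to even).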
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsisf)
+ FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+ mov.l .L_sign,r2
+ mov #23,r6
+
+ ! Check for zero
+ tst r4,r4
+ mov.l .L_24_bits,r7
+
+ ! Extract sign
+ and r4,r2
+ bt .L_ret
+
+ ! Negative ???
+ mov.l .L_imp_bit,r5
+ cmp/pl r4
+
+ not r7,r3
+ bf .L_neg
+
+ ! Decide the direction for shifting
+ cmp/gt r7,r4
+ mov r4,r0
+
+ and r5,r0
+ bt .L_shr_0
+
+ ! Number may already be in normalized form
+ cmp/eq #0,r0
+ bf .L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+ shll r4
+ mov r4,r0
+
+ and r5,r0
+ cmp/eq #0,r0
+
+ SL(bt, .L_shl,
+ add #-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+ mov #23,r3
+ not r5,r5
+
+ mov r2,r0
+ add #127,r6
+
+ and r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Negate the number
+.L_neg:
+ ! Take care for -2147483648.
+ mov r4,r0
+ shll r0
+
+ cmp/eq #0,r0
+ SL(bt, .L_ret_min,
+ neg r4,r4)
+
+ cmp/gt r7,r4
+ bt .L_shr_0
+
+ mov r4,r0
+ and r5,r0
+
+ cmp/eq #0,r0
+ bf .L_pack
+ bt .L_shl
+
+.L_shr_0:
+ mov #0,r1
+
+! Shift right the number with rounding
+.L_shr:
+ shlr r4
+ movt r7
+
+ tst r7,r7
+
+ ! Count number of ON bits shifted
+ bt .L_shr_1
+ add #1,r1
+
+.L_shr_1:
+ mov r4,r0
+ add #1,r6
+
+ and r3,r0
+ cmp/eq #0,r0
+
+ ! Add MSB of shifted bits
+ bf .L_shr
+ add r7,r4
+
+ tst r7,r7
+ bt .L_pack
+
+.L_pack1:
+ mov #1,r0
+ cmp/eq r1,r0
+
+ bt .L_rnd
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shr
+ bt .L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+ shlr r4
+ bra .L_pack
+ shll r4
+
+.L_ret:
+ rts
+ mov r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+ mov.l .L_min_val,r0
+ rts
+ nop
+
+ .align 2
+.L_sign:
+ .long 0x80000000
+
+.L_imp_bit:
+ .long 0x00800000
+
+.L_24_bits:
+ .long 0x00FFFFFF
+
+.L_nsign:
+ .long 0x7FFFFFFF
+
+.L_min_val:
+ .long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
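The algorithm implemented by floatsisf above can be sketched in C. An illustrative model with our own names, building the single-precision bit pattern with round-to-nearest, ties to even (the .L_shr/.L_rnd path):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of floatsisf: convert a 32-bit signed integer to the IEEE-754
   single-precision bit pattern, rounding to nearest, ties to even.  */
static uint32_t float_from_si (int32_t x)
{
  if (x == 0)
    return 0;
  uint32_t sign = x < 0 ? 0x80000000u : 0;
  uint32_t mag = x < 0 ? 0u - (uint32_t) x : (uint32_t) x; /* safe for INT_MIN */

  int h = 31;                       /* position of the leading 1 bit */
  while (!(mag & (1u << h)))
    h--;
  uint32_t exp = (uint32_t)(h + 127);
  uint32_t mant;
  if (h <= 23)
    mant = mag << (23 - h);         /* exact: shift left, no rounding */
  else
    {
      int sh = h - 23;              /* 1..8 bits fall off the bottom */
      uint32_t rest = mag & ((1u << sh) - 1);
      uint32_t half = 1u << (sh - 1);
      mant = mag >> sh;
      if (rest > half || (rest == half && (mant & 1)))
        mant++;                     /* round to nearest, ties to even */
      if (mant & (1u << 24))        /* rounding carried past bit 23 */
        {
          mant >>= 1;
          exp++;
        }
    }
  return sign | (exp << 23) | (mant & 0x007fffffu);
}
```

Note that for -2147483648 this yields 0xCF000000, the same pattern the assembly returns through .L_min_val.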
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S (revision 0)
@@ -0,0 +1,596 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 is passed in regs
+!r6 and r7, result is returned in regs r0 and r1. operand 1 is referred to as op1
+!and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+ .text
+ .align 5
+ .global GLOBAL (muldf3)
+ FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov.l .L_mask_sign,r0
+ mov r4,r2
+
+ and r0,r2
+ mov #0,r1
+
+ shll r4
+ and r6,r0
+
+ xor r2,r0 !r0 contains the result's sign bit
+ shlr r4
+
+ mov.l .L_inf,r2
+ shll r6
+
+ mov r4,r3
+ shlr r6
+
+.L_chk_a_inv:
+ !chk if op1 is Inf/NaN
+ and r2,r3
+ mov.l r8,@-r15
+
+ cmp/eq r3,r2
+ mov.l .L_mask_high_mant,r8
+
+ mov r2,r3
+ bf .L_chk_b_inv
+
+ mov r8,r3
+ and r4,r8
+
+ cmp/hi r1,r8
+ bt .L_return_a !op1 NaN, return op1
+
+ cmp/hi r1,r5
+ mov r2,r8
+
+ bt .L_return_a !op1 NaN, return op1
+ and r6,r8
+
+ cmp/eq r8,r2
+ and r6,r3
+
+ bt .L_b_inv
+ cmp/eq r1,r6
+
+	bf .L_return_a !op1=Inf, op2=normal number, return op1
+	cmp/eq r1,r7
+
+	bf .L_return_a !op1=Inf, op2=normal number, return op1
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=Inf, op2=0,return nan
+
+.L_b_inv:
+ !op2 is NaN/Inf
+ cmp/hi r1,r7
+ mov r1,r2
+
+ mov r5,r1
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r2,r6
+ or r4,r0
+
+ bt .L_return_b !op2=NaN,return op2
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts !op1=Inf,op2=Inf,return Inf with sign
+ nop
+
+.L_chk_b_inv:
+ !Chk if op2 is NaN/Inf
+ and r6,r2
+ cmp/eq r3,r2
+
+ bf .L_chk_a_for_zero
+ and r6,r8
+
+ cmp/hi r1,r8
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r1,r7
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/eq r5,r1
+ bf .L_return_b !op1=normal number,op2=Inf,return Inf
+
+ mov r7,r1
+ cmp/eq r4,r1
+
+	bf .L_return_b !op1=normal number, op2=Inf, return Inf
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=0,op2=Inf,return NaN
+
+.L_return_a:
+ mov r5,r1
+ or r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_return_b:
+ mov r7,r1
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_chk_a_for_zero:
+ !Chk if op1 is zero
+ cmp/eq r1,r4
+ bf .L_chk_b_for_zero
+
+ cmp/eq r1,r5
+ bf .L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_b_for_zero:
+ !op1=0,chk if op2 is zero
+ cmp/eq r1,r6
+ mov r1,r3
+
+ mov.l .L_inf,r1
+ bf .L_normal_nos
+
+ cmp/eq r3,r7
+ bf .L_normal_nos
+
+ mov r3,r1
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ mov.l r9,@-r15
+ mov r4,r3
+
+ mov #-20,r9
+ and r1,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r2
+#else
+ SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r3
+#else
+ SHLR20 (r3)
+#endif
+ cmp/pl r3
+
+ bf .L_norm_a !normalize op1
+.L_chk_b:
+ cmp/pl r2
+ bf .L_norm_b !normalize op2
+
+.L_mul1:
+ add r3,r2
+ mov.l .L_1023,r1
+
+ !resultant exponent in r2
+ add r1,r2
+ mov.l .L_2047,r1
+
+ !Chk the exponent for overflow
+ cmp/ge r1,r2
+ and r8,r4
+
+ bt .L_return_inf
+ mov.l .L_imp_bit,r1
+
+ or r1,r4
+ and r8,r6
+
+ or r1,r6
+ clrt
+
+ !multiplying the mantissas
+ DMULU_SAVE
+ DMULUL (r7,r5,r1) !bits 0-31 of product
+
+ DMULUH (r3)
+
+ DMULUL (r4,r7,r8)
+
+ addc r3,r8
+
+ DMULUH (r3)
+
+ movt r9
+ clrt
+
+ DMULUL (r5,r6,r7)
+
+ addc r7,r8 !bits 63-32 of product
+
+ movt r7
+ add r7,r9
+
+ DMULUH (r7)
+
+ add r7,r3
+
+ add r9,r3
+ clrt
+
+ DMULUL (r4,r6,r7)
+
+ addc r7,r3 !bits 64-95 of product
+
+ DMULUH (r7)
+ DMULU_RESTORE
+
+ mov #0,r5
+ addc r5,r7 !bits 96-105 of product
+
+ cmp/eq r5,r1
+ mov #1,r4
+
+ bt .L_skip
+ or r4,r8
+.L_skip:
+ mov.l .L_106_bit,r4
+ mov r8,r9
+
+.L_chk_extra_msb:
+	!chk if extra MSB is generated
+ and r7,r4
+ cmp/eq r5,r4
+
+ mov #12,r4
+ SL(bf, .L_shift_rt_by_1,
+ mov #31,r5)
+
+.L_pack_mantissa:
+	!scale the mantissa to 53 bits
+ mov #-19,r6
+ mov.l .L_mask_high_mant,r5
+
+ SHLRN (19, r6, r8)
+
+ and r3,r5
+
+ shlr r8
+ movt r1
+
+ SHLLN (12, r4, r5)
+
+ add #-1,r6
+
+ or r5,r8 !lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r3
+#else
+ SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r4,r7
+#else
+ SHLL12 (r7)
+#endif
+ clrt
+
+ or r7,r3 !higher bits of resulting mantissa
+ mov #0,r7
+
+ !chk the exponent for underflow
+ cmp/ge r2,r7
+ bt .L_underflow
+
+ addc r1,r8 !rounding
+ mov r8,r1
+
+ addc r7,r3 !rounding
+ mov.l .L_mask_22_bit,r5
+
+ and r3,r5
+ !chk if extra msb is generated after rounding
+ cmp/eq r7,r5
+
+ mov.l .L_mask_high_mant,r8
+ bt .L_pack_result
+
+ add #1,r2
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+
+ bt .L_return_inf
+ shlr r3
+
+ rotcr r1
+
+.L_pack_result:
+ !pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+ !r0=sign bit
+ mov #20,r6
+ and r8,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ or r3,r0
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r1
+
+ rotcl r4
+ add #-1,r3
+
+ tst r1,r4
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r3
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r1
+
+ rotcl r6
+ add #-1,r2
+
+ tst r1,r6
+ bt .L_norm_b
+
+ bra .L_mul1
+ add #1,r2
+
+.L_shift_rt_by_1:
+ !adjust the extra msb
+
+ add #1,r2 !add 1 to exponent
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+ mov #20,r6
+
+ bt .L_return_inf
+ shlr r7 !r7 contains bit 96-105 of product
+
+ rotcr r3 !r3 contains bit 64-95 of product
+
+ rotcr r8 !r8 contains bit 32-63 of product
+	bra .L_pack_mantissa
+	rotcr r1 !r1 contains bit 31-0 of product
+
+.L_return_inf:
+ !return Inf
+ mov.l .L_inf,r2
+ mov #0,r1
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_underflow:
+ !check if the result needs to be denormalized
+ mov #-53,r1
+ add #1,r2
+
+ cmp/gt r2,r1
+ mov #32,r4
+
+ add #-2,r2
+ bt .L_return_zero
+
+ add r2,r4
+ mov r7,r1
+
+ cmp/ge r7,r4
+ mov r2,r6
+
+ mov #-54,r2
+ bt .L_denorm
+
+ mov #-32,r6
+
+.L_denorm:
+ !denormalize the result
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r6
+
+ shlr r3
+ rotcr r8
+
+ cmp/eq r7,r6
+ bf .L_denorm
+
+ mov r4,r6
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r7,r5
+
+ cmp/gt r6,r7
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r5
+
+ mov r7,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r7,r3
+ or r3,r0
+
+ cmp/eq r9,r7
+ bf .L_return
+
+ cmp/eq r7,r5
+ mov.l .L_mask_sign,r6
+
+ bf .L_return
+ cmp/eq r1,r6
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_return_zero:
+ mov.l @r15+,r9
+ mov r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_mask_sign:
+ .long 0x80000000
+.L_1023:
+ .long -1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_mask_22_bit:
+ .long 0x00200000
+.L_106_bit:
+ .long 0x00000200
+.L_comp_1:
+ .long 0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
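The mantissa multiply above builds the 106-bit product of two 53-bit mantissas from four 32x32->64 partial products (the DMULUL/DMULUH pairs plus carry propagation). A C sketch of that accumulation, with our own helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Full 64x64->128 multiply from four 32x32->64 partial products, the
   scheme the dmulu.l sequence above uses for the 53-bit mantissas.  */
static void mul53 (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint32_t a0 = (uint32_t) a, a1 = (uint32_t)(a >> 32);
  uint32_t b0 = (uint32_t) b, b1 = (uint32_t)(b >> 32);
  uint64_t p00 = (uint64_t) a0 * b0;   /* bits 0-63 contribution */
  uint64_t p01 = (uint64_t) a0 * b1;   /* middle contributions */
  uint64_t p10 = (uint64_t) a1 * b0;
  uint64_t p11 = (uint64_t) a1 * b1;   /* bits 64-127 contribution */

  /* Sum the middle column; mid cannot overflow 64 bits.  */
  uint64_t mid = (p00 >> 32) + (uint32_t) p01 + (uint32_t) p10;
  *lo = (mid << 32) | (uint32_t) p00;
  *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

For two 53-bit mantissas the product occupies at most 106 bits, which the assembly then renormalizes to 53 bits (shifting right by one more if bit 105, the .L_106_bit test, is set).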
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixdfsi)
+ FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_mask_high_mant,r1
+
+ SL(bt, .L_epil,
+ mov #0,r0)
+ and r4,r1 ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1053,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1053,return maxint
+ sub r2,r7
+
+ mov.l .L_21bit,r2
+ SL(bt, .L_ret_max,
+ add #1,r7) ! r7 contains the number of shifts
+
+ or r2,r1
+ mov r7,r3
+ shll8 r1
+
+ neg r7,r7
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ !chk if the result can be made only from higher mantissa
+ SL(bt, .L_lower_mantissa,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ tst r6,r6
+ SL(bt, .L_epil,
+ mov r1,r0)
+
+ rts
+ neg r0,r0
+
+.L_lower_mantissa:
+ !result is made from lower mantissa also
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ mov r1,r0
+ bra .L_chk_sign
+ nop
+
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+ tst r6,r6
+ bt .L_epil
+
+ rts
+ add #1,r0
+
+.L_chk_sign:
+	tst r6,r6 !if the sign bit is set, the number is -ve
+ bt .L_epil
+
+ rts
+ neg r0,r0
+
+ .align 2
+
+.L_maxint:
+ .long 0x7fffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1053:
+ .long 1053
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
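The fixdfsi paths above can be modelled in C. An illustrative sketch with our own names (it takes the raw 64-bit pattern); the saturation behaviour for out-of-range values and the zero return for NaN mirror what the assembly does, though C itself leaves those cases undefined:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of fixdfsi: truncate an IEEE double toward zero.  Exponents
   below 1023 give 0, NaNs give 0 (the .L_inv_exp path), exponents
   above 1053 saturate (.L_ret_max: maxint, plus 1 for negatives).  */
static int32_t fix_df_si (uint64_t bits)
{
  int sign = (int)(bits >> 63);
  int exp = (int)((bits >> 52) & 0x7ff);
  uint64_t mant = bits & 0x000fffffffffffffULL;

  if (exp < 1023)
    return 0;                       /* |x| < 1 */
  if (exp == 0x7ff && mant != 0)
    return 0;                       /* NaN */
  if (exp > 1053)
    return sign ? INT32_MIN : INT32_MAX;  /* saturate (covers Inf) */
  mant |= 1ULL << 52;               /* implicit leading 1 */
  int32_t r = (int32_t)(mant >> (1075 - exp));
  return sign ? -r : r;
}
```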
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S (revision 0)
@@ -0,0 +1,393 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2 respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divsf3)
+ FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+ mov.l .L_mask_sign,r1
+ mov r4,r3
+
+ xor r5,r3
+ shll r4
+
+ shlr r4
+ mov.l .L_inf,r2
+
+ and r3,r1 !r1=resultant sign
+ mov r4,r6
+
+ shll r5
+ mov #0,r0
+
+ shlr r5
+ and r2,r6
+
+ cmp/eq r2,r6
+ mov r5,r7
+
+ and r2,r7
+ bt .L_op1_inv
+
+ cmp/eq r2,r7
+ mov #-23,r3
+
+ bt .L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+
+ cmp/eq r0,r4
+
+ bt .L_op1_zero !dividend=0
+ cmp/eq r0,r6
+
+ mov.l .L_imp_bit,r3
+ bt .L_norm_op1 !normalize dividend
+.L_chk_op2:
+ cmp/eq r0,r5
+ bt .L_op2_zero !divisor=0
+
+ cmp/eq r0,r7
+ bt .L_norm_op2 !normalize divisor
+
+.L_div1:
+ sub r7,r6
+ add #127,r6 !r6=resultant exponent
+
+ mov r3,r7
+ mov.l .L_mask_mant,r3
+
+ and r3,r4
+ !chk exponent for overflow
+ mov.l .L_255,r2
+
+ and r3,r5
+ or r7,r4
+
+ cmp/ge r2,r6
+ or r7,r5
+
+ bt .L_return_inf
+ mov r0,r2
+
+ cmp/eq r4,r5
+ bf .L_den_one
+
+ cmp/ge r6,r0
+ !numerator=denominator, quotient=1, remainder=0
+ mov r7,r2
+
+ mov r0,r4
+ !chk exponent for underflow
+ bt .L_underflow
+ bra .L_pack
+ nop
+
+.L_den_one:
+ !denominator=1, result=numerator
+
+ cmp/eq r7,r5
+ bf .L_divide
+
+ !chk exponent for underflow
+ cmp/ge r6,r0
+ mov r4,r2
+
+ SL(bt, .L_underflow,
+ mov r0,r4)
+ bra .L_pack
+ nop
+
+.L_divide:
+	!divide the mantissas: r4 = dividend, r5 = divisor
+
+ cmp/hi r4,r5
+ bf .L_loop
+
+ shll r4 ! if mantissa(op1)< mantissa(op2)
+ add #-1,r6 ! shift left the numerator and decrease the exponent.
+
+.L_loop:
+ !division loop
+
+ cmp/ge r5,r4
+ bf .L_skip
+
+ or r7,r2
+ sub r5,r4
+
+.L_skip:
+ shlr r7
+ shll r4
+
+ cmp/eq r0,r7
+ bf .L_loop
+
+ !chk the exponent for underflow
+ cmp/ge r6,r0
+ bt .L_underflow
+
+ !apply rounding
+ cmp/gt r5,r4
+ bt .L_round1
+
+ cmp/eq r4,r5
+ bt .L_round2
+
+.L_pack:
+ !pack the result, r1=sign, r2=quotient, r6=exponent
+
+ mov #23,r4
+ and r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r4,r6
+#endif
+ or r2,r1
+
+ or r6,r1
+ mov r1,r0
+
+ rts
+ nop
+
+.L_round1:
+ !Apply proper rounding
+
+ bra .L_pack
+ add #1,r2
+
+.L_round2:
+ !Apply proper rounding
+
+ mov.l .L_comp_1,r5
+ bra .L_pack
+ and r5,r2
+
+.L_op1_inv:
+ !chk if op1 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ cmp/hi r0,r6
+
+ bt .L_ret_NaN ! op1 is NaN, return NaN.
+ cmp/eq r2,r7
+
+ SL(bf, .L_return,
+ mov r4,r0) ! Inf/finite, return Inf
+
+ ! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+ rts
+ mov #-1,r0
+
+.L_op2_inv:
+ !chk if op2 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r5,r7
+
+ and r3,r7
+ cmp/hi r0,r7
+
+ bt .L_ret_op2
+ mov r1,r0
+
+ rts
+ nop
+
+.L_op1_zero:
+ !op1 is zero. If op2 is zero, return NaN, else return zero
+
+ cmp/eq r0,r5
+
+ bf .L_return
+
+ rts
+ mov #-1,r0
+
+.L_op2_zero:
+ ! op2 is zero, return Inf
+
+ rts
+ or r2,r0
+
+.L_return_inf:
+ mov.l .L_inf,r0
+
+ rts
+ or r1,r0
+
+.L_norm_op1:
+ !normalize dividend
+
+ shll r4
+ tst r2,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ bra .L_chk_op2
+ add #1,r6
+
+.L_norm_op2:
+ !normalize divisor
+
+ shll r5
+ tst r2,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_div1
+ add #1,r7
+
+.L_underflow:
+ !denormalize the result
+
+ add #1,r6
+ mov #-24,r7
+
+ cmp/gt r6,r7
+ mov r2,r5
+
+ bt .L_return_zero
+ add #-1,r6
+
+ mov #32,r3
+ neg r6,r7
+
+ add #1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ cmp/ge r0,r6
+ bf .L_mov_right
+
+.L_mov_left:
+ cmp/eq r0,r6
+ bt .L_out
+
+ shll r2
+ bra .L_mov_left
+ add #-1,r6
+
+.L_mov_right:
+ cmp/eq r0,r6
+ bt .L_out
+
+ add #1,r6
+ bra .L_mov_right
+ shlr r2
+
+.L_out:
+#endif
+ sub r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r5
+#else
+ cmp/ge r0,r3
+ bf .L_mov_right_1
+
+.L_mov_left_1:
+ shll r5
+ add #-1,r3
+
+ cmp/eq r0,r3
+ bf .L_mov_left_1
+
+ bt .L_out_1
+
+.L_mov_right_1:
+ cmp/eq r0,r3
+ bt .L_out_1
+
+ add #1,r3
+ bra .L_mov_right_1
+ shlr r5
+
+.L_out_1:
+#endif
+ shlr r2
+ addc r0,r2
+
+ cmp/eq r4,r0 !r4 contains the remainder
+ mov r2,r0
+
+ mov.l .L_mask_sign,r7
+ bf .L_return
+
+ mov.l .L_comp_1,r2
+ cmp/eq r7,r5
+
+ bf .L_return
+ and r2,r0
+
+.L_return:
+.L_return_zero:
+ rts
+ or r1,r0
+
+.L_ret_op2:
+ rts
+ or r5,r0
+
+
+ .align 2
+.L_inf:
+ .long 0x7f800000
+.L_mask_sign:
+ .long 0x80000000
+.L_mask_mant:
+ .long 0x007fffff
+.L_imp_bit:
+ .long 0x00800000
+.L_comp_1:
+ .long 0xfffffffe
+.L_255:
+ .long 255
+
+ENDFUNC (GLOBAL (divsf3))
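The quotient loop above (labels .L_loop / .L_skip) is a classic restoring shift-and-subtract divider: one quotient bit per iteration, walking the bit mask in r7 down from the implicit bit. A rough C model, with the hypothetical name div_mantissa and assuming both 24-bit mantissas already carry the implicit bit (bit 23):

```c
#include <assert.h>
#include <stdint.h>

/* C model of the quotient loop: num = dividend mantissa (r4),
   den = divisor mantissa (r5), bit = current quotient bit (r7),
   quot = quotient accumulator (r2).  One quotient bit per iteration. */
static uint32_t div_mantissa(uint32_t num, uint32_t den)
{
    uint32_t quot = 0;
    for (uint32_t bit = 1u << 23; bit != 0; bit >>= 1) {
        if (num >= den) {       /* cmp/ge r5,r4 */
            quot |= bit;        /* or r7,r2     */
            num -= den;         /* sub r5,r4    */
        }
        num <<= 1;              /* shll r4      */
    }
    return quot;
}
```

With both mantissas carrying the implicit bit, 1.0/1.0 yields 0x800000 and 1.5/1.0 yields 0xC00000; the assembly then rounds on the remainder left in r4 and packs sign, exponent, and the low 23 bits.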
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
@@ -0,0 +1,150 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunssfsi)
+ FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+ mov.l .L_sign,r0
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r4,r0
+
+ mov.l .L_mask_sign,r7
+ mov #127,r5
+
+ ! Remove sign bit
+ cmp/eq #0,r0
+ and r7,r2
+
+ ! If the number is negative, return 0.
+ ! libgcc deviates from the standard in this regard.
+ mov r4,r3
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ mov.l .L_frac,r6
+ cmp/gt r1,r2
+
+ shll r2
+ SL1(bt, .L_epil,
+ shlr16 r2)
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ and r6,r3 ! r3 has fraction
+ cmp/gt r2,r5
+
+ ! If exponent is less than 127, return 0
+ or r1,r3
+ bt .L_epil
+
+ ! Process only if exponent is less than 158
+ mov.l .L_158,r1
+ shll8 r3
+
+ cmp/gt r1,r2
+ sub r2,r1
+
+ neg r1,r1
+ bt .L_ret_max
+
+! Shift the mantissa with exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+
+.L_ret:
+#endif
+ rts
+ mov r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+ mov.l .L_max,r3
+
+ rts
+ mov r3,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_158:
+ .long 158
+
+.L_max:
+ .long 0xFFFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
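The conversion logic above (reject negatives and NaNs, return 0 for exponents below 127, clamp above 158, otherwise shift the mantissa by its distance from 158) can be sketched in C; the function name sf_to_uint is hypothetical, and bits is the raw IEEE-754 single:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of GLOBAL(fixunssfsi); bits is the raw binary32 image. */
static uint32_t sf_to_uint(uint32_t bits)
{
    if (bits & 0x80000000u)
        return 0;                          /* negative: return 0 */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return 0;                          /* NaN: return 0, as the asm does */
    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t frac = (bits & 0x007FFFFFu) | 0x00800000u;  /* implicit bit */
    if (exp < 127)
        return 0;                          /* |x| < 1 */
    if (exp > 158)
        return 0xFFFFFFFFu;                /* too large (or +Inf): clamp */
    frac <<= 8;                            /* mantissa into bits 31..8 */
    return frac >> (158 - exp);            /* the shld / shift loop */
}
```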
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
@@ -0,0 +1,71 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsidf)
+ FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+ mov.w LOCAL(x41f0),DBLRH ! bias + 32
+ tst r4,r4 ! check for zero
+ bt .L_ret_zero
+.L_loop:
+ shll r4
+ SL(bf, .L_loop,
+ add #-16,DBLRH)
+
+ mov r4,DBLRL
+
+ SHLL20 (DBLRL)
+
+ shll16 DBLRH ! put exponent in proper place
+
+ SHLR12 (r4)
+
+ rts
+ or r4,DBLRH
+
+.L_ret_zero:
+ mov #0,r1
+ rts
+ mov #0,r0
+
+LOCAL(x41f0): .word 0x41f0
+ .align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
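The normalization loop above starts the exponent at 0x41F (bias 0x3FF plus 32, the LOCAL(x41f0) constant before the shll16) and pays one exponent step per left shift until the leading 1 drops out, which also removes the implicit bit. A C model, with the hypothetical name uint_to_df, returning the raw 64-bit double image:

```c
#include <assert.h>
#include <stdint.h>

/* Model of GLOBAL(floatunsidf): returns the raw IEEE-754 double bits. */
static uint64_t uint_to_df(uint32_t x)
{
    if (x == 0)
        return 0;
    uint32_t exp = 0x41F;        /* bias 0x3FF + 32, as in LOCAL(x41f0) */
    uint32_t carry;
    do {                         /* the shll r4 / add #-16 loop */
        carry = x >> 31;         /* bit shifted out (T flag) */
        x <<= 1;
        exp--;                   /* one exponent step per shift */
    } while (!carry);            /* stops when the implicit 1 drops out */
    uint32_t hi = (exp << 20) | (x >> 12);   /* DBLRH */
    uint32_t lo = x << 20;                   /* DBLRL */
    return ((uint64_t)hi << 32) | lo;
}
```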
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S (revision 0)
@@ -0,0 +1,530 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subsf3)
+ .global GLOBAL (addsf3)
+ FUNC (GLOBAL (subsf3))
+ FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+ mov.l .L_sign_bit,r1
+ xor r1,r5
+
+GLOBAL (addsf3):
+ mov.l r8,@-r15
+ mov r4,r3
+
+ mov.l .L_pinf,r2
+ mov #0,r8
+
+ and r2,r3 ! op1's exponent.
+ mov r5,r6
+
+ ! Check NaN or Infinity
+ and r2,r6 ! op2's exponent.
+ cmp/eq r2,r3
+
+ ! go if op1 is NaN or INF.
+ mov.l .L_sign_bit,r0
+ SL(bt, .L_inv_op1,
+ mov #-23,r1)
+
+ ! Go if op2 is NaN/INF.
+ cmp/eq r2,r6
+ mov r0,r7
+ bt .L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r3)
+#else
+ shld r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+#else
+ shld r1,r6
+#endif
+
+ ! Check for negative zero
+ cmp/eq r0,r5
+
+ mov r5,r1
+ SL(bt, .L_ret_op1,
+ and r7,r1)
+
+ cmp/eq r0,r4
+ bt .L_ret_op2
+
+ ! if op1 is zero return op2
+ tst r4,r4
+ bt .L_ret_op2
+
+ ! Equal numbers with opposite sign
+ mov r4,r2
+ xor r5,r2
+
+ cmp/eq r0,r2
+ bt .L_ret_zero
+
+ ! if op2 is zero return op1
+ mov.l .L_mask_fra,r2
+ tst r5,r5
+
+ ! Extract the mantissa
+ mov r4,r0
+ SL(bt, .L_ret_op1,
+ and r2,r5)
+
+ and r2,r4
+
+ mov.l .L_imp_bit,r2
+ and r7,r0 ! sign bit of op1
+
+ ! Check for denormals
+ tst r3,r3
+ bt .L_norm_op1
+
+ ! Attach the implicit bit
+ or r2,r4
+ tst r6,r6
+
+ bt .L_norm_op2
+
+ or r2,r5
+ tst r0,r0
+
+ ! Are the operands positive or negative?
+ bt .L_ptv_op1
+
+ neg r4,r4
+
+.L_ptv_op1:
+ tst r1,r1
+ bt .L_ptv_op2
+
+ neg r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+ cmp/eq r3,r6
+ bt .L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+ ! r0, r1 contain sign bits.
+ ! r4, r5 contain mantissas.
+ ! r3, r6 contain exponents.
+ ! r2, r7 scratch.
+
+ ! Calculate result exponent.
+ mov r6,r2
+ sub r3,r2 ! e2 - e1
+
+ cmp/pl r2
+ mov #23,r7
+
+ ! e2 - e1 is -ve
+ bf .L_exp_ne_1
+
+ mov r6,r3 ! Result exp.
+ cmp/gt r7,r2 ! e2-e1 > 23
+
+ mov #1,r7
+ bt .L_pack_op2_0
+
+ ! Align the mantissa
+.L_loop_ne:
+ shar r4
+
+ rotcr r8
+ cmp/eq r7,r2
+
+ add #-1,r2
+ bf .L_loop_ne
+
+ bt .L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+ ! If op1 is -ve
+ tst r1,r1
+ bt .L_pack_op2
+
+ neg r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+ mov.l .L_nimp_bit,r2
+ mov #23,r3
+
+ mov r1,r0
+
+ and r2,r5
+ mov.l @r15+,r8
+
+ or r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+ rts
+ or r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+ mov r4,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return zero
+.L_ret_zero:
+ mov #0,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+ mov r5,r0
+
+ rts
+ mov.l @r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+ shll r5
+ add #-1,r6
+
+ tst r2,r5
+ bt .L_norm_op2
+
+ ! Check sign
+ tst r1,r1
+ bt .L_norm_op2_2
+
+ neg r5,r5
+
+.L_norm_op2_2:
+ add #1,r6
+ cmp/eq r3,r6
+
+ bf .L_exp_ne
+ bt .L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+ shll r4
+ add #-1,r3
+
+ tst r2,r4
+ bt .L_norm_op1
+
+ ! Check sign
+ tst r0,r0
+ bt .L_norm_op1_1
+
+ neg r4,r4
+
+.L_norm_op1_1:
+ ! Adjust biasing
+ add #1,r3
+
+ ! Check op2 for denormalized value
+ tst r6,r6
+ bt .L_norm_op2
+
+ mov.l .L_imp_bit,r2
+
+ tst r1,r1 ! Check sign
+ or r2,r5 ! Attach 24th bit
+
+ bt .L_norm_op1_2
+
+ neg r5,r5
+
+.L_norm_op1_2:
+ cmp/eq r3,r6
+
+ bt .L_exp_eq
+ bf .L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+ ! Return op1 if it is NAN.
+ ! r2 is infinity
+ cmp/gt r2,r4
+ bt .L_ret_op1
+
+ ! op1 is +/- INF
+ ! If op2 is same return now.
+ cmp/eq r4,r5
+ bt .L_ret_op1
+
+ ! return op2 if it is NAN
+ cmp/gt r2,r5
+ bt .L_ret_op2
+
+ ! Check if op2 is inf
+ cmp/eq r2,r6
+ bf .L_ret_op1
+
+ ! Both op1 and op2 are infinities of opposite
+ ! signs, or there is a -NaN. Return a NaN.
+ mov.l @r15+,r8
+ rts
+ mov #-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+ mov #-25,r7
+ cmp/gt r2,r7 ! -23 > e2 - e1
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+ tst r0,r0
+ bt .L_pack_op1
+
+.L_pack_op1_0:
+ bra .L_pack_op1
+ neg r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+ ! Shift with rounding
+ shar r5
+ rotcr r8
+
+ tst r2,r2
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+ add r5,r4 ! Add fractions.
+ mov.l .L_sign_bit,r2
+
+ ! Check for negative result
+ mov #0,r0
+ tst r2,r4
+
+ mov.l .L_255,r5
+ bt .L_post_add
+
+ negc r8,r8
+ negc r4,r4
+ or r2,r0
+
+.L_post_add:
+ ! Check for extra MSB
+ mov.l .L_chk_25,r2
+
+ tst r2,r4
+ bt .L_imp_check
+
+ shar r4
+ rotcr r8
+
+ add #1,r3
+ cmp/ge r5,r3
+
+ ! Return Inf if exp > 254
+ bt .L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+ mov.l .L_imp_bit,r2
+ tst r2,r4
+
+ bf .L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+ shll r8
+ rotcl r4
+
+ add #-1,r3
+ tst r2,r4
+
+ bt .L_lft_shft
+
+! Pack the result after rounding
+.L_pack_op1:
+ ! See if denormalized result is possible
+ mov.l .L_chk_25,r5
+ cmp/pl r3
+
+ bf .L_denorm_res
+
+ ! Are there any bits shifted previously?
+ tst r8,r8
+ bt .L_pack_1
+
+ ! Round
+ shll r8
+ movt r6
+
+ add r6,r4
+
+ ! If we are halfway between two numbers,
+ ! round towards LSB = 0
+ tst r8,r8
+
+ bf .L_pack_1
+
+ shlr r4
+ shll r4
+
+.L_pack_1:
+ ! Adjust extra MSB generated after rounding
+ tst r4,r5
+ mov.l .L_255,r2
+
+ bt .L_pack_2
+ shar r4
+
+ add #1,r3
+ cmp/ge r2,r3 ! Check for exp overflow
+
+ bt .L_ret_inf
+
+! Pack it finally
+.L_pack_2:
+ ! Do not store implicit bit
+ mov.l .L_nimp_bit,r2
+ mov #23,r1
+
+ and r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r3)
+#else
+ shld r1,r3
+#endif
+ mov.l @r15+,r8
+
+ or r4,r0
+ rts
+ or r3,r0
+
+! Return infinity
+.L_ret_inf:
+ mov.l .L_pinf,r2
+
+ mov.l @r15+,r8
+ rts
+ or r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+ mov #0,r2
+
+! Denormalizing loop with rounding
+.L_den_1:
+ shar r4
+ movt r6
+
+ tst r3,r3
+ bt .L_den_2
+
+ ! Increment the exponent
+ add #1,r3
+
+ tst r6,r6
+ bt .L_den_0
+
+ ! Count number of ON bits shifted
+ add #1,r2
+
+.L_den_0:
+ bra .L_den_1
+ nop
+
+! Apply rounding
+.L_den_2:
+ cmp/eq r6,r1
+ bf .L_den_3
+
+ add r6,r4
+ mov #1,r1
+
+ ! If halfway between two numbers,
+ ! round towards LSB = 0
+ cmp/eq r2,r1
+ bf .L_den_3
+
+ shar r4
+ shll r4
+
+.L_den_3:
+
+ mov.l @r15+,r8
+ rts
+ or r4,r0
+
+ .align 2
+.L_imp_bit:
+ .long 0x00800000
+
+.L_nimp_bit:
+ .long 0xFF7FFFFF
+
+.L_mask_fra:
+ .long 0x007FFFFF
+
+.L_pinf:
+ .long 0x7F800000
+
+.L_sign_bit:
+ .long 0x80000000
+
+.L_bit_25:
+ .long 0x01000000
+
+.L_chk_25:
+ .long 0x7F000000
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
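The alignment step in .L_loop_ne / .L_exp_ne_2 shifts the smaller operand's (possibly negated) mantissa right and collects the shifted-out bits MSB-first in r8 via shar/rotcr, so rounding can consult them later. A small C model of that step, with the hypothetical name align_mant; note that C's right shift of a negative int is implementation-defined (arithmetic on GCC, which matches shar), and the caller is assumed to guarantee 1 <= diff <= 31:

```c
#include <assert.h>
#include <stdint.h>

/* Shift the mantissa right diff places, collecting the shifted-out
   bits MSB-first (the role of r8, filled via shar/rotcr). */
static int32_t align_mant(int32_t m, int diff, uint32_t *rest)
{
    *rest = (uint32_t)m << (32 - diff);  /* shifted-out bits, MSB-first */
    return m >> diff;                    /* arithmetic shift keeps sign */
}
```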
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
@@ -0,0 +1,347 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (mulsf3)
+ FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+ ! Extract the sign bits
+ mov.l .L_sign,r3
+ mov r3,r0
+
+ and r4,r3 ! sign bit for op1
+ mov.l .L_sign_mask,r6
+
+ ! Mask out the sign bit from op1 and op2
+ and r5,r0 ! sign bit for op2
+ mov.l .L_inf,r2
+
+ and r6,r4
+ xor r3,r0 ! Final sign in r0
+
+ and r6,r5
+ tst r4,r4
+
+ ! Check for zero
+ mov r5,r7
+ ! Check op1 for zero
+ SL(bt, .L_op1_zero,
+ mov r4,r6)
+
+ tst r5,r5
+ bt .L_op2_zero ! op2 is zero
+
+ ! Extract the exponents
+ and r2,r6 ! Exponent of op1
+ cmp/eq r2,r6
+
+ and r2,r7
+ bt .L_inv_op1 ! op1 is NaN or Inf
+
+ mov.l .L_mant,r3
+ cmp/eq r2,r7
+
+ and r3,r4 ! Mantissa of op1
+ bt .L_ret_op2 ! op2 is NaN or Inf
+
+ and r3,r5 ! Mantissa of op2
+
+ mov #-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+ ! Check for denormals
+ mov.l .L_24bit,r3
+ tst r6,r6
+
+ bt .L_norm_op1 ! op1 is denormal
+ add #-127,r6 ! Unbias op1's exp
+
+ tst r7,r7
+ bt .L_norm_op2 ! op2 is denormal
+
+ add #-127,r7 ! Unbias op2's exp
+
+.L_multiply:
+ add r6,r7 ! Final exponent in r7
+ mov.l .L_24bit,r1
+
+ ! set 24th bit of mantissas
+ mov #127,r3
+ or r1,r4
+
+ DMULU_SAVE
+
+ ! Multiply
+ or r1,r5
+ DMULUL (r4,r5,r4)
+
+ DMULUH (r5)
+
+ DMULU_RESTORE
+
+ mov.l .L_16bit,r6
+
+ ! Check for extra MSB generated
+ tst r5,r6
+
+ mov.l .L_255,r1
+ bf .L_shift_by_1 ! Adjust the extra MSB
+
+! Normalize the result with rounding
+.L_epil:
+ ! Bias the exponent
+ add #127,r7
+ cmp/ge r1,r7
+
+ ! Check exponent overflow and underflow
+ bt .L_ret_inf
+
+ cmp/pl r7
+ bf .L_denorm
+
+.L_epil_0:
+ mov #-23,r3
+ shll r5
+ mov #0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+ tst r3,r3
+ bt .L_loop_epil_out
+
+ add #1,r3
+ shlr r4
+
+ bra .L_loop_epil_0
+ rotcr r6
+
+! Round mantissa
+.L_loop_epil_out:
+ shll8 r5
+ or r5,r4
+
+ mov.l .L_mant,r2
+ mov #23,r3
+
+ ! Check last bit shifted out of result
+ tst r6,r6
+ bt .L_epil_2
+
+ ! Round
+ shll r6
+ movt r5
+
+ add r5,r4
+
+ ! If this is the only ON bit shifted
+ ! Round towards LSB = 0
+ tst r6,r6
+ bf .L_epil_2
+
+ shlr r4
+ shll r4
+
+.L_epil_2:
+ ! Rounding may have produced extra MSB.
+ mov.l .L_25bit,r5
+ tst r4,r5
+
+ bt .L_epil_1
+
+ add #1,r7
+ shlr r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r7)
+#else
+ shld r3,r7
+#endif
+
+ and r2,r4
+
+ or r7,r4
+ rts
+ or r4,r0
+
+.L_denorm:
+ mov #0,r3
+
+.L_den_1:
+ shlr r5
+ rotcr r4
+
+ cmp/eq r3,r7
+ bt .L_epil_0
+
+ bra .L_den_1
+ add #1,r7
+
+
+! Normalize the first argument
+.L_norm_op1:
+ shll r4
+ tst r3,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ ! The biasing is by 126
+ add #-126,r6
+ tst r7,r7
+
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+ shll r5
+ tst r3,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+ mov.l .L_inf,r2
+ and r2,r6
+
+ ! Check whether op1 is Inf or NaN; 0 * Inf is NaN
+ cmp/eq r2,r6
+ SL(bf, .L_ret_op2,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+ shlr r5
+ rotcr r4
+
+ add #1,r7 ! Show the shift in exponent
+
+ cmp/gt r3,r7
+ bf .L_epil
+
+ ! The resultant exponent is invalid
+ mov.l .L_inf,r1
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+ mov.l .L_inf,r2
+ and r2,r7
+
+ ! Check whether op2 is Inf or NaN; 0 * Inf is NaN
+ cmp/eq r2,r7
+ SL(bf, .L_ret_op1,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+.L_inv_op1:
+ mov.l .L_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ tst r6,r6
+
+ bf .L_ret_op1 ! op1 is NaN
+ ! op1 is not NaN. It is Inf
+
+ cmp/eq r2,r7
+ bf .L_ret_op1 ! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_ret_inf:
+ rts
+ or r2,r0
+
+.L_ret_zero:
+ mov #0,r2
+ rts
+ or r2,r0
+
+
+ .align 2
+.L_mant:
+ .long 0x007FFFFF
+
+.L_inf:
+ .long 0x7F800000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_25bit:
+ .long 0x01000000
+
+.L_16bit:
+ .long 0x00008000
+
+.L_sign:
+ .long 0x80000000
+
+.L_sign_mask:
+ .long 0x7FFFFFFF
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
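The mulsf3 core multiplies the two 24-bit mantissas (fraction fields with the implicit bit set) into a 47- or 48-bit product, renormalizes if bit 47 is set (the "extra MSB" check), rounds to nearest with ties toward LSB = 0, and adjusts the exponent if rounding overflows. A simplified model, with the hypothetical name mul_mant; a uint64_t stands in for the DMULUL/DMULUH register pair:

```c
#include <assert.h>
#include <stdint.h>

/* m1/m2 are 23-bit fraction fields; *exp is the unbiased result
   exponent (e1 + e2), adjusted here when renormalizing. */
static uint32_t mul_mant(uint32_t m1, uint32_t m2, int *exp)
{
    uint64_t p = (uint64_t)(m1 | 0x00800000u) * (m2 | 0x00800000u);
    if (p >> 47) {              /* extra MSB: product >= 2.0 */
        p >>= 1;
        ++*exp;
    }
    uint32_t rest = (uint32_t)p & 0x007FFFFFu;  /* bits to round away */
    uint32_t m = (uint32_t)(p >> 23);           /* 24-bit result */
    if (rest > 0x400000u || (rest == 0x400000u && (m & 1)))
        m++;                    /* round to nearest, ties to even */
    if (m >> 24) {              /* rounding produced an extra MSB */
        m >>= 1;
        ++*exp;
    }
    return m;
}
```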
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
@@ -0,0 +1,146 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsidf)
+ FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+ mov.l .L_sign,r0
+ mov #0,r1
+
+ mov r0,r2
+ tst r4,r4 ! check r4 for zero
+
+ ! Extract the sign
+ mov r2,r3
+ SL(bt, .L_ret_zero,
+ and r4,r0)
+
+ cmp/eq r1,r0
+ not r3,r3
+
+ mov r1,r7
+ SL(bt, .L_loop,
+ and r4,r3)
+
+ ! Treat -2147483648 as special case
+ cmp/eq r1,r3
+ neg r4,r4
+
+ bt .L_ret_min
+
+.L_loop:
+ shll r4
+ mov r4,r5
+
+ and r2,r5
+ cmp/eq r1,r5
+
+ add #1,r7
+ bt .L_loop
+
+ mov.l .L_initial_exp,r6
+ not r2,r2
+
+ and r2,r4
+ mov #21,r3
+
+ sub r7,r6
+ mov r4,r1
+
+ mov #20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r1
+#else
+ SHLL21 (r1)
+#endif
+ mov #-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r6 ! Exponent in proper place
+#else
+ SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r4
+#else
+ SHLR11 (r4)
+#endif
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+#ifdef __LITTLE_ENDIAN__
+ or r4,r1
+#else
+ or r4,r0
+#endif
+
+.L_ret_zero:
+ rts
+ mov #0,r0
+
+.L_ret_min:
+ mov.l .L_min,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+ .align 2
+
+.L_initial_exp:
+ .long 0x0000041E
+
+.L_sign:
+ .long 0x80000000
+
+.L_min:
+ .long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
@@ -0,0 +1,160 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixsfsi)
+ FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+ mov.l .L_mask_sign,r7
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r7,r2
+
+ cmp/gt r1,r2
+ mov #127,r5
+
+ mov r4,r3
+ SL(bt, .L_epil,
+ mov #0,r0)
+
+ shll r2
+ mov.l .L_frac,r6
+
+ shlr16 r2
+ and r6,r3 ! r3 has fraction
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ ! If exponent is less than 127, return 0
+ cmp/gt r2,r5
+ or r1,r3 ! Set the implicit bit
+
+ mov.l .L_157,r1
+ SL1(bt, .L_epil,
+ shll8 r3)
+
+ ! If exponent is greater than 157,
+ ! return the maximum/minimum integer
+ ! value, depending on the sign
+ cmp/gt r1,r2
+ sub r2,r1
+
+ mov.l .L_sign,r2
+ SL(bt, .L_ret_max,
+ add #1,r1)
+
+ and r4,r2 ! Sign in r2
+ neg r1,r1
+
+ ! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+.L_ret:
+#endif
+ ! If op1 is negative, negate the result
+ cmp/eq r0,r2
+ SL(bf, .L_negate,
+ mov r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the max/min integer value
+.L_ret_max:
+ and r4,r2 ! Sign in r2
+ mov.l .L_max,r3
+
+ mov.l .L_sign,r1
+ cmp/eq r0,r2
+
+ mov r3,r0
+ bt .L_epil
+
+ ! Negative number, return min int
+ rts
+ mov r1,r0
+
+! Negate the result
+.L_negate:
+ rts
+ neg r0,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_157:
+ .long 157
+
+.L_max:
+ .long 0x7FFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S (revision 0)
+++ gcc/config/sh/ieee-754-sf.S (revision 0)
@@ -0,0 +1,692 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+ cmp/eq r4,r5
+ mov r4,r0
+ bt 0f
+ or r5,r0
+ add r0,r0
+ tst r0,r0 ! test for +0.0 == -0.0 ; -0.0 == +0.0
+ 0: */
+ .balign 4
+ .global GLOBAL(nesf2)
+ HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+ /* If the raw values are unequal, the result is unequal, unless
+ both values are +-zero.
+ If the raw values are equal, the result is equal, unless
+ the values are NaN. */
+ cmp/eq r4,r5
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ bt LOCAL(check_nan)
+ mov r4,r0
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
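The two cases the comment above describes (unequal raw bits are unequal unless both are +-zero; equal raw bits are equal unless the value is a NaN) can be modelled in C as follows. The name ne_sf is hypothetical, and the NaN test here checks exponent and fraction directly rather than the SF_NAN_MASK top-fraction-bit convention the assembly relies on:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero iff a != b as binary32 values: +0.0 == -0.0, NaN != anything. */
static uint32_t ne_sf(uint32_t a, uint32_t b)
{
    if (a != b)                  /* raw bits differ... */
        return (a | b) << 1;     /* ...still equal only if both are +-0 */
    /* raw bits equal: unequal only if the value is a NaN */
    return (a & 0x7FFFFFFFu) > 0x7F800000u;
}
```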
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ tst r1,r0
+ not r5,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+ cmp/pz r4
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r5,r4
+ cmp/ge r4,r5
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values
+ are +-zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r4
+ not r5,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r5,r4
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt r4,r1)
+ bf LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+ rts
+ movt r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+ SLI(cmp/gt r4,r1)
+ bf LOCAL(nan)
+ rts
+ movt r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif /* ! L_gtsf2t */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* L_gtsf2t || L_gtsf2t_trap */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0: */
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a NaN.  If both are +- zero,
+	   the result is true; otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r5
+ not r4,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r4,r5
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,r5)
+ bt LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r5,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ cmp/ge r1,r5
+LOCAL(nan):
+ rts
+ movt r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+ SLI(cmp/ge r1,r5)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gesf2f_trap */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f || L_gesf2f_trap */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit you
+	! get UINT_MAX, for a set sign bit you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunssfsi)
+ FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+ mov.l LOCAL(max),r2
+ mov #-23,r1
+ mov r4,r0
+ shad r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/ge r2,r0
+ or r2,r0
+ bt LOCAL(retmax)
+ cmp/pz r4
+ and r1,r0
+ bf LOCAL(ret0)
+ add #-23,r4
+ rts
+ shld r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+ rts
+ subc r0,r0
+ .balign 4
+LOCAL(mask):
+ .long 0x00ffffff
+LOCAL(max):
+ .long 0x4f800000
+ ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
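The clamping behaviour described in the comment above (UINT_MAX for overflow and +NaN, 0 for negative inputs, -NaN, and values below 1) can be sketched as a C reference model. This is an illustration of the semantics only, not the shipped implementation; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reference model for the fixunssfsi routine above: UINT_MAX for inputs
   >= 2^32 (and +NaN), 0 for negative inputs (and -NaN) or inputs < 1.  */
static uint32_t fixunssfsi_model(float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((int32_t) bits >= 0x4f800000)	/* 2^32 <= f, or +NaN / +inf */
    return 0xffffffffu;
  if (bits & 0x80000000u)		/* negative input or -NaN */
    return 0;
  int exp = (int) (bits >> 23) - 127;	/* unbiased exponent */
  if (exp < 0)				/* |f| < 1.0 truncates to 0 */
    return 0;
  uint32_t frac = (bits & 0x007fffffu) | 0x00800000u;	/* implicit 1. */
  int sh = exp - 23;
  return sh >= 0 ? frac << sh : frac >> -sh;
}
```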
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit you
+	! get INT_MAX, for a set sign bit you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixsfsi)
+ FUNC(GLOBAL(fixsfsi))
+ .balign 4
+GLOBAL(fixsfsi):
+ mov r4,r0
+ shll r4
+ mov #-24,r1
+ bt LOCAL(neg)
+ mov.l LOCAL(max),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmax)
+ and r1,r0
+ addc r1,r0
+ rts
+ shld r4,r0
+
+ .balign 4
+LOCAL(neg):
+ mov.l LOCAL(min),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmin)
+ and r1,r0
+ addc r1,r0
+ shld r4,r0 ! SH4-200 will start this insn on a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+ .balign 4
+LOCAL(mask):
+ .long 0x007fffff
+LOCAL(max):
+ .long 0x4f000000
+LOCAL(min):
+ .long 0xcf000000
+ ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+ .balign 4
+ .global GLOBAL(hypotf)
+ FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+   That is somewhat slower than the SH4 can do the computation using double
+   precision hardware floating point - 57 cycles, or 69 with mode switches.  */
+	/* First, calculate x (r4) as the sum of the squares of the fractions -
+	   the exponent is calculated separately in r3.
+	   Then, calculate sqrt(x) for the fraction by reciproot iteration.
+	   We get a 7.5 bit initial value using linear approximation with two
+	   slopes that are powers of two.
+ x (- [1. .. 2.) y0 := 1.25 - x/4 - tab(x) y (- (0.8 .. 1.0)
+ x (- [2. .. 4.) y0 := 1. - x/8 - tab(x) y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x). */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+ z0 := x * y1
+ z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
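The estimate and refinement steps above can be sketched numerically in C. This is an illustration under stated assumptions: plain double arithmetic stands in for the shifted fixed-point {m.n} formats of the assembly, the tab(x) correction to the initial estimate is omitted, and the final sqrt correction is written in its real-number Newton form (the comment's version is expressed in the fixed-point scales).

```c
/* Two-slope start, one reciproot (Newton) step for y ~ 1/sqrt(x), then
   one Newton correction for z ~ sqrt(x), for x in [1., 4.).  */
static double sqrt_reciproot(double x)
{
  double y0 = x < 2. ? 1.25 - x / 4. : 1. - x / 8.;	/* tab(x) omitted */
  double y1 = 1.5 * y0 - 0.5 * (x * y0) * (y0 * y0);	/* Newton for 1/sqrt */
  double z0 = x * y1;					/* first sqrt estimate */
  return z0 + 0.5 * y1 * (x - z0 * z0);		/* one sqrt correction */
}
```

Even without the table term the result is accurate to roughly four decimal digits across the interval, which shows why one further refinement plus rounding compensation suffices for single precision.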
+
+ mov.l LOCAL(xff000000),r1
+ add r4,r4
+ mov r4,r0
+ add r5,r5
+ cmp/hs r5,r4
+ sub r5,r0
+ mov #-24,r2
+ bf/s LOCAL(r5_large)
+ shad r2,r0
+ mov r4,r3
+ shll8 r4
+ rotcr r4
+ tst #0xe0,r0
+ neg r0,r0
+ bt LOCAL(ret_abs_r3)
+ tst r1,r5
+ shll8 r5
+ bt/s LOCAL(denorm_r5)
+ cmp/hi r3,r1
+ dmulu.l r4,r4
+ bf LOCAL(inf_nan)
+ rotcr r5
+ shld r0,r5
+LOCAL(denorm_r5_done):
+ sts mach,r4
+ dmulu.l r5,r5
+ mov.l r6,@-r15
+ mov #20,r6
+
+ sts mach,r5
+LOCAL(add_frac):
+ mova LOCAL(tab)-32,r0
+ mov.l r7,@-r15
+ mov.w LOCAL(x1380),r7
+ and r1,r3
+ addc r5,r4
+ mov.w LOCAL(m25),r2 ! -25
+ bf LOCAL(frac_ok)
+ sub r1,r3
+ rotcr r4
+ cmp/eq r1,r3 ! did we generate infinity ?
+ bt LOCAL(inf_nan)
+ shlr r4
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r0
+ mov r4,r1
+ shld r6,r1
+ bra LOCAL(frac_low2)
+ sub r1,r7
+
+LOCAL(frac_ok):
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov r4,r0
+ bt/s LOCAL(frac_low)
+ shld r6,r0
+ mov.w LOCAL(xf80),r7
+ shlr r0
+LOCAL(frac_low):
+ sub r0,r7
+LOCAL(frac_low2):
+ mov.l LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+ sub r1,r7 ! {0.12}
+ mov.l LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+ swap.w r7,r1 ! {0.28}
+ dmulu.l r1,r4 /* two issue cycles */
+ mulu.w r7,r7 /* two issue cycles */
+ sts mach,r2 ! {0.26}
+ mov r1,r7
+ shlr r1
+ sts macl,r6 ! {0.24}
+ cmp/hi r0,r4
+ shlr2 r2
+ bf LOCAL(near_one)
+ shlr r2 ! {0.23} systemic error of linear approximation keeps y1 < 1
+ dmulu.l r2,r6
+ cmp/hs r5,r4
+ add r7,r1 ! {1.28}
+ bt LOCAL(near_four)
+ shlr2 r1 ! {1.26}
+ sts mach,r0 ! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+ shlr2 r1 ! {1.24}
+ shlr8 r1 ! {1.16}
+ sett ! compensate for truncation of subtrahend, keep y1 < 1
+ subc r0,r1 ! {0.16} y1; max error about 3.5 ulp
+ swap.w r1,r0
+ dmulu.l r0,r4 ! { 1.30 }
+ mulu.w r1,r1
+ sts mach,r2
+ shlr2 r0
+ sts macl,r1
+ add r2,r0
+ mov.l LOCAL(xff000000),r6
+ add r2,r0
+ dmulu.l r1,r2
+ add #127,r0
+ add r6,r3 ! precompensation for adding leading 1
+ sts mach,r1
+ shlr r3
+ mov.l @r15+,r7
+ sub r1,r0 ! {0.31} max error about 50 ulp (+127)
+ mov.l @r15+,r6
+ shlr8 r0 ! {0.23} max error about 0.7 ulp
+ rts
+ add r3,r0
+
+LOCAL(r5_large):
+ mov r5,r3
+ mov #-31,r2
+ cmp/ge r2,r0
+ shll8 r5
+ bf LOCAL(ret_abs_r3)
+ rotcr r5
+ tst r1,r4
+ shll8 r4
+ bt/s LOCAL(denorm_r4)
+ cmp/hi r3,r1
+ dmulu.l r5,r5
+ bf LOCAL(inf_nan)
+ rotcr r4
+LOCAL(denorm_r4_done):
+ shld r0,r4
+ sts mach,r5
+ dmulu.l r4,r4
+ mov.l r6,@-r15
+ mov #20,r6
+ bra LOCAL(add_frac)
+ sts mach,r4
+
+LOCAL(near_one):
+ bra LOCAL(assemble_sqrt)
+ mov #0,r0
+LOCAL(near_four):
+ ! exact round-to-nearest would add 255. We add 256 for speed & compactness.
+ mov r4,r0
+ shlr8 r0
+ add #1,r0
+ tst r0,r0
+ addc r0,r3 ! might generate infinity.
+LOCAL(assemble_sqrt):
+ mov.l @r15+,r7
+ shlr r3
+ mov.l @r15+,r6
+ rts
+ add r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+ mov r3,r0
+ rts
+ shlr r0
+LOCAL(denorm_r5):
+ bf LOCAL(inf_nan)
+ tst r1,r4
+ bt LOCAL(denorm_both)
+ dmulu.l r4,r4
+ bra LOCAL(denorm_r5_done)
+ shld r0,r5
+LOCAL(denorm_r4):
+ bf LOCAL(inf_nan)
+ tst r1,r5
+ dmulu.l r5,r5
+ bf LOCAL(denorm_r4_done)
+LOCAL(denorm_both): ! normalize according to r3.
+ extu.w r3,r2
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r3,r2
+ mov #-8,r2
+ bt 0f
+ tst r1,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ shld r2,r3
+ mov.l r7,@-r15
+#ifdef __pic__
+ add r0,r3
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r3),r0
+ add #32,r2
+ sub r0,r2
+ shld r2,r4
+ mov r2,r7
+ dmulu.l r4,r4
+ sts.l pr,@-r15
+ mov #1,r3
+ bsr LOCAL(denorm_r5_done)
+ shld r2,r5
+ mov.l LOCAL(x01000000),r1
+ neg r7,r2
+ lds.l @r15+,pr
+ tst r1,r0
+ mov.l @r15+,r7
+ bt 0f
+ add #1,r2
+ sub r1,r0
+0:
+ rts
+ shld r2,r0
+
+LOCAL(m25):
+ .word -25
+LOCAL(x1380):
+ .word 0x1380
+LOCAL(xf80):
+ .word 0xf80
+ .balign 4
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x40000080):
+ .long 0x40000080
+LOCAL(xfffe0000):
+ .long 0xfffe0000
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+/*
+#include <stdio.h>
+#include <math.h>
+
+double err(double x)
+{
+ return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+ int i = 0;
+ double x, s, v;
+ double lx, hx;
+
+ s = 1./32.;
+ for (x = 1.; x < 4; x += s, i++)
+ {
+ lx = x;
+ hx = x + s - 1. / (1 << 30);
+ v = 0.5 * (err (lx) + err (hx));
+ printf ("%s% 4d%c",
+ (i & 7) == 0 ? "\t.byte\t" : "",
+ (int)(v * 4096 + 0.5) - 128,
+ (i & 7) == 7 ? '\n' : ',');
+ }
+ return 0;
+} */
+
+ .balign 4
+LOCAL(tab):
+ .byte -113, -84, -57, -33, -11, 8, 26, 41
+ .byte 55, 67, 78, 87, 94, 101, 106, 110
+ .byte 113, 115, 115, 115, 114, 112, 109, 106
+ .byte 101, 96, 91, 84, 77, 69, 61, 52
+ .byte 51, 57, 63, 68, 72, 77, 80, 84
+ .byte 87, 89, 91, 93, 95, 96, 97, 97
+ .byte 97, 97, 97, 96, 95, 94, 93, 91
+ .byte 89, 87, 84, 82, 79, 76, 72, 69
+ .byte 65, 61, 57, 53, 49, 44, 39, 34
+ .byte 29, 24, 19, 13, 8, 2, -4, -10
+ .byte -17, -23, -29, -36, -43, -50, -57, -64
+ .byte -71, -78, -85, -93,-101,-108,-116,-124
+ ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 162269)
+++ gcc/config/sh/sh.md (working copy)
@@ -107,6 +107,7 @@ (define_constants [
(DR0_REG 64)
(DR2_REG 66)
(DR4_REG 68)
+ (FR4_REG 68)
(FR23_REG 87)
(TR0_REG 128)
@@ -174,6 +175,16 @@ (define_constants [
(UNSPECV_WINDOW_END 10)
(UNSPECV_CONST_END 11)
(UNSPECV_EH_RETURN 12)
+
+ ;; NaN handling for software floating point:
+ ;; We require one bit specific for a precision to be set in all NaNs,
+ ;; so that we can test them with a not / tst sequence.
+ ;; ??? Ironically, this is the quiet bit for now, because that is the
+ ;; only bit set by __builtin_nan ("").
+ ;; ??? Should really use one bit lower and force it set by using
+ ;; a custom encoding function.
+ (SF_NAN_MASK 0x7fc00000)
+ (DF_NAN_MASK 0x7ff80000)
])
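The not / tst sequence mentioned above sets T exactly when every bit of the mask is set in the operand, which is what makes a single mask per precision sufficient. A C model of that predicate (illustrative only; the helper names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

static float sf_from_bits(uint32_t b)
{
  float f;
  memcpy (&f, &b, sizeof f);
  return f;
}

/* T as computed by "not rX,r0; tst mask,r0": 1 iff every bit of
   SF_NAN_MASK (0x7fc00000) is set in the operand - true for quiet NaNs,
   false for infinities and ordinary numbers.  */
static int sf_nan_flagged(float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (~bits & 0x7fc00000u) == 0;
}
```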
;; -------------------------------------------------------------------------
@@ -615,6 +626,14 @@ (define_insn "cmpeqsi_t"
cmp/eq %1,%0"
[(set_attr "type" "mt_group")])
+(define_insn "fpcmp_i1"
+ [(set (reg:SI T_REG)
+ (match_operator:SI 1 "soft_fp_comparison_operator"
+ [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+ "TARGET_SH1_SOFTFP"
+ "tst %0,%0"
+ [(set_attr "type" "mt_group")])
+
(define_insn "cmpgtsi_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1173,9 @@ (define_insn_and_split "*movsicc_umin"
(define_insn "*movsicc_t_false"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (eq (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -1167,9 +1186,9 @@ (define_insn "*movsicc_t_false"
(define_insn "*movsicc_t_true"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (ne (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -6849,6 +6868,50 @@ (define_insn "stuff_delay_slot"
;; Conditional branch insns
+(define_expand "cmpun_sdf"
+ [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+ [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
(define_expand "cbranchint4_media"
[(set (pc)
(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9394,11 +9457,15 @@ (define_split
(define_expand "cstoresf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:SF 2 "arith_operand" "")
- (match_operand:SF 3 "arith_operand" "")]))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ [(match_operand:SF 2 "nonmemory_operand" "")
+ (match_operand:SF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ if (!arith_operand (operands[3], SFmode))
+ operands[3] = copy_to_mode_reg (SFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9407,18 +9474,22 @@ (define_expand "cstoresf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, SFmode);
+ sh_expand_float_scc (operands);
DONE;
")
(define_expand "cstoredf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:DF 2 "arith_operand" "")
- (match_operand:DF 3 "arith_operand" "")]))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ [(match_operand:DF 2 "nonmemory_operand" "")
+ (match_operand:DF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ if (!arith_operand (operands[3], DFmode))
+ operands[3] = copy_to_mode_reg (DFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9427,7 +9498,7 @@ (define_expand "cstoredf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, DFmode);
+ sh_expand_float_scc (operands);
DONE;
")
@@ -9765,7 +9836,7 @@ (define_expand "addsf3"
[(set (match_operand:SF 0 "arith_reg_operand" "")
(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
(match_operand:SF 2 "arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9773,6 +9844,12 @@ (define_expand "addsf3"
expand_sf_binop (&gen_addsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*addsf3_media"
@@ -9871,6 +9948,22 @@ (define_insn_and_split "binary_sf_op1"
}"
[(set_attr "type" "fparith_media")])
+(define_insn "addsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "addsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9978,7 @@ (define_expand "subsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
(match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9893,6 +9986,12 @@ (define_expand "subsf3"
expand_sf_binop (&gen_subsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*subsf3_media"
@@ -9903,6 +10002,23 @@ (define_insn "*subsf3_media"
"fsub.s %1, %2, %0"
[(set_attr "type" "fparith_media")])
+(define_insn "subsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R5_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "subsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,10 +10031,15 @@ (define_insn "subsf3_i"
(define_expand "mulsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
- (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
- (match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
- "")
+ (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+ (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+ "if (TARGET_SH1_SOFTFP_MODE (SFmode))
+ {
+ expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+ operands);
+ DONE;
+ }")
(define_insn "*mulsf3_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
@@ -9959,6 +10080,22 @@ (define_insn "mulsf3_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "single")])
+(define_insn "mulsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "mac_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10256,149 @@ (define_insn "*fixsfsi"
"ftrc %1,%0"
[(set_attr "type" "fp")])
+(define_insn "cmpnesf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,?r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ if (which_alternative == 0)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+ else if (which_alternative == 1)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+ else
+ output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+ operands);
+ return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+ [(set (reg:SI T_REG)
+ (le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ output_asm_insn (\"cmp/pz\t%0\", operands);
+ if (which_alternative == 2)
+ output_asm_insn (\"mov\t%0,%2\", operands);
+ if (TARGET_SH2)
+ output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+ else
+ output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+ operands);
+ if (which_alternative == 1)
+ output_asm_insn (\"or\t%0,%2\", operands);
+ else
+ output_asm_insn (\"or\t%1,%2\", operands);
+ return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "*
+{
+ output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+ output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+ output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+ return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+ [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+ [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_NE 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+ [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_GT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+ [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
(define_insn "cmpgtsf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10426,22 @@ (define_insn "ieee_ccmpeqsf_t"
"* return output_ieee_ccmpeq (insn, operands);"
[(set_attr "length" "4")])
+(define_insn "*cmpltgtsf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")])
+
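In plain C terms (an illustration, not the emitted code), the two new fcmp sequences compute the LTGT and ORDERED predicates:

```c
/* ORDERED: fcmp/eq of a value with itself fails only for NaN, so two
   self-compares detect whether either operand is a NaN.
   LTGT: true when the operands compare strictly less or strictly greater;
   false for equal and for unordered operands.  */
static int sf_ordered(float a, float b) { return a == a && b == b; }
static int sf_ltgt(float a, float b) { return a < b || a > b; }

/* Build a NaN at run time so the compiler cannot fold the compares.  */
static float sf_qnan(void)
{
  volatile float z = 0.0f;
  return z / z;
}
```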
(define_insn "cmpgtsf_t_i4"
[(set (reg:SI T_REG)
@@ -10178,6 +10474,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
[(set_attr "length" "4")
(set_attr "fp_mode" "single")])
+(define_insn "*cmpltgtsf_t_4"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
(define_insn "cmpeqsf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10529,24 @@ (define_insn "cmpunsf_media"
(define_expand "cbranchsf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:SF 1 "arith_operand" "")
- (match_operand:SF 2 "arith_operand" "")])
+ [(match_operand:SF 1 "nonmemory_operand" "")
+ (match_operand:SF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], SFmode))
+ operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, SFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10426,11 +10748,39 @@ (define_insn "abssf2_i"
[(set_attr "type" "fmove")
(set_attr "fp_mode" "single")])
+(define_expand "abssc2"
+ [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+ (abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "
+{
+ expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+ DONE;
+}")
+
+(define_insn "abssc2_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (abs:SF (reg:SC R4_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI R5_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "adddf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10788,12 @@ (define_expand "adddf3"
expand_df_binop (&gen_adddf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*adddf3_media"
@@ -10458,6 +10814,30 @@ (define_insn "adddf3_i"
[(set_attr "type" "dfp_arith")
(set_attr "fp_mode" "double")])
+(define_expand "adddf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_adddf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "adddf3_i3"
+ [(set (reg:DF R0_REG)
+ (plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "subdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10874,7 @@ (define_expand "muldf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10882,12 @@ (define_expand "muldf3"
expand_df_binop (&gen_muldf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+ operands);
+ DONE;
+ }
}")
(define_insn "*muldf3_media"
@@ -10522,6 +10908,32 @@ (define_insn "muldf3_i"
[(set_attr "type" "dfp_mul")
(set_attr "fp_mode" "double")])
+(define_expand "muldf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_muldf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "muldf3_i3"
+ [(set (reg:DF R0_REG)
+ (mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "divdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11063,73 @@ (define_insn "fix_truncdfsi2_i"
;; (use (match_dup 2))])
;; (set (match_dup 0) (reg:SI FPUL_REG))])
+(define_insn "cmpnedf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (clobber (match_scratch:SI 2 "=&r"))]
+ "TARGET_SH1_SOFTFP && flag_finite_math_only"
+ "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+ [(set_attr "length" "18")])
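The cmpeqdf_i1_finite sequence above exploits a property of IEEE doubles under -ffinite-math-only: with no NaNs in play, two values compare equal exactly when their bit patterns match, except that +0.0 and -0.0 must also compare equal. A minimal C sketch of that idea (the helper name is illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Finite-math double equality using only integer operations.  The
   shift by one discards both sign bits at once, just as the
   "add %2,%2" in the insn sequence does, so opposite-signed zeros
   compare equal.  */
static int df_eq_finite (double a, double b)
{
  uint64_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  if (ua == ub)
    return 1;
  return ((ua | ub) << 1) == 0;   /* both operands are +/-0.0 */
}
```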
+
+(define_insn "cmpundf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+ (match_operand:DF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\\n0:"
+ [(set_attr "length" "10")])
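cmpundf_i1 detects unordered operands without any FPU by testing exponent bits against an integer mask (the "use" operand presumably holds the exponent-field mask). The full integer-level test looks roughly like this in C; note that a test of the exponent field alone would also accept infinities, so a complete NaN test must examine the mantissa as well (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A double is NaN iff its exponent field is all ones and its
   mantissa is nonzero.  */
static int df_isnan_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return ((u >> 52) & 0x7ff) == 0x7ff          /* exponent all ones */
         && (u & ((1ULL << 52) - 1)) != 0;     /* mantissa nonzero */
}

/* UNORDERED: true iff either operand is NaN.  */
static int df_unordered (double a, double b)
{
  return df_isnan_bits (a) || df_isnan_bits (b);
}
```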
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1_SOFTFP"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+ [(set_attr "length" "30")])
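cmpuneqdf_i1 computes UNEQ: true when the operands are unordered (either is NaN) or compare equal. The long branchy sequence folds the NaN bit-tests together with a zero-aware equality test; at C level the same predicate is simply (helper name hypothetical):

```c
#include <assert.h>

/* UNEQ(a, b): unordered-or-equal.  x != x is true only for NaN, so
   ordinary C comparisons express what the insn sequence computes.  */
static int df_uneq (double a, double b)
{
  return a != a || b != b || a == b;
}
```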
+
(define_insn "cmpgtdf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11161,26 @@ (define_insn "*ieee_ccmpeqdf_t"
[(set_attr "length" "4")
(set_attr "fp_mode" "double")])
+(define_insn "*cmpltgtdf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
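These two hardware patterns rely on standard T-bit idioms: LTGT is two fcmp/gt's, and ORDERED is two self-compares, since x == x is false only for NaN. In C terms:

```c
#include <assert.h>

/* ORDERED as *cmpordereddf_t_4 computes it: both operands equal
   themselves, i.e. neither is NaN.  */
static int df_ordered (double a, double b)
{
  return a == a && b == b;
}

/* LTGT as *cmpltgtdf_t computes it: strictly less or strictly
   greater; false for equal and for unordered operands.  */
static int df_ltgt (double a, double b)
{
  return a < b || a > b;
}
```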
+
(define_insn "cmpeqdf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11216,24 @@ (define_insn "cmpundf_media"
(define_expand "cbranchdf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:DF 1 "arith_operand" "")
- (match_operand:DF 2 "arith_operand" "")])
+ [(match_operand:DF 1 "nonmemory_operand" "")
+ (match_operand:DF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], DFmode))
+ operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, DFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10823,7 +11328,7 @@ (define_insn "absdf2_i"
(define_expand "extendsfdf2"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(float_extend:DF (match_operand:SF 1 "fpul_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10832,6 +11337,18 @@ (define_expand "extendsfdf2"
get_fpscr_rtx ()));
DONE;
}
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i2e, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i1, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
}")
(define_insn "*extendsfdf2_media"
@@ -10850,16 +11367,94 @@ (define_insn "extendsfdf2_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere.  So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
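The patterns above only marshal operands for a `__extendsfdf2` library call. For reference, here is a minimal C sketch of what such a soft-float SF->DF extension computes; this illustrates the bit manipulation, it is not the SH assembly routine from config/sh/ieee-754-df.S. The conversion widens the fields, rebiases the exponent from 127 to 1023, renormalizes denormal inputs, and is always exact:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-level SF -> DF extension; exact for all inputs.  */
static double extend_sf_df (float f)
{
  uint32_t s;
  uint64_t d;
  double r;
  memcpy (&s, &f, sizeof s);
  uint64_t sign = (uint64_t)(s >> 31) << 63;
  uint32_t exp  = (s >> 23) & 0xff;
  uint64_t frac = s & 0x7fffff;

  if (exp == 0xff)                       /* Inf or NaN: keep all ones */
    d = sign | (0x7ffULL << 52) | (frac << 29);
  else if (exp == 0 && frac == 0)        /* signed zero */
    d = sign;
  else if (exp == 0)                     /* SF denormal: renormalize */
    {
      int shift = 0;
      while (!(frac & 0x800000))
        {
          frac <<= 1;
          shift++;
        }
      /* Value was frac * 2^-149; normalized it is 1.m * 2^(-126-shift),
         so the DF biased exponent is -126 - shift + 1023.  */
      d = sign | ((uint64_t)(897 - shift) << 52)
          | ((frac & 0x7fffff) << 29);
    }
  else                                   /* normal: rebias 127 -> 1023 */
    d = sign | ((uint64_t)(exp + 896) << 52) | (frac << 29);

  memcpy (&r, &d, sizeof r);
  return r;
}
```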
+
(define_expand "truncdfsf2"
[(set (match_operand:SF 0 "fpul_operand" "")
- (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
- "
-{
+ (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
+ "
+{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
{
emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
- get_fpscr_rtx ()));
+ get_fpscr_rtx ()));
+ DONE;
+ }
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i2e, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i1, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
DONE;
}
}")
@@ -10879,6 +11474,37 @@ (define_insn "truncdfsf2_i4"
"fcnvds %1,%0"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i1"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "truncdfsf2_i2e"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI FPUL_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
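Symmetrically, these patterns hand off to `__truncdfsf2`, which is the harder direction: DF->SF must round (to nearest, ties to even), may overflow to infinity, and may underflow to a denormal or zero. A C-level sketch of that work, under the assumption of round-to-nearest-even semantics; again this is an illustration, not the routine from config/sh/ieee-754-df.S:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-level DF -> SF truncation with round-to-nearest-even.  */
static float trunc_df_sf (double x)
{
  uint64_t u;
  uint32_t r;
  float out;
  memcpy (&u, &x, sizeof u);
  uint32_t sign = (uint32_t)(u >> 63) << 31;
  int exp = (int)((u >> 52) & 0x7ff);
  uint64_t frac = u & 0xfffffffffffffULL;

  if (exp == 0x7ff)                      /* Inf or NaN */
    r = sign | 0x7f800000u
        | (frac ? ((uint32_t)(frac >> 29) | 0x400000u) : 0); /* quiet NaN */
  else
    {
      int e = exp - 1023 + 127;          /* rebias 1023 -> 127 */
      uint64_t m = frac | (exp ? 1ULL << 52 : 0);  /* implicit bit */
      int shift = 29;                    /* 52 -> 23 mantissa bits */
      if (e <= 0)                        /* SF denormal range (or below) */
        {
          shift += 1 - e;
          e = 0;
        }
      if (shift >= 64)
        m = 0;                           /* far below half an ulp of the
                                            smallest denormal */
      else
        {
          uint64_t rest = m & ((1ULL << shift) - 1);
          uint64_t half = 1ULL << (shift - 1);
          m >>= shift;
          if (rest > half || (rest == half && (m & 1)))
            m++;                         /* round to nearest, ties to even */
        }
      if (m >> 24)                       /* rounding carried out of mantissa */
        {
          m >>= 1;
          e++;
        }
      if (e >= 0xff)                     /* overflow -> infinity */
        r = sign | 0x7f800000u;
      else if (e == 0)                   /* denormal or zero; a carry into
                                            bit 23 yields the right normal */
        r = sign | (uint32_t)m;
      else
        r = sign | ((uint32_t)e << 23) | ((uint32_t)m & 0x7fffff);
    }
  memcpy (&out, &r, sizeof out);
  return out;
}
```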
+
;; Bit field extract patterns. These give better code for packed bitfields,
;; because they allow auto-increment addresses to be generated.