This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
divdf3 (Was: Re: RFC: Handling of libgcc symbols in SH shared libraries)
- From: amylaar at spamcop dot net (Joern Rennecke)
- To: kumar107 at rediffmail dot com
- Cc: joern dot rennecke at superh dot com (Joern Rennecke), gcc-patches at gcc dot gnu dot org
- Date: Sat, 14 Aug 2004 20:39:42 +0100 (BST)
- Subject: divdf3 (Was: Re: RFC: Handling of libgcc symbols in SH shared libraries)
Your divdf3 implementation is also rather slow. You should be able
to do the division in about 100 cycles by proper use of the multiplier.
I've written an outline and a start below. Note that the gaps between
dmulu.l / sts / instructions fed by sts are a good place to put checks for
infinity and for argument 0 being zero or denormal, to do some more shift
operations, and to prepare the exponents.
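For readers who do not have the SH instruction set at hand, here is a minimal Python model of the dmulu.l / sts mach pair that the outline relies on. It is only a sketch: the helper names dmulu_l and mul_hi are made up for illustration and do not appear in the patch. It shows that taking MACH after an unsigned 32 x 32 multiply amounts to multiplying two 0.32 fixed-point fractions and truncating the result.

```python
MASK32 = 0xFFFFFFFF

def dmulu_l(rm, rn):
    """Model of `dmulu.l rm,rn`: unsigned 32x32 -> 64-bit product, split into (mach, macl)."""
    product = (rm & MASK32) * (rn & MASK32)
    return product >> 32, product & MASK32

def mul_hi(a, b):
    """Value left in the destination register by `dmulu.l a,b` followed by `sts mach,rd`."""
    mach, _macl = dmulu_l(a, b)
    return mach

# Read as 0.32 fixed point: 0.75 * 0.5 = 0.375
print(hex(mul_hi(0xC0000000, 0x80000000)))
```

The low word (MACL) is simply dropped, so each such multiply truncates rather than rounds.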
/* Copyright (C) 2004 Free Software Foundation, Inc.
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
In addition to the permissions in the GNU General Public License, the
Free Software Foundation gives you unlimited permission to link the
compiled version of this file into combinations with other programs,
and to distribute those combinations without any restriction coming
from the use of this file. (The General Public License restrictions
do apply in other respects; for example, they cover modification of
the file, and distribution when not linked into a combine
executable.)
This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; see the file COPYING. If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
! divdf3 for the Renesas / SuperH SH CPUs.
! Algorithm and start contributed by Joern Rennecke joern.rennecke@superh.com
/* y = 1/x ; x in [1,2)
y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 1/256
y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
y2 = y1 - ((y1) * x - 1) * y1 = y-x^3*d^4
z0 = y2*x ; x1 = x - z0*x (32 * 64 bit)
z1 = y2*x1 (round to nearest odd 0.5 ulp);
x2 = x1 - z1*x
z = x/y = z0 + z1 - 0.5 ulp + (x2 > 0) * ulp
Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
with suitable scaling and/or top truncation.
x truncated to 20 bits is sufficient to calculate y0 or even y1.
Table entries are adjusted by +128 to use full signed byte range. */
GLOBAL(divdf3):
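	mov.l	LOCAL(x7ff00000),t0	! exponent-field mask, going by the label
	mov	#12,t1
	mov	DBL1H,x_h
	shld	t1,x_h			! shift sign and exponent out; fraction is left-justified
	tst	t0,DBL1H		! T := (exponent field of DBL1H is zero)
	mov	#26,t2
	mov	x_h,t3
	shld	t2,t3
	mova	tab,r0
	mov.l	LOCAL(x70000000),yn	! (1.5 << 32) - (0x80 << 21)
	mov	#21,t4
	mov.b	@(r0,t3),r0		! table entry, sign-extended byte
	sub	x_h,yn
	bt	LOCAL(zero_denorm_arg1)	! taken if the tst above set T
	shld	t4,r0
	sub	r0,yn			! yn := y0
	dmulu.l	yn,x			! mach:macl := yn * x
	sts	mach,y0x		! y0x := high word of the product
	add	y0x,y0x			! remove leading one
	dmulu.l	y0x,yn
	sts	mach,d0
	sub	d0,yn			! yn := y1
	dmulu.l	yn,x			! mach:macl := yn * x
	sts	mach,y0x		! y0x := high word of the product
	add	y0x,y0x			! remove leading one
	dmulu.l	y0x,yn
	sts	mach,d0
	sub	d0,yn			! yn := y2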
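
The two refinement steps above can be checked with exact rational arithmetic. The following Python sketch is not part of the patch; the names refine and reciprocal_steps are hypothetical, and it models the ideal (unrounded) iteration only, ignoring the truncation that the 32-bit multiplies introduce. As in the instruction sequence, the second step multiplies by the already refined estimate.

```python
from fractions import Fraction

def refine(y_est, x):
    """One step y' = y_est - (y_est*x - 1)*y_est of the reciprocal iteration."""
    return y_est - (y_est * x - 1) * y_est

def reciprocal_steps(x, d):
    """Start from y0 = 1/x + d and apply two refinement steps, all in exact arithmetic."""
    y = 1 / x
    y0 = y + d
    y1 = refine(y0, x)
    y2 = refine(y1, x)
    return y, y0, y1, y2

x = Fraction(3, 2)       # some x in [1,2)
d = Fraction(1, 512)     # initial error, chosen so that abs(d)/y <= 1/256
y, y0, y1, y2 = reciprocal_steps(x, d)
print(y - y1 == x * d**2)           # error after the first step
print(y - y2 == x**3 * d**4)        # error after the second step
print((y - y2) / y == (d / y)**4)   # same thing as a relative error
```

In this idealised model the relative error of y2 is the fourth power of the relative error of y0, so a table lookup good to 1/256 would give about 32 bits after two steps, before any truncation effects are taken into account.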