This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


divdf3 (Was: Re: RFC: Handling of libgcc symbols in SH shared libraries)


Your divdf3 implementation is also rather slow.  You should be able
to do the division in about 100 cycles by proper use of the multiplier.
I've written an outline and a start below.  Note that the gaps between
a dmulu.l, the sts that reads its result, and the instructions fed by
that sts are a good place to put checks for infinity and for argument 0
being zero or denormal, to do some more shift operations, and to
prepare the exponents.
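A quick check of why two multiply-based Newton-Raphson steps are enough for the reciprocal in the outline below. Write y = 1/x for the reciprocal of the divisor mantissa x, y0 = y + d for the table estimate, and r0 = x*d for its relative error. Each step of the form y_{n+1} = y_n - (y_n x - 1) y_n then squares the relative error, so an estimate good to 1/256 reaches about 2^-32 after two steps. This ignores the truncation in the fixed-point multiplies.

```latex
\begin{aligned}
y &= 1/x, \qquad y_0 = y + d, \qquad r_0 = \frac{y_0 - y}{y} = x d \\
y_1 &= y_0 - (y_0 x - 1)\, y_0 = y - x d^2, \qquad r_1 = -x^2 d^2 = -r_0^2 \\
y_2 &= y_1 - (y_1 x - 1)\, y_1 = y - x^3 d^4, \qquad r_2 = -r_1^2 = -r_0^4 \\
|r_0| &\le 2^{-8} \;\Longrightarrow\; |r_1| \le 2^{-16}, \qquad |r_2| \le 2^{-32}
\end{aligned}
```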

/* Copyright (C) 2004 Free Software Foundation, Inc.

This file is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

In addition to the permissions in the GNU General Public License, the
Free Software Foundation gives you unlimited permission to link the
compiled version of this file into combinations with other programs,
and to distribute those combinations without any restriction coming
from the use of this file.  (The General Public License restrictions
do apply in other respects; for example, they cover modification of
the file, and distribution when not linked into a combine
executable.)

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; see the file COPYING.  If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.  */

! divdf3 for the Renesas / SuperH SH CPUs.
! Algorithm and start contributed by Joern Rennecke joern.rennecke@superh.com

/* y = 1/x  ; x in [1,2)
   y0 = 1.5 - x/2 - tab[(x-1)*64] = y + d ; abs(d)/y <= 1/256

   y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
   y2 = y1 - ((y1) * x - 1) * y1 = y-x^3*d^4

   With a as the dividend and x as the divisor:
   z0 = y2*a ;  a1 = a - z0*x   (32 * 64 bit)
   z1 = y2*a1 (round to nearest odd 0.5 ulp);
   a2 = a1 - z1*x

   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp

   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
   with suitable scaling and/or top truncation.
   x truncated to 20 bits is sufficient to calculate y0 or even y1.
   Table entries are adjusted by +128 to use full signed byte range.  */


GLOBAL(divdf3):
 mov.l	LOCAL(x7ff00000),t0
 mov	#12,t1
 mov	DBL1H,x_h
 shld	t1,x_h
 tst	t0,DBL1H
 mov	#-26,t2	! negative count: shld shifts right
 mov	x_h,t3
 shld	t2,t3
 mova tab,r0
 mov.l LOCAL(x70000000),yn	! (1.5 << 32) - (0x80 << 21)
 mov	#21,t4
 mov.b @(r0,t3),r0
 sub x_h,yn
 bt	LOCAL(zero_denorm_arg1)
 shld t4,r0

 sub r0,yn	! yn := y0

 dmulu.l yn,x	! mach:macl := yn * x




 sts mach,y0x




 add y0x,y0x	! remove leading one

 dmulu.l y0x,yn




 sts mach,d0




 sub d0,yn	! yn := y1

 dmulu.l yn,x




 sts mach,y0x




 add y0x,y0x	! remove leading one

 dmulu.l y0x,yn




 sts mach,d0




 sub d0,yn	! yn := y2
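The reciprocal part of the outline can be modelled in plain C to check its error behaviour. In that part, a linear estimate 1.5 - x/2 is corrected from a 64-entry byte table, then refined by two Newton-Raphson steps built from 32 x 32 -> 64 bit multiplies whose high word is kept, as dmulu.l followed by sts mach does. This is a sketch under stated assumptions, not the SH code itself. It holds x as Q1.31 and y as Q0.32. The table entries are hypothetically taken at the midpoint of each interval and stored with a 128 bias. Each step is written in the equivalent form y*(2 - x*y). The names recip_tab, init_recip_table, mulhi, recip_q32 and recip_rel_error are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical correction table, indexed by floor((x-1)*64).  Each
   entry is (1.5 - x/2) - 1/x at the interval midpoint, in units of
   2^-11, stored minus 128 so the non-negative values fit a signed
   byte; (entry + 128) << 21 rescales it to Q0.32.  */
static int8_t recip_tab[64];

static void init_recip_table(void)
{
    for (int i = 0; i < 64; i++) {
        double xm = 1.0 + (i + 0.5) / 64.0;       /* interval midpoint */
        double c = (1.5 - xm / 2.0) - 1.0 / xm;   /* line minus 1/x, >= 0 */
        int e = (int)(c * 2048.0 + 0.5) - 128;    /* round, then bias */
        assert(e >= -128 && e <= 127);
        recip_tab[i] = (int8_t)e;
    }
}

/* High word of a 32 x 32 -> 64 bit unsigned product,
   like dmulu.l followed by sts mach.  */
static uint32_t mulhi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* X is x in [1,2) as Q1.31; returns an estimate of 1/x as Q0.32
   (saturated just below 1.0) after `steps` Newton-Raphson steps.  */
static uint32_t recip_q32(uint32_t X, int steps)
{
    unsigned i = (X >> 25) & 63;                  /* floor((x-1)*64) */
    int64_t y0 = 0x180000000LL - X                /* 1.5 - x/2 */
                 - ((int64_t)(recip_tab[i] + 128) << 21);
    uint32_t Y = (uint32_t)y0;                    /* y0 is in (0, 1) */
    for (int s = 0; s < steps; s++) {
        uint32_t p = mulhi(Y, X);                 /* x*y, Q1.31 */
        uint32_t t = 0u - p;                      /* 2 - x*y, Q1.31 */
        uint64_t v = ((uint64_t)Y * t) >> 31;     /* y*(2 - x*y), Q0.32 */
        Y = v > 0xFFFFFFFFu ? 0xFFFFFFFFu : (uint32_t)v;
    }
    return Y;
}

/* |relative error| of recip_q32 for x in [1,2).  */
static double recip_rel_error(double x, int steps)
{
    uint32_t X = (uint32_t)(x * 2147483648.0);    /* x as Q1.31 */
    double y = recip_q32(X, steps) / 4294967296.0;
    double err = y * x - 1.0;                     /* (y - 1/x) / (1/x) */
    return err < 0 ? -err : err;
}
```

After init_recip_table(), spot checks such as recip_rel_error(1.3, n) show the expected pattern. With no step the error is a few parts in a thousand. One step brings it below 1e-4, and two steps bring it below 1e-8. At that point the truncation in the fixed-point multiplies, rather than the iteration, limits the result.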

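The final rounding step computes z0 + z1 - 0.5 ulp, plus one ulp when the last remainder is positive. It can be tried out at integer precision, where one ulp is 1. This is a hypothetical model, not part of the outline. round_div takes a dividend a and a divisor x. A single-precision float reciprocal stands in for y2 so that z0 is visibly off. The first remainder is computed exactly. z1 is rounded to an odd multiple of half an ulp and kept doubled so it stays an integer. The sign of the last remainder then picks one of the two neighbours of the midpoint z0 + z1. The helper floor_i64 is also illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of a double of moderate magnitude, without needing libm.  */
static int64_t floor_i64(double t)
{
    int64_t f = (int64_t)t;            /* truncates toward zero */
    return (double)f > t ? f - 1 : f;
}

/* Hypothetical integer model of the final rounding step: returns
   a / x rounded to the nearest integer (one ulp == 1).  Exact ties
   round down.  All half-ulp quantities are kept doubled.  */
static uint64_t round_div(uint32_t a, uint32_t x)
{
    assert(x != 0);
    float y = 1.0f / (float)x;                    /* stand-in for y2 */
    int64_t z0 = (int64_t)((double)a * y);        /* rough quotient */
    int64_t a1 = (int64_t)a - z0 * (int64_t)x;    /* exact remainder */
    /* 2*z1: a1*y rounded to the nearest odd multiple of 0.5 ulp */
    int64_t h1 = 2 * floor_i64((double)a1 * y) + 1;
    int64_t a2_x2 = 2 * a1 - h1 * (int64_t)x;     /* 2*(a1 - z1*x) */
    /* z0 + z1 - 0.5 ulp, plus one ulp if the remainder is positive */
    return (uint64_t)(z0 + (h1 - 1) / 2 + (a2_x2 > 0));
}
```

For example, round_div(102, 7) gives 15 and round_div(101, 7) gives 14. For a = 4294967295 and x = 7, the float reciprocal makes z0 too large by 27, and the remainder steps still recover 613566756. An exact tie such as 7/2 leaves a zero final remainder and rounds down to 3 in this model. For binary floating-point division, such a tie can only occur when the result is subnormal. For a normal result, the midpoint's odd significand would need one more bit than the operands have.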
