This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: mul + div with 64 bit signed ints on IA32

To: dewar at gnat dot com
Subject: Re: mul + div with 64 bit signed ints on IA32
From: Frank Klemm <pfk at fuchs dot offl dot uni-jena dot de>
Date: Tue, 4 Sep 2001 23:37:22 +0200
>Received: (from pfk@localhost)by fuchs.offl.uni-jena.de (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) id XAA12892;Tue, 4 Sep 2001 23:37:22 +0200
Cc: gcc at gcc dot gnu dot org
References: <20010904205028.3F12DF2B62@nile.gnat.com>

On Tue, Sep 04, 2001 at 04:50:28PM -0400, dewar@gnat.com wrote:
>
> <<Another floating point problem are the rounding bits of the FPU.
> It should be forced that these two bits are always '11' (round to zero).
> This would decrease code size and speed up significantly the code.
> >>
> 
> Surely you jest?
> 
> Round to zero (otherwise known as truncation) has much nastier properties
> than round to nearest. Almost always round to nearest should be the
> default.
>
Timings are measured on my Athlon.

a)

In C and C++, both floating-point to integer conversion and integer division
round toward zero.

    (int)+1.23 = +1
    (int)+2.99 = +2
    (int)-1.23 = -1
    (int)-2.99 = -2
    +17 / 9    = +1
    -17 / 9    = -1   (C99)
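
The table above can be checked with a few lines of C. This is only a minimal sketch; the helper names to_int() and div_trunc() are made up for illustration and do not come from any library.

```c
#include <assert.h>

/* Converting a floating-point value to int discards the fractional part,
   i.e. it rounds toward zero. (Hypothetical helper name.) */
int to_int(double x)
{
    return (int)x;
}

/* C99 integer division truncates the quotient toward zero.
   (Hypothetical helper name.) */
int div_trunc(int a, int b)
{
    assert(b != 0);             /* division by zero is undefined */
    return a / b;
}
```

For example, to_int(-2.99) gives -2 and div_trunc(-17, 9) gives -1, matching the table.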

b)

Code with this proposal (RC is already set to round to zero):

        fldl    variable
        fistpl  __tmp
        movl    __tmp, %eax

 3 clocks, 9 bytes (variable and __tmp are on the stack; includes the load of
the float and the load of the result into a CPU register)

Code without this proposal (RC has to be switched and restored around the
store):

        fldl    variable
        fnstcw  __tmp
        movl    __tmp, %reg
        movb    $12, __tmp+1
        fldcw   __tmp
        movl    %reg, __tmp
        fistpl  __tmp2
        fldcw   __tmp
        movl    __tmp2, %eax

51 clocks, 28 bytes (variable, __tmp and __tmp2 are on the stack; includes the
load of the float and the load of the result into a CPU register)
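
To make the effect of `movb $12, __tmp+1` on the stored control word easier to see, here is a small C model of the bit manipulation. It only does integer arithmetic on a 16-bit value and does not touch the real FPU. The function names are invented for this sketch, and 0x037F is assumed as the starting value because that is the control word after FINIT.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding control (RC) encodings, bits 10-11 of the x87 control word. */
enum { X87_RC_NEAREST = 0, X87_RC_DOWN = 1, X87_RC_UP = 2, X87_RC_ZERO = 3 };

/* Extract the RC field (bits 10-11). */
unsigned x87_rc(uint16_t cw)
{
    return (cw >> 10) & 3u;
}

/* Extract the precision control (PC) field (bits 8-9). */
unsigned x87_pc(uint16_t cw)
{
    return (cw >> 8) & 3u;
}

/* Model of `movb $12, __tmp+1`: overwrite the high byte of the stored word. */
uint16_t x87_set_high_byte(uint16_t cw, uint8_t high)
{
    return (uint16_t)((cw & 0x00FFu) | ((unsigned)high << 8));
}

/* Alternative that changes only the RC field and keeps all other bits. */
uint16_t x87_with_rc(uint16_t cw, unsigned rc)
{
    assert(rc <= 3u);
    return (uint16_t)((cw & ~0x0C00u) | (rc << 10));
}
```

Starting from 0x037F, x87_set_high_byte(cw, 12) yields 0x0C7F: RC becomes 11 (round to zero), and as a side effect the PC field drops from 11 to 00. x87_with_rc(cw, X87_RC_ZERO) yields 0x0F7F and leaves PC alone.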


c)
    floor(), ceil(), round() and rint() are clean: they do not change the
    RC (rounding control) bits.

d)
	int64_t  u;
	int64_t  v;
	int64_t  w;
	int32_t  x;
	int32_t  y;
	int32_t  z;

	x = y * z;

	movl	y, %eax
	imull	z, %eax
	movl	%eax, x

	w = (int64_t) x * y;

	fildl	x
	fimul	y
	fistpll w
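
	The C side of the widening multiply can be sketched as follows. The
	product of two 32 bit values has a magnitude of at most 2^62, so it
	fits exactly into the 64 bit significand of the x87 extended format,
	which is why the fild/fimul/fistp sequence loses nothing. The function
	names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* w = (int64_t) x * y: widen one operand first, so the multiplication
   is done in 64 bits and cannot overflow. (Hypothetical helper name.) */
int64_t mul_wide(int32_t x, int32_t y)
{
    return (int64_t)x * y;
}

/* x = y * z: the 32 bit product is only defined if it fits into int32_t.
   (Hypothetical helper name.) */
int32_t mul_narrow(int32_t y, int32_t z)
{
    int64_t p = (int64_t)y * z;
    assert(p >= INT32_MIN && p <= INT32_MAX);   /* otherwise y * z overflows */
    return (int32_t)p;
}
```

	mul_wide(100000, 100000) returns 10000000000, a value that does not
	fit into 32 bits.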

	u = v / w;

	fildll	v
	fildll	w
	fdiv
	fistpll u

	u = v / x;

	fildll	v
	fidivl	x
	fistpll u

	u = v / x + y;

	fildll	v
	fidivl	x
	frndint			# truncate the quotient before the add (RC = round to zero)
	fiaddl	y
	fistpll u


	This is also faster and much shorter than the current solution.
	uint64_t is also possible, but more difficult. a % b is also
	possible, but much more difficult.
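
	One point to watch with expressions like u = v / x + y: the quotient
	has to be truncated to an integer before y is added. Truncating only
	once, at the final store, gives a different result when the exact
	quotient and the sum lie on different sides of an integer. The
	following C model shows the difference. It uses double instead of the
	x87 extended format, so it is only valid for small magnitudes, and the
	function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: what the C expression u = v / x + y means. */
int64_t div_then_add(int64_t v, int32_t x, int32_t y)
{
    assert(x != 0);
    return v / x + y;
}

/* Model: divide and add in floating point, truncate once at the end. */
int64_t fused_trunc_once(int64_t v, int32_t x, int32_t y)
{
    assert(x != 0);
    double q = (double)v / x + y;
    return (int64_t)q;
}

/* Model: truncate the quotient first, then add, then store. */
int64_t fused_trunc_first(int64_t v, int32_t x, int32_t y)
{
    assert(x != 0);
    double q = (double)(int64_t)((double)v / x);   /* round toward zero */
    return (int64_t)(q + y);
}
```

	With v = -1, x = 2, y = 1 the C expression gives 0 + 1 = 1, while
	truncating only once gives trunc(-0.5 + 1) = 0. Truncating the
	quotient first gives 1 again.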

e)
	What rounding is _good_ and what is _bad_ depends on what you want
	to do. If you have US$ 100 and you want to buy something that costs
	US$ 17.50, it is not wise to order round(100/17.5) = 6 items, because
	you can't pay for them (17.5*6 = 105).
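
	The shopping example can be written down in integer cents. This is a
	minimal sketch; the function names are made up, and amounts are
	assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Round toward zero: the number of items the budget really covers. */
int64_t items_affordable(int64_t budget_cents, int64_t price_cents)
{
    assert(budget_cents >= 0 && price_cents > 0);
    return budget_cents / price_cents;
}

/* Round to nearest: may report more items than can be paid for. */
int64_t items_rounded_nearest(int64_t budget_cents, int64_t price_cents)
{
    assert(budget_cents >= 0 && price_cents > 0);
    return (budget_cents + price_cents / 2) / price_cents;
}
```

	items_affordable(10000, 1750) returns 5 (cost 87.50), while
	items_rounded_nearest(10000, 1750) returns 6 (cost 105.00, more than
	the budget).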

-- 
Frank Klemm

References:
- Re: mul + div with 64 bit signed ints on IA32
  - From: dewar
