This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: mul + div with 64 bit signed ints on IA32
- To: dewar at gnat dot com
- Subject: Re: mul + div with 64 bit signed ints on IA32
- From: Frank Klemm <pfk at fuchs dot offl dot uni-jena dot de>
- Date: Tue, 4 Sep 2001 23:37:22 +0200
- Cc: gcc at gcc dot gnu dot org
- References: <20010904205028.3F12DF2B62@nile.gnat.com>
On Tue, Sep 04, 2001 at 04:50:28PM -0400, dewar@gnat.com wrote:
>
> <<Another floating point problem are the rounding bits of the FPU.
> It should be forced that these two bits are always '11' (round to zero).
> This would decrease code size and speed up significantly the code.
> >>
>
> Surely you jest?
>
> Round to zero (otherwise known as truncation) has much nastier properties
> than round to nearest. Almost always round to nearest should be the
> default.
>
Timings are measured on my Athlon.
a)
The default rounding method of C and C++ for floating point to integer
conversion and for integer division is rounding toward zero.
(int)+1.23 = +1
(int)+2.99 = +2
(int)-1.23 = -1
(int)-2.99 = -2
+17 / 9 = +1
-17 / 9 = -1 (C99)
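
The table above can be checked with a few lines of C. This is only a minimal
sketch; the helper names to_int and quotient are made up for illustration.

```c
/* Converting a double to int discards the fraction (rounds toward zero). */
static int to_int(double d)
{
    return (int)d;
}

/* Integer division truncates the quotient toward zero.
   This is guaranteed for negative operands since C99. */
static int quotient(int a, int b)
{
    return a / b;
}
```

For example, to_int(-2.99) gives -2 and quotient(-17, 9) gives -1.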
b)
Code with this proposal (first listing) and without it (second listing):
fldl variable
fistpl __tmp
movl __tmp, %eax
3 clocks, 9 bytes (variable and __tmp are on the stack, includes load of
the float and load of the result into a CPU register)
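fldl variable          # push the double onto the FPU stack
fnstcw __tmp           # save the current FPU control word
movl __tmp, %reg       # keep a copy of it in a register
movb $12, __tmp+1      # high byte = 0x0C, i.e. RC = 11 (round to zero)
fldcw __tmp            # switch the FPU to round to zero
movl %reg, __tmp       # put the original control word back into __tmp
fistpl __tmp2          # convert to a 32 bit integer and pop
fldcw __tmp            # restore the original control word
movl __tmp2, %eax      # load the result into a CPU register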
51 clocks, 28 bytes (variable and __tmp are on the stack, includes load of
the float and load of the result into a CPU register)
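
What the movb $12 step in the long sequence does to the saved control word
can be written out in C. This is only a sketch of the bit manipulation, not
of how GCC generates the code; the function names are made up. The RC field
occupies bits 10 and 11 of the x87 control word, and 0x037F is the value the
FPU has after FINIT.

```c
#include <stdint.h>

/* Mimics "movb $12, __tmp+1": the high byte of the control word is
   overwritten with 0x0C, the low byte is kept. */
static uint16_t with_truncation(uint16_t cw)
{
    return (uint16_t)((cw & 0x00FFu) | 0x0C00u);
}

/* Extracts the two RC bits: 0 = nearest, 1 = down, 2 = up, 3 = to zero. */
static unsigned rounding_control(uint16_t cw)
{
    return (cw >> 10) & 3u;
}
```

Starting from 0x037F (round to nearest) this yields 0x0C7F, whose RC field
is 3 (round to zero).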
c)
floor(), ceil(), round() and rint() are clean: they do not change the
RC flags.
d)
int64_t u;
int64_t v;
int64_t w;
int32_t x;
int32_t y;
int32_t z;
x = y * z;
movl y, %eax
imull z, %eax
movl %eax, x
w = (int64_t) x * y;
fildl x
fimull y
fistpll w
u = v / w;
fildll v
fildll w
fdiv
fistpll u
u = v / x;
fildll v
fidivl x
fistpll u
u = v / x + y;
fildll v
fidivl x
fiaddl y
fistpll u
This is also faster and much shorter than the current solution.
uint64_t is also possible, but more difficult. a % b is also
possible, but much more difficult.
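
At the C level the division sequences in d) correspond roughly to the
following sketch. It assumes that long double is the 80 bit x87 format with
a 64 bit mantissa (true for GCC on IA32, not guaranteed elsewhere), so that
every int64_t is represented exactly. The name fpu_div64 is made up. The
cast back to int64_t rounds toward zero; for operands close to the 64 bit
limit the division itself may also have to round toward zero, which this
sketch does not arrange.

```c
#include <stdint.h>

/* Hypothetical helper: u = v / w computed through the floating point unit. */
static int64_t fpu_div64(int64_t v, int64_t w)
{
    /* corresponds to: fildll v ; fildll w ; fdiv */
    long double q = (long double)v / (long double)w;

    /* corresponds to: fistpll u, with the fraction discarded */
    return (int64_t)q;
}
```

For small operands this agrees with the integer result, e.g.
fpu_div64(-17, 9) gives -1.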
e)
What rounding is _good_ and what is _bad_ depends on what you want
to do. If you have US$ 100 and you want to buy something for US$ 17.50,
it is not wise to get round(100/17.5) = 6 items, because you can't
pay for that (17.5*6 = 105).
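
The same example as a small C sketch (function names made up, positive
amounts assumed):

```c
/* Number of items when the quotient is rounded toward zero. */
static int items_truncated(double budget, double price)
{
    return (int)(budget / price);
}

/* Number of items when the quotient is rounded to nearest
   (valid for positive values only). */
static int items_nearest(double budget, double price)
{
    return (int)(budget / price + 0.5);
}
```

With a budget of 100 and a price of 17.5 the first gives 5 items (87.50),
the second gives 6 items (105.00), which is more than the budget.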
--
Frank Klemm