This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
long long / long long
- To: Jan Hubicka <jh at suse dot cz>
- Subject: long long / long long
- From: Frank Klemm <pfk at fuchs dot offl dot uni-jena dot de>
- Date: Sun, 9 Sep 2001 04:02:34 +0200
- >Received: (from pfk@localhost)by fuchs.offl.uni-jena.de (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) id EAA05925;Sun, 9 Sep 2001 04:02:34 +0200
- Cc: gcc at gcc dot gnu dot org
- References: <20010908153112.8DAF1F2B62@nile.gnat.com> <20010908181701.K8451@atrey.karlin.mff.cuni.cz>
---- Code ----------------------------------------------------
.text
.type __divdi3,@function
.global __divdi3
__divdi3:
fildll 12(%esp)
fildll 4(%esp)
subl $12,%esp
movl %esp,%ecx
movw $0x0C00,%ax
fnstcw (%ecx)
orw 0(%ecx),%ax
movw %ax,2(%ecx)
fldcw 2(%ecx)
fdivp
fistpll 4(%ecx)
fldcw 0(%ecx)
movl 4(%esp),%eax
movl 8(%esp),%edx
addl $12,%esp
ret
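
For readers who do not read x87 assembly, the idea of the routine above can be sketched in C: convert both 64-bit operands to an extended-precision floating-point value, divide once, and truncate the quotient back to an integer. The function name div_via_long_double is hypothetical and is not part of libgcc. The sketch assumes that long double has at least a 64-bit mantissa (true for the x87 format, checked through LDBL_MANT_DIG) and that the divisor is nonzero. Unlike the assembly, it does not switch the FPU rounding control; it leaves the division in the default rounding mode and relies on the C cast to truncate. It is an illustration, not a verified replacement for __divdi3.

```c
#include <float.h>

/* Hypothetical helper, not part of libgcc: divide two 64-bit integers
   through long double, mirroring the idea of the x87 routine above.
   Assumes b != 0. */
long long div_via_long_double(long long a, long long b)
{
#if LDBL_MANT_DIG >= 64
    /* A 64-bit mantissa holds every long long value exactly,
       so both conversions are lossless. */
    long double q = (long double)a / (long double)b;

    /* The cast discards the fractional part, i.e. it truncates
       toward zero, as C integer division does. */
    return (long long)q;
#else
    /* long double is too narrow on this target: use the
       ordinary integer division instead. */
    return a / b;
#endif
}
```

Calling div_via_long_double(-7, 2) yields -3, the same truncated result as the C expression -7LL / 2.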
---- "Benchmark": Duration of a loop of --------------------------
long long x [1000];
long long y [1000];
long long s = 0;   /* accumulator; type assumed */
int i;
for (i = 0; i < 1000; i++)
    s += x[i] / y[i];
---- results ----------------------------------------------------
Old routine on Athlon:
106 clocks, including the outer loop and storing the arguments on the stack.
This routine on Athlon:
79 clocks, including the outer loop and storing the arguments on the stack.
+ shorter
+ can be inlined
+ the rounding control switch can sometimes be avoided by moving it outside a loop
+ faster for a lot of data
- slower for trivial data (?)
- does not work with SSE2 (needs a 63- or 64-bit mantissa)
---- optimization -----------------------------------------------
This routine on Athlon after inlining and moving fnstcw/fldcw outside the loop:
21 clocks, including the outer loop.
Interested? Or is 64-bit arithmetic uninteresting for benchmarks?
--
Frank Klemm
Still remaining:
long long % long long
long long / long
long long % long
long long / const
long long % const