[PATCH, i386]: Fix PR target/13958: Conversion from unsigned to double is painfully slow on P4
Uros Bizjak
ubizjak@gmail.com
Fri Mar 21 20:52:00 GMT 2008
Hello!
Due to a store forwarding penalty (this is what partial memory access is
called nowadays), the code from the PR runs "painfully" slow:
--cut here--
unsigned a[2]={1,2};
inline unsigned foo1(int i) { return a[i]; }
int main()
{
  double x=0;
  int i;
  for ( i=0; i<100000000; ++i )
    x+=foo1(i%2);
  return (int)x;
}
--cut here--
The inner loop is compiled (-O2 -march=pentium4 -malign-double) to:
.L4:
movl %ecx, %eax
andl $1, %eax
movl a(,%eax,4), %eax
xorl %edx, %edx
(*) pushl %edx
(*) pushl %eax
(*) fildll (%esp)
addl $8, %esp
faddp %st, %st(1)
addl $1, %ecx
cmpl $100000000, %ecx
jne .L4
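The instructions marked with (*) form a partial memory access: two 32-bit
stores (pushl) are followed by one 64-bit load (fildll) from the same stack
slot, so the load cannot be satisfied by store forwarding.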
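As a rough C model of the marked sequence (a sketch only: the helper name
u32_to_double_split is made up for illustration, and a little-endian target
such as x86 is assumed), the value is built from two 4-byte stores and then
read back with one 8-byte load. x87 has no unsigned integer load, hence the
detour through a signed 64-bit value:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  Assumes little-endian
   byte order, as on x86.  */
double u32_to_double_split(uint32_t u)
{
  unsigned char slot[8];
  uint32_t high = 0;
  int64_t wide;

  memcpy(slot + 4, &high, 4);  /* pushl %edx: upper half, always zero */
  memcpy(slot, &u, 4);         /* pushl %eax: lower half */
  memcpy(&wide, slot, 8);      /* fildll (%esp): one load spanning both stores */

  /* The upper half is zero, so the signed 64-bit value is never negative
     (this fails on a big-endian target, where the halves are swapped).  */
  assert(wide >= 0);
  return (double)wide;
}
```

An optimizing compiler is free to fold the three copies into something else,
so this models the access pattern only and does not reproduce the penalty.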
Runtime:
time ./a.out
real 0m0.794s
user 0m0.724s
sys 0m0.000s
Patched gcc creates:
.L4:
movl %edx, %eax
andl $1, %eax
movd a(,%eax,4), %xmm0
movq %xmm0, -16(%ebp)
fildll -16(%ebp)
faddp %st, %st(1)
addl $1, %edx
cmpl $100000000, %edx
jne .L4
time ./a.out
real 0m0.123s
user 0m0.124s
sys 0m0.000s
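For comparison, a rough C model of the patched sequence (again only a sketch,
with a made-up helper name u32_to_double_wide): the value is zero-extended
first (movd), written with a single 8-byte store (movq) and read back with an
8-byte load (fildll), so the load matches the preceding store in size:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  */
double u32_to_double_wide(uint32_t u)
{
  uint64_t zext = u;  /* movd: upper bits are cleared */
  int64_t slot;

  memcpy(&slot, &zext, sizeof slot);  /* movq: one 8-byte store */

  /* A zero-extended 32-bit value always fits in a non-negative int64_t,
     so the signed conversion below is exact for every unsigned input.  */
  assert(slot >= 0);
  return (double)slot;  /* fildll: signed 64-bit load and convert */
}
```

As with any such model, the compiler decides which instructions are actually
emitted; the point is only that one full-width store feeds one full-width load.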
This represents a speedup of more than 5.8x (user time) for what the PR describes as:
--quote--
Btw, such conversions are quite common in numerical codes that deal
with uniform grids: the array index can be used as a coordinate (usually
after some trivial scaling). Given that the indices used in libstdc++
are usually of the type size_t the slow conversion can have quite a
negative performance impact.
--unquote--
I guess that such a speedup comes in quite handy. This code prefers DImode
aligned to 8, since we are dealing with real DImode values. H.J. -
should we align DImode values to 8 for TARGET_MMX/TARGET_SSE?
2008-03-21 Uros Bizjak <ubizjak@gmail.com>
PR target/13958
* config/i386/i386.md ("*floatunssi<mode2>_1"): New pattern with
corresponding post-reload splitters.
("floatunssi<mode>2"): Expand to unsigned_float x87 insn pattern
when x87 FP math is selected.
* config/i386/i386-protos.h (ix86_expand_convert_uns_sixf_sse):
New function prototype.
* config/i386/i386.c (ix86_expand_convert_uns_sixf_sse): New
unreachable function to ease macroization of insn patterns.
The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and is committed to SVN.
RMs, do we want this patch in 4.3.1, although it isn't strictly a
regression?
Uros.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20080321/c922e916/attachment.txt>