[PATCH, i386]: Fix PR target/29852: Use fprem and fprem1 insns for SSE math
Uros Bizjak
ubizjak@gmail.com
Wed Nov 29 22:12:00 GMT 2006
Hello!
This patch implements the fmod and remainder intrinsics with x87
instructions (fprem and fprem1) for SSE math as well. To shorten the
truncation sequences and the x87->SSE register reloads, the
truncxfsf2_mixed and truncxfdf2_mixed patterns also have to be enabled
for non-mixed SSE/387 math.
The testcase from PR:
#include <math.h>

double foo(double a, double b)
{
  double x = fmod(a, 1.1);
  return x + b;
}
compiles for the x86_64 target (with -O2 -mno-math-errno for clarity) to:
        movsd   %xmm0, -16(%rsp)
        fldl    -16(%rsp)
        fldl    .LC0(%rip)
        fxch    %st(1)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
        fstpl   -8(%rsp)        <<- this is the truncation insn
        movsd   -8(%rsp), %xmm0
        addsd   %xmm1, %xmm0
        ret
As shown in the PR, with this patch the synthetic fmod() testcase runs
more than 4 times faster than with unpatched gcc and almost 2 times
faster than with icc.
2006-11-29 Uros Bizjak <ubizjak@gmail.com>
PR target/29852
config/i386/i386.md (*truncxfsf2_mixed, *truncxfdf2_mixed): Enable
insn patterns for TARGET_80387.
(*truncxfsf2_i387, *truncxfdf2_i387): Remove.
(*truncxfsf2_i387_1): Rename to *truncxfsf2_i387.
(*truncxfdf2_i387_1): Rename to *truncxfdf2_i387.
(fmod<mode>3, remainder<mode>3): Enable expanders for SSE math.
Generate truncxf<mode>2 insn patterns for strict SSE math.
Patch was bootstrapped on x86_64-pc-linux-gnu and regression tested for
c, c++ and fortran.
OK for mainline?
Uros.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i386-fmodsse.diff
Type: text/x-patch
Size: 4619 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20061129/27e66867/attachment.bin>