This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 29 Nov 2006 18:18:50 -0000
- Subject: [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
- References: <bug-29852-13404@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #6 from ubizjak at gmail dot com 2006-11-29 18:18 -------
(In reply to comment #5)
> Can we make sure to always emit proper truncation to SF/DFmode if not
> TARGET_MIX_SSE_I387? Just in case two fprem instructions follow each other
> and so we don't truncate by moving to memory or SSE registers. It would be
> bad to let excess precision (aka bug 323) sneak in for fpmath=sse when we
> tell people to use that to prevent excess precision.
We can't make any guarantees about truncation, but the following patch can.
2006-11-29  Uros Bizjak  <ubizjak@gmail.com>

	PR target/XXX
	* config/i386/i386.md (*truncxfsf2_mixed, *truncxfdf2_mixed): Enable
	patterns for TARGET_80387.
	(*truncxfsf2_i387, *truncxfdf2_i387): Remove.
	(fmod<mode>3, remainder<mode>3): Enable patterns for SSE math.
	Generate truncxf<mode>2 instructions for strict SSE math.
For the testcase:

#include <math.h>

double test1(double a)
{
  double x = fmod(a, 1.1);
  return fmod(x, 2.1);
}

the patched gcc generates (with -fno-math-errno for clarity):
test1:
.LFB2:
        movsd   %xmm0, -16(%rsp)
        fldl    -16(%rsp)
        fldl    .LC0(%rip)
        fxch    %st(1)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
        fstpl   -8(%rsp)
        fldl    -8(%rsp)
        fldl    .LC1(%rip)
        fxch    %st(1)
.L3:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L3
        fstp    %st(1)
        fstpl   -8(%rsp)
        movsd   -8(%rsp), %xmm0
        ret
.LFE2:
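
Each fprem loop in the listing repeats until the x87 condition flag C2 is clear; fprem sets C2 when it has only produced a partial remainder. fnstsw copies the status word into %ax, so C2 (bit 10 of the status word) shows up as bit 2 of %ah, which is what "testb $4, %ah" checks. A minimal C sketch of that test, where fprem_incomplete is a hypothetical helper name used only for illustration:

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC): returns nonzero when the x87
   status word says that fprem left the reduction incomplete, i.e. the
   C2 condition flag (bit 10 of the status word) is set.  */
int fprem_incomplete(uint16_t status_word)
{
    /* fnstsw %ax: the high byte of the status word ends up in %ah.  */
    uint8_t ah = (uint8_t)(status_word >> 8);

    /* testb $4, %ah: bit 2 of %ah is bit 10 of the status word (C2).  */
    return (ah & 0x04) != 0;
}
```

The "jne .L2" and "jne .L3" branches correspond to looping while this helper would return nonzero.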
In order to get optimal code, the truncxf?f2_mixed patterns have to be enabled;
otherwise reload does its job by moving the values to memory and back once
more. The patch bootstraps OK, but the regression test will take overnight.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852