This is the mail archive of the
mailing list for the GCC project.
sin/cos via SSE2, and an alignment bug (was Re: sqrt via SSE2)
> > There's then the issue, though it's probably more one for the next glibc
> > release after gcc-3.1 appears, of whether a sin() implementation using
> > SSE2 code and a suitable rational-function approximation could get adequate
> > results in less than the 190-or-so cycles that fsin takes: I'm pretty
> > sure it's possible, even given that the necessary two divides can't take less
> > than 70 ticks and that one might want a table-lookup for argument
> > reduction.
> Yes, we need to address this issue eventually.
Annoyingly, whilst I've quickly cobbled together a strategy that ought to
work for sin() -- a degree-4 Padé approximation is accurate to within
7.5e-18 in [0 .. Pi/32], a 64-entry V2DF lookup table and a trig identity
extend it to [0 .. 2*Pi], and I trust that
    MOVSD twopi, XMM0
    MOVSD XMM1, XMM4       -- keep a copy of the argument x
    DIVSD XMM0, XMM1       -- divide by 53-bit-precision 2*PI
    CVTPD2DQ XMM1, XMM2    -- round to nearest integer
    CVTDQ2PD XMM2, XMM3    -- bring back to a double
    MULSD XMM0, XMM3       -- multiply by 53-bit-precision 2*PI
    SUBSD XMM3, XMM4       -- and get the remainder x - n*2*PI
is good-enough argument reduction for -ffast-math -- the actual
implementation really wants to be written using the SSE2 built-ins which at
present don't exist.
So I'll put that on a back-burner for the moment and continue bug-hunting:
I've got a rather suspicious problem at the moment where the use of unions
containing attribute(("V4SI")) elements either crashes the compiler in
expr.c, or generates code which uses MOVAPD on non-16-byte-aligned objects.
I'll bring in a more complete report tomorrow if the problem still exists
in -20020218, but for the moment see PR C/5680.