*From*: "Tom Womack" <tom at womack dot net>
*To*: "Jan Hubicka" <jh at suse dot cz>
*Cc*: <gcc at gcc dot gnu dot org>
*Date*: Tue, 19 Feb 2002 13:07:12 -0000
*Subject*: sin/cos via SSE2, and an alignment bug (was Re: sqrt via SSE2)
*References*: <001b01c1b6ff$f11fe6c0$5637f380@maths.nottingham.ac.uk> <20020217170516.GB14522@atrey.karlin.mff.cuni.cz> <000501c1b934$b36408c0$5637f380@maths.nottingham.ac.uk> <20020219111705.GE24069@atrey.karlin.mff.cuni.cz>

> > There's then the issue, though it's probably more one for the next glibc
> > release after gcc-3.1 appears, of whether a sin() implementation using SSE2
> > code and a suitable rational-function approximation could get adequate
> > results in less than the 190-or-so cycles that fsin takes: I'm pretty sure
> > it's possible, even given that the necessary two divides can't take less
> > than 70 ticks and that one might want a table-lookup for argument reduction.
>
> Yes, we need to address this issue eventually.

Annoyingly, although I've quickly cobbled together a strategy that ought to
work for sin(), the actual implementation really wants to be written using
the SSE2 built-ins, which don't exist at present. The strategy: a degree-4
Padé approximation is accurate to within 7.5e-18 on [0 .. Pi/32]; a 64-entry
lookup table of V2DF values plus a trig identity extends that to [0 .. 2*Pi];
and I trust that the following (operands in source, destination order, x in
XMM1) is good-enough argument reduction for -ffast-math:

    MOVSD     twopi, XMM0  -- load 53-bit-precision 2*PI
    MOVAPD    XMM1, XMM4   -- keep a copy of x
    DIVSD     XMM0, XMM1   -- divide x by 2*PI
    CVTTPD2DQ XMM1, XMM2   -- truncate toward zero to an integer n
    CVTDQ2PD  XMM2, XMM3   -- bring n back to a double
    MULSD     XMM0, XMM3   -- multiply n by 2*PI
    SUBSD     XMM3, XMM4   -- x - n*2*PI, the remainder, now in XMM4

So I'll put that on the back burner for the moment and continue bug-hunting.
I've got a rather suspicious problem where unions containing
__attribute__((mode(V4SI))) elements either crash the compiler in expr.c or
generate code that uses MOVAPD on objects that aren't 16-byte aligned, which
then segfaults. I'll send a more complete report tomorrow if the problem
still exists in the -20020218 snapshot; for the moment, see PR C/5680.

Tom
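To show the shape of the sin() strategy, here is a minimal, portable C sketch (plain scalar C, no SSE2 built-ins). It assumes the 64-entry table holds (sin, cos) pairs of k*Pi/32 for k = 0..63 and combines them with sin(a + t) = sin(a)cos(t) + cos(a)sin(t). The coefficients are the standard [5/4] Padé approximant of sin and [4/4] of cos about 0, worked out for this sketch; they are not necessarily the approximation referred to in the message. The names fast_sin, pade_sin, pade_cos and init_tables are hypothetical. The table is filled by repeated rotation by Pi/32 only to keep the example free of libm; a real version would store correctly rounded constants.

```c
/* Hypothetical sketch of table-plus-Pade sin(); not glibc code. */
#define TABLE_SIZE 64
#define TWO_PI 6.28318530717958647692       /* 2*pi rounded to a double */
#define STEP (TWO_PI / TABLE_SIZE)           /* pi/32, the table spacing */

static double sin_tab[TABLE_SIZE], cos_tab[TABLE_SIZE];

/* [5/4] Pade approximant of sin(t) about 0, written as t * P(t^2) / Q(t^2). */
static double pade_sin(double t)
{
    double y = t * t;
    double num = 1.0 + y * (-53.0 / 396.0 + y * (551.0 / 166320.0));
    double den = 1.0 + y * (13.0 / 396.0 + y * (5.0 / 11088.0));
    return t * num / den;
}

/* [4/4] Pade approximant of cos(t) about 0. */
static double pade_cos(double t)
{
    double y = t * t;
    double num = 1.0 + y * (-115.0 / 252.0 + y * (313.0 / 15120.0));
    double den = 1.0 + y * (11.0 / 252.0 + y * (13.0 / 15120.0));
    return num / den;
}

/* Fill sin/cos of k*pi/32 by rotating (cos, sin) by pi/32 each step. */
static void init_tables(void)
{
    double sh = pade_sin(STEP), ch = pade_cos(STEP);
    sin_tab[0] = 0.0;
    cos_tab[0] = 1.0;
    for (int k = 1; k < TABLE_SIZE; k++) {
        sin_tab[k] = sin_tab[k - 1] * ch + cos_tab[k - 1] * sh;
        cos_tab[k] = cos_tab[k - 1] * ch - sin_tab[k - 1] * sh;
    }
}

static double fast_sin(double x)
{
    /* Truncate x / 2pi to an integer, as CVTTPD2DQ does (so |x| must stay
     * well below 2^31 * 2pi), then subtract that many 53-bit 2pi's. */
    double n = (double)(int)(x / TWO_PI);
    double r = x - n * TWO_PI;
    if (r < 0.0)
        r += TWO_PI;                     /* truncation leaves r <= 0 for x < 0 */

    /* Split r = k*pi/32 + t, with 0 <= t < pi/32 up to rounding. */
    int k = (int)(r / STEP);
    if (k > TABLE_SIZE - 1)
        k = TABLE_SIZE - 1;
    double t = r - k * STEP;

    /* sin(a + t) = sin(a)cos(t) + cos(a)sin(t), with a = k*pi/32 */
    return sin_tab[k] * pade_cos(t) + cos_tab[k] * pade_sin(t);
}
```

Call init_tables() once, then fast_sin(x). As with the SSE2 reduction, the absolute error grows with |x| because 2*Pi is only carried to 53 bits, which is why this is -ffast-math material. Note that the two rational approximations cost one divide each.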

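The alignment expectation behind the bug can be illustrated with a small, hypothetical C example. It is not the test case from PR C/5680, and the names v4si, vec_or_ints and holder are made up. On a compiler without the bug, a union with a 16-byte vector member inherits the vector's 16-byte alignment. That is what makes an aligned move such as MOVAPD on the union safe. The sketch uses the current vector_size(16) spelling. The 2002-era spelling was __attribute__((mode(V4SI))).

```c
#include <stddef.h>

/* GCC vector extension: four ints in one 16-byte vector.
 * Older spelling: typedef int v4si __attribute__((mode(V4SI))); */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical union of the kind described: a vector overlaid on scalars. */
union vec_or_ints {
    v4si v;
    int i[4];
};

/* Put the union after a single char. If the union were only 4-byte aligned,
 * 'u' could start at offset 4, and an aligned 16-byte load or store on it
 * would fault. */
struct holder {
    char tag;
    union vec_or_ints u;
};

/* Compile-time checks that the vector's alignment propagates outward. */
_Static_assert(_Alignof(union vec_or_ints) == 16,
               "union should be 16-byte aligned");
_Static_assert(offsetof(struct holder, u) % 16 == 0,
               "union member should sit on a 16-byte boundary");
```

If either assertion fails on some compiler, aligned vector moves on such a union are unsafe there.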
