This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, rs6000] 2/3 Add x86 SSE <xmmintrin.h> intrinsics to GCC PPC64LE target
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>
- Date: Fri, 18 Aug 2017 18:50:44 -0500
- Subject: Re: [PATCH, rs6000] 2/3 Add x86 SSE <xmmintrin.h> intrinsics to GCC PPC64LE target
- References: <1502915740.16102.62.camel@oc7878010663> <20170817052841.GH13471@gate.crashing.org> <1503020434.7915.65.camel@oc7878010663>
On Thu, Aug 17, 2017 at 08:40:34PM -0500, Steven Munroe wrote:
> > > +/* Convert the lower SPFP value to a 32-bit integer according to the current
> > > + rounding mode. */
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_cvtss_si32 (__m128 __A)
> > > +{
> > > + __m64 res = 0;
> > > +#ifdef _ARCH_PWR8
> > > + __m128 vtmp;
> > > + __asm__(
> > > + "xxsldwi %x1,%x2,%x2,3;\n"
> > > + "xscvspdp %x1,%x1;\n"
> > > + "fctiw %1,%1;\n"
> > > + "mfvsrd %0,%x1;\n"
> > > + : "=r" (res),
> > > + "=&wi" (vtmp)
> > > + : "wa" (__A)
> > > + : );
> > > +#endif
> > > + return (res);
> > > +}
> >
> > Maybe it could do something better than return the wrong answer for non-p8?
>
> Ok this gets tricky. Before _ARCH_PWR8 the vector to scalar transfer
> would go through storage. But that is not the worst of it.
Float to int conversion goes through storage on older systems, too.
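To make "through storage" concrete, here is a rough sketch, not code from the patch. The type v4sf and the helper low_lane_via_memory are hypothetical names chosen for illustration, and the sketch assumes a compiler with GCC-style generic vector types. Without a direct vector-to-GPR move such as mfvsrd, getting the low element out amounts to a vector store followed by a scalar load:

```c
#include <string.h>

/* Hypothetical stand-in for __m128: four floats in a GCC generic vector. */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Copy the vector to memory, then load element 0 as a scalar.
   This is the store/load round trip that a direct register move avoids. */
static inline float
low_lane_via_memory (v4sf v)
{
  float lanes[4];
  memcpy (lanes, &v, sizeof lanes);
  return lanes[0];
}
```

A compiler is free to turn this into something better where the target allows it; the point is only that the portable form goes via memory.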
> The semantic of cvtss requires rint or llrint. But __builtin_rint will
> generate a call to libm unless we assert -ffast-math.
Yeah, we should fix that some day. If we can.
> And we don't have
> builtins to generate fctiw/fctid directly.
Yup. Well, __builtin_rint*, but that currently calls out to libm.
> So I will add the #else using __builtin_rint if that libm dependency is
> ok (this will pop in the DG test for older machines).
Another option is to not support this intrinsic for < POWER8.
I don't have a big (or well-informed) opinion on which is best; but I
doubt always returning 0 is the best we can do ;-)
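For illustration only, here is one possible shape for a non-POWER8 fallback that avoids the libm call. It is not what the patch does and not the __builtin_rint approach proposed above; it swaps in the well-known "add and subtract 2^23" trick, which lets the hardware float add do the rounding in whatever rounding mode is in effect. The name cvtss_si32_fallback is hypothetical, the function takes a plain float standing in for the low element of __A, and returning INT_MIN for NaN and out-of-range inputs is an assumption chosen to mimic what x86 cvtss2si produces.

```c
#include <limits.h>

/* Hypothetical libm-free fallback: round a float to int using the
   current rounding mode.  A sketch, not the patch's implementation.  */
static inline int
cvtss_si32_fallback (float x)
{
  /* NaN or outside the int range: return 0x80000000, as x86 does
     (an assumption about the desired behaviour).  */
  if (!(x >= -2147483648.0f && x < 2147483648.0f))
    return INT_MIN;

  /* Floats with magnitude >= 2^23 have no fraction bits left.  */
  if (x >= 8388608.0f || x <= -8388608.0f)
    return (int) x;

  /* Adding 2^23 (with the sign of x) pushes the fraction bits out of
     the significand, so the add itself rounds to an integer in the
     current rounding mode; subtracting it back is exact.  volatile
     keeps the compiler from folding the two operations away.  */
  volatile float magic = x < 0.0f ? -8388608.0f : 8388608.0f;
  volatile float t = x + magic;
  t = t - magic;
  return (int) t;
}
```

In the default round-to-nearest mode, halfway cases go to the even neighbour, so 2.5f gives 2 and 3.5f gives 4. Whether something like this is preferable to a libm call, or to simply not providing the intrinsic below POWER8, is the open question in this thread.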
Segher