[PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

Paul A. Clarke pc@us.ibm.com
Fri Oct 8 01:04:23 GMT 2021


On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> __builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
> automatically, fwiw.
> 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> 
> Hrm.  Using function-like macros is begging for trouble, as usual.  But
> the x86 version does this, so meh.
> 
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +    double __fr;
> > +    long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Save enabled exceptions, disable all exceptions,
> > +	 and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> 
> The __volatile__ does likely not do what you want.  As far as I can see
> you do not want one here anyway?
> 
> "volatile" does not order asm wrt fp insns, which you likely *do* want.

Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
has no effect at all (6.47.1):

| The optional volatile qualifier has no effect. All basic asm blocks are
| implicitly volatile.

So, it could be removed without concern.

> > +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
> 
> You put spaces after only some casts, btw?  Well maybe I found the one
> place you did it wrong, heh :-)  And you can avoid having so many parens
> by making extra variables -- much more readable.

I'll fix this.

> > +  switch (__rounding)
> 
> You do not need any of that __ either.

I'm surprised that I don't. A .h file needs to be concerned about the
namespace it inherits, no?

> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> 
> "dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
> without VSX (so before p7) would have shown that, but do you have access
> to any?  This is one of those things we are only told about a year after
> it was added, because no one who tests often does that on so old
> hardware :-)
> 
> So, okay for trunk (and backports after some burn-in) with that vsx_ok
> fixed.  That asm needs fixing, but you can do that later.

OK.

Thanks!

PC


More information about the Gcc-patches mailing list