[PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics
Segher Boessenkool
segher@kernel.crashing.org
Thu Oct 7 23:39:06 GMT 2021
On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible.
__builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
automatically, fwiw.
> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.
Hrm. Using function-like macros is begging for trouble, as usual. But
the x86 version does this, so meh.
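A generic illustration of one classic hazard with function-like macros (not taken from the patch, and not necessarily one that the _mm_ceil* and _mm_floor* macros hit): a macro can evaluate its argument more than once, where an inline function evaluates it exactly once. SQUARE_MACRO, square_inline and counted are hypothetical names made up for this sketch.

```c
/* Hypothetical macro, not from the patch: its argument appears twice
   in the expansion, so it is evaluated twice.  */
#define SQUARE_MACRO(x) ((x) * (x))

/* Equivalent inline function: the argument is evaluated once.  */
static inline int
square_inline (int x)
{
  return x * x;
}

/* Counts how often the argument expression is evaluated.  */
static int eval_count;

static int
counted (int v)
{
  eval_count++;
  return v;
}

/* Number of evaluations of the argument when the macro is used.  */
static int
macro_evaluations (void)
{
  eval_count = 0;
  (void) SQUARE_MACRO (counted (3));
  return eval_count;
}

/* Number of evaluations of the argument when the function is used.  */
static int
inline_evaluations (void)
{
  eval_count = 0;
  (void) square_inline (counted (3));
  return eval_count;
}
```

Other macro hazards (no address to take, no type checking of arguments) are not shown here.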
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> + __v2df __r;
> + union {
> + double __fr;
> + long long __fpscr;
> + } __enables_save, __fpscr_save;
> +
> + if (__rounding & _MM_FROUND_NO_EXC)
> + {
> + /* Save enabled exceptions, disable all exceptions,
> + and preserve the rounding mode. */
> +#ifdef _ARCH_PWR9
> + __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
The __volatile__ likely does not do what you want. As far as I can see
you do not want one here anyway?
"volatile" does not order the asm wrt fp insns, which you likely *do* want.
> + __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
You put spaces after only some casts, btw? Well maybe I found the one
place you did it wrong, heh :-) And you can avoid having so many parens
by making extra variables -- much more readable.
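A minimal sketch of the "extra variables" suggestion, assuming the GCC/Clang vector extension. The type names m128d and v2df and the function merge_low_sketch are hypothetical stand-ins for this example, not the identifiers the header actually uses.

```c
/* Hypothetical stand-ins for the header's vector types.  */
typedef double m128d __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double v2df __attribute__ ((__vector_size__ (16)));

/* Build a vector from element 0 of b and element 1 of a.  The casts are
   done once, into named temporaries, so the initializer needs no nested
   parentheses.  */
static inline m128d
merge_low_sketch (m128d a, m128d b)
{
  v2df a_v = (v2df) a;
  v2df b_v = (v2df) b;
  v2df r = { b_v[0], a_v[1] };
  return (m128d) r;
}
```

The compiler should generate the same code either way; the temporaries only change how the source reads.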
> + switch (__rounding)
You do not need any of that __ either.
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
"dg-do run" requires vsx_hw, not just vsx_ok. Testing on a machine
without VSX (so before p7) would have shown that, but do you have access
to any? This is one of those things we are only told about a year after
it was added, because no one who tests often does so on such old
hardware :-)
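For reference, the test header with that change would presumably read as follows. This is a sketch that assumes only the effective-target line changes and the other two directives stay as posted.

```c
/* { dg-do run } */
/* { dg-require-effective-target vsx_hw } */
/* { dg-options "-O2 -mvsx" } */
```

With vsx_hw the test should be reported as unsupported on hardware that cannot execute VSX instructions, instead of being run there.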
So, okay for trunk (and backports after some burn-in) with that vsx_ok
fixed. That asm needs fixing, but you can do that later.
Thanks!
Segher