[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?

Fri Nov 18 11:46:40 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> float
> _mm_cvtsbh_ss (__bf16 __A)
> {
>   union{ float sf; __bf16 bf[2];} __tmp;
>   __tmp.sf = 0.0f;
>   __tmp.bf[1] = __A;
>   return __tmp.sf;
> }
> 
> Looks like gcc can optimize it to
> 
> _mm_cvtsbh_ss(bool _Accum):
>         movd    %xmm0, %eax
>         sall    $16, %eax
>         movd    %eax, %xmm0
>         ret

That is an option too, but please uglify the sf and bf identifiers above with
a __ prefix; this is an installed header, so plain names can clash with user
macros.
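
For reference, the movd/sall/movd sequence quoted above just places the 16
bf16 bits in the upper half of a 32-bit float. A minimal portable sketch of
that semantics follows; it works on raw bits in a uint16_t rather than on
__bf16, and the function name is made up for illustration, it is not what the
header would contain:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen raw bf16 bits to float.
   bf16 is the upper 16 bits of an IEEE binary32 value, so the
   conversion is a 16-bit left shift of the bit pattern.  */
static inline float
bf16_bits_to_float (uint16_t __bits)
{
  uint32_t __u = (uint32_t) __bits << 16;  /* low 16 mantissa bits are zero */
  float __f;
  memcpy (&__f, &__u, sizeof __f);         /* bit copy, no FP conversion */
  return __f;
}
```

All locals use the __ prefix here only to mirror the naming request above.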
Also, not just for this but more importantly for the __bf16 -> float
conversions gcc emits for -ffast-math or for cstorebf4 or cbranchcc4, it would
be nice if we optimized those so that when the source and destination are both
in SSE registers we don't move from SSE to GPR, shift, and move back from GPR
to SSE. Instead we could do it through some permutation of the SSE register
that just pretends it is a V*HImode and moves the first element to the second
and zeros the first (and perhaps all elements above the second too, or not,
whichever is faster).
Dunno if it could be done as a peephole2, or something different.
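
To illustrate the kind of in-register permutation meant here, a rough sketch
with SSE2 intrinsics follows. It is only one possible choice (interleaving
with a zero vector, i.e. punpcklwd; a pslld by 16 would be another), the
function name is hypothetical, and the initial _mm_cvtsi32_si128 exists only
to get test bits into a vector register; in the real case the value is
already there:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: widen raw bf16 bits to float without leaving
   the SSE register file for the shift itself.  */
static inline float
bf16_bits_to_float_sse (uint16_t bits)
{
  __m128i v = _mm_cvtsi32_si128 (bits);   /* test scaffolding only */
  __m128i z = _mm_setzero_si128 ();
  /* Interleave 16-bit lanes: result = { 0, v[0], 0, v[1], ... },
     so element 0 of v moves to element 1 and element 0 becomes zero.  */
  __m128i r = _mm_unpacklo_epi16 (z, v);
  return _mm_cvtss_f32 (_mm_castsi128_ps (r));
}
```

This variant also rewrites the lanes above the second, which matches the
"perhaps all elements above second too" case.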
Just try:
__attribute__((optimize ("fast-math")))
float foo (__bf16 x) { return x; }
int bar (__bf16 x, __bf16 y) { return x == y; }
void baz (void);
void qux (__bf16 x, __bf16 y) { if (x == y) baz (); }
Oh, and one more thing, for -mavx512bf16 -mavx512vl -ffast-math it would be
nice to use the AVX512BF16 instruction for float -> __bf16 conversions rather
than the library routine.  But that instruction doesn't handle sNaNs properly
and flushes subnormals to 0, so I think we shouldn't do it if
HONOR_NANS (BFmode) or !flag_unsafe_math_optimizations.
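
To make the subnormal difference concrete, here is a small sketch on raw
bits. The first function is a generic round-to-nearest-even truncation that
quiets NaNs; it is not claimed to be what the libgcc routine does internally.
The second models only the "subnormal inputs are treated as zero" behaviour
described above and is an assumption-level approximation of the instruction,
not an exact emulation. All names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 32 bits as a float.  */
static inline float
bits_to_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Round-to-nearest-even float -> bf16 bits, keeping subnormals.  */
static inline uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)       /* NaN: keep it a NaN, quieted */
    return (uint16_t) ((u >> 16) | 0x0040);
  u += 0x7fffu + ((u >> 16) & 1);            /* ties go to even */
  return (uint16_t) (u >> 16);
}

/* Same, but subnormal inputs are flushed to a signed zero first
   (assumed model of the hardware behaviour).  */
static inline uint16_t
float_to_bf16_bits_ftz (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7f800000u) == 0)                /* zero or subnormal input */
    return (uint16_t) ((u >> 16) & 0x8000);
  return float_to_bf16_bits (f);
}
```

For 2^-127 (a subnormal float) the first function keeps a nonzero bf16
subnormal while the second returns zero, which is the kind of difference that
is only acceptable under flag_unsafe_math_optimizations.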