Testcase (cf. https://godbolt.org/g/UoU3zj):

using T = float;
using To [[gnu::vector_size(32)]] = T;
using From [[gnu::vector_size(32)]] = unsigned;

#define A2(I) (T)a[I], (T)a[1+I]
#define A4(I) A2(I), A2(2+I)
#define A8(I) A4(I), A4(4+I)

To f(From a) {
    return To{A8(0)};
}

This compiles to:

        vpand   .LC0(%rip), %ymm0, %ymm1
        vpsrld  $16, %ymm0, %ymm0
        vcvtdq2ps       %ymm0, %ymm0
        vcvtdq2ps       %ymm1, %ymm1
        vmulps  .LC1(%rip), %ymm0, %ymm0
        vaddps  %ymm0, %ymm1, %ymm0
        ret

The last vmulps and vaddps can be contracted to vfmadd132ps .LC1(%rip), %ymm1, %ymm0. The same is true for vector_size(16).
Confirmed.
ix86_expand_convert_uns_sisf_sse and ix86_expand_vector_convert_uns_vsivsf should check whether FMA is available and, if so, expand directly to an FMA instead of emitting MULT and PLUS separately.
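For illustration, a scalar C++ sketch of the arithmetic these expanders appear to emit. It assumes, going by the vpand / vpsrld $16 sequence in the testcase, that .LC0 is a 0xffff mask and .LC1 is 65536.0f. The function names are hypothetical and this is not the GCC source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: mirrors the vpand/vpsrld/vcvtdq2ps/vmulps/vaddps
// sequence, with the multiply and the add as two separate operations.
float u32_to_float_mul_add(std::uint32_t x) {
    // Both halves fit in 16 bits, so the signed int -> float conversion
    // (the only one available before AVX512F) handles them exactly.
    float lo = static_cast<float>(static_cast<std::int32_t>(x & 0xFFFFu));
    float hi = static_cast<float>(static_cast<std::int32_t>(x >> 16));
    float scaled = hi * 65536.0f;  // MULT (assumed value of .LC1)
    return scaled + lo;            // PLUS
}

// Hypothetical helper: the same computation with a single fused multiply-add,
// corresponding to the vfmadd132ps suggested in the report.
float u32_to_float_fma(std::uint32_t x) {
    float lo = static_cast<float>(static_cast<std::int32_t>(x & 0xFFFFu));
    float hi = static_cast<float>(static_cast<std::int32_t>(x >> 16));
    return std::fma(hi, 65536.0f, lo);
}

// Checks that both variants agree with the direct unsigned -> float conversion.
void check_conversion(std::uint32_t x) {
    float expected = static_cast<float>(x);
    assert(u32_to_float_mul_add(x) == expected);
    assert(u32_to_float_fma(x) == expected);
}
```

Because hi has at most 16 significant bits, hi * 65536.0f is exact in single precision, so the only rounding happens in the addition. The fused form should therefore give the same result as the separate multiply and add, which is why the contraction is safe here; calling check_conversion on values such as 0, 0x12345678, or 0xFFFFFFFF exercises both paths.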
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:ad9fcb961c0705f56907a728c3748c011a0a8048

commit r12-3382-gad9fcb961c0705f56907a728c3748c011a0a8048
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Sep 4 07:48:43 2021 -0700

    x86: Enable FMA in unsigned SI to SF expanders

    Enable FMA in scalar/vector unsigned SI to SF expanders.  Don't check
    TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.

    gcc/

            PR target/85819
            * config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
            Enable FMA.
            (ix86_expand_vector_convert_uns_vsivsf): Likewise.

    gcc/testsuite/

            PR target/85819
            * gcc.target/i386/pr85819-1a.c: New test.
            * gcc.target/i386/pr85819-1b.c: Likewise.
            * gcc.target/i386/pr85819-2a.c: Likewise.
            * gcc.target/i386/pr85819-2b.c: Likewise.
            * gcc.target/i386/pr85819-2c.c: Likewise.
            * gcc.target/i386/pr85819-3.c: Likewise.
Fixed for GCC 12.