#include <experimental/simd>

namespace stdx = std::experimental;

auto f(stdx::native_simd<int> a, stdx::native_simd_mask<int> k)
{
  ++where(k, a);
  return a;
}

With AVX512 this should compile to a bitmask to vectormask conversion with subsequent subtraction:

  kmovw k0, edi
  vpbroadcastmw2d zmm1, k0
  vpsubd zmm0, zmm0, zmm1

Instead we get:

  vmovdqa32 zmm1, zmm0
  mov eax, 1
  kmovw k1, edi
  vpbroadcastd zmm0, eax
  vmovdqa32 zmm2, zmm1
  vpaddd zmm2{k1}, zmm1, zmm0
  vmovdqa32 zmm0, zmm2

Without AVX512 this should compile to a single subtraction:

  vpsubd ymm0, ymm0, ymm1

Instead we get:

  mov eax, 1
  vmovd xmm2, eax
  vpbroadcastd ymm2, xmm2
  vpaddd ymm2, ymm0, ymm2
  vpblendvb ymm0, ymm0, ymm2, ymm1
The optimized AVX512 part was wrong. It should be:

  vpternlogd zmm1, zmm1, zmm1, 0xFF
  kmovw k1, edi
  vpsubd zmm0{k1}, zmm0, zmm1
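
For readers following along, here is a minimal scalar sketch of why a masked increment can be expressed as a subtraction. It is only a lane-by-lane model, not the libstdc++ implementation: the names `Lanes`, `kLanes`, `to_vector_mask`, `to_bitmask` and the three `masked_increment_*` functions are hypothetical, and lanes are modelled as unsigned 32-bit values so that wrap-around is well defined. Without AVX512 the mask is a vector holding all-ones (i.e. -1) in selected lanes and 0 elsewhere, so `a - mask` adds 1 exactly where the mask is set. With AVX512 the mask is a bitmask, so an all-ones vector is subtracted under merge-masking.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count, chosen only for illustration.
constexpr std::size_t kLanes = 8;
using Lanes = std::array<std::uint32_t, kLanes>;
using BoolMask = std::array<bool, kLanes>;

// Reference semantics of ++where(k, a): add 1 in every selected lane.
Lanes masked_increment_reference(Lanes a, const BoolMask& k) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (k[i]) a[i] += 1u;
  return a;
}

// Vector-mask representation (no AVX512): all bits set in selected lanes.
Lanes to_vector_mask(const BoolMask& k) {
  Lanes m{};
  for (std::size_t i = 0; i < kLanes; ++i)
    m[i] = k[i] ? 0xFFFFFFFFu : 0u;
  return m;
}

// Models the single vpsubd: a - (-1) == a + 1, a - 0 == a.
Lanes masked_increment_vector_mask(Lanes a, const Lanes& mask) {
  for (std::size_t i = 0; i < kLanes; ++i)
    a[i] -= mask[i];
  return a;
}

// Bitmask representation (AVX512): one bit per lane.
std::uint32_t to_bitmask(const BoolMask& k) {
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (k[i]) bits |= (1u << i);
  return bits;
}

// Models vpternlogd (all-ones vector) followed by a merge-masked vpsubd:
// unselected lanes keep their original value.
Lanes masked_increment_bitmask(Lanes a, std::uint32_t bits) {
  Lanes all_ones;
  all_ones.fill(0xFFFFFFFFu);
  for (std::size_t i = 0; i < kLanes; ++i)
    if (bits & (1u << i)) a[i] -= all_ones[i];
  return a;
}
```

Both variants should agree with the reference for any input and mask, including a lane that wraps around from the maximum value to 0.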
The master branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:6ce55180d494b616e2e3e68ffedfe9007e42ca06

commit r13-6333-g6ce55180d494b616e2e3e68ffedfe9007e42ca06
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation

    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

    libstdc++-v3/ChangeLog:

            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.
The releases/gcc-12 branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:4452077962d0c327dcb08670ab73f7197be53e91

commit r12-9640-g4452077962d0c327dcb08670ab73f7197be53e91
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation

    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

    libstdc++-v3/ChangeLog:

            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.

    (cherry picked from commit 6ce55180d494b616e2e3e68ffedfe9007e42ca06)
The releases/gcc-11 branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:7408248888717405a30d9ee01c65aac8839926d2

commit r11-10814-g7408248888717405a30d9ee01c65aac8839926d2
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation

    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

    libstdc++-v3/ChangeLog:

            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.

    (cherry picked from commit 6ce55180d494b616e2e3e68ffedfe9007e42ca06)
Resolved on all branches.