108856 – Increment and decrement on std::experimental::where_expression should optimize better


Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libstdc++
Version:	13.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Matthias Kretz (Vir)

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2023-02-20 10:23 UTC by Matthias Kretz (Vir)
Modified:	2023-05-25 07:07 UTC

See Also:
Host:
Target:	x86_64--, i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2023-02-20 00:00:00


Description Matthias Kretz (Vir) 2023-02-20 10:23:05 UTC

#include <experimental/simd>

namespace stdx = std::experimental;

auto f(stdx::native_simd<int> a, stdx::native_simd_mask<int> k)
{
  ++where(k, a);
  return a;
}

With AVX512 this should compile to a bitmask-to-vector-mask conversion followed by a subtraction:
	kmovw	k0, edi
	vpbroadcastmw2d	zmm1, k0
	vpsubd	zmm0, zmm0, zmm1

Instead we get:
  vmovdqa32 zmm1, zmm0
  mov eax, 1
  kmovw k1, edi
  vpbroadcastd zmm0, eax
  vmovdqa32 zmm2, zmm1
  vpaddd zmm2{k1}, zmm1, zmm0
  vmovdqa32 zmm0, zmm2

Without AVX512 this should compile to a single subtraction:
	vpsubd	ymm0, ymm0, ymm1

Instead we get:
  mov eax, 1
  vmovd xmm2, eax
  vpbroadcastd ymm2, xmm2
  vpaddd ymm2, ymm0, ymm2
  vpblendvb ymm0, ymm0, ymm2, ymm1
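
For illustration, a minimal scalar sketch of why a single subtraction is enough in the non-AVX512 case. It assumes that a true mask lane is stored as all ones (-1 as a signed integer) and a false lane as 0, which is what the vector-mask register in the expected code implies. The names Vec, masked_increment_reference and masked_increment_by_subtraction are hypothetical and not part of libstdc++; the arrays merely stand in for vector registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for an 8-lane vector of 32-bit ints (hypothetical type).
using Vec = std::array<std::int32_t, 8>;

// Reference semantics of ++where(k, a): add 1 only in the selected lanes.
// Unsigned arithmetic gives wrapping behaviour, like a per-lane vpaddd.
Vec masked_increment_reference(Vec a, const Vec& k)
{
  for (std::size_t i = 0; i < a.size(); ++i)
    if (k[i] != 0)
      a[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) + 1u);
  return a;
}

// Single-subtraction form: a - k. A true lane is -1, so a - (-1) == a + 1;
// a false lane is 0, so the value is left unchanged.
Vec masked_increment_by_subtraction(Vec a, const Vec& k)
{
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(k[i] == 0 || k[i] == -1);  // assumption: lanes are 0 or all ones
    a[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i])
                                     - static_cast<std::uint32_t>(k[i]));
  }
  return a;
}
```

Both functions agree for every input whose mask lanes are 0 or -1, which is why the blend in the generated code is redundant.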

Comment 1 Matthias Kretz (Vir) 2023-02-20 14:09:20 UTC

The expected AVX512 code in the description was wrong. It should be:
	vpternlogd	zmm1, zmm1, zmm1, 0xFF
	kmovw	k1, edi
	vpsubd	zmm0{k1}, zmm0, zmm1
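
The corrected sequence can be read as: materialize a vector with all bits set (-1 in every lane), then subtract it under the bitmask with merge masking, so unselected lanes keep their old value. A minimal scalar model of that reading, assuming 16 int lanes and that bit i of the mask selects lane i; Vec16, wrapping_sub, merge_masked_sub and masked_increment_bitmask are hypothetical names, not libstdc++ internals.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for a 16-lane vector of 32-bit ints (hypothetical type).
using Vec16 = std::array<std::int32_t, 16>;

// Wrapping 32-bit subtraction, modelling one lane of vpsubd.
inline std::int32_t wrapping_sub(std::int32_t x, std::int32_t y)
{
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x)
                                   - static_cast<std::uint32_t>(y));
}

// Merge-masked subtract: lane i becomes x[i] - y[i] if bit i of k is set,
// otherwise it keeps src[i].
Vec16 merge_masked_sub(Vec16 src, std::uint16_t k, const Vec16& x, const Vec16& y)
{
  for (std::size_t i = 0; i < src.size(); ++i)
    if ((k >> i) & 1u)
      src[i] = wrapping_sub(x[i], y[i]);
  return src;
}

// Masked increment expressed as a masked subtraction of -1.
Vec16 masked_increment_bitmask(const Vec16& a, std::uint16_t k)
{
  Vec16 all_ones;
  all_ones.fill(-1);  // models the all-bits-set register
  return merge_masked_sub(a, k, a, all_ones);
}
```

Compared with the originally generated code, this form needs no broadcast of the constant 1 and no extra register copies, since the all-ones vector can be produced without a memory or GPR source.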

Comment 2 GCC Commits 2023-02-24 18:40:47 UTC

The master branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:6ce55180d494b616e2e3e68ffedfe9007e42ca06

commit r13-6333-g6ce55180d494b616e2e3e68ffedfe9007e42ca06
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation
    
    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.

Comment 3 GCC Commits 2023-05-23 10:02:45 UTC

The releases/gcc-12 branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:4452077962d0c327dcb08670ab73f7197be53e91

commit r12-9640-g4452077962d0c327dcb08670ab73f7197be53e91
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation
    
    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.
    
    (cherry picked from commit 6ce55180d494b616e2e3e68ffedfe9007e42ca06)

Comment 4 GCC Commits 2023-05-25 07:04:37 UTC

The releases/gcc-11 branch has been updated by Matthias Kretz <mkretz@gcc.gnu.org>:

https://gcc.gnu.org/g:7408248888717405a30d9ee01c65aac8839926d2

commit r11-10814-g7408248888717405a30d9ee01c65aac8839926d2
Author: Matthias Kretz <m.kretz@gsi.de>
Date:   Mon Feb 20 16:33:31 2023 +0100

    libstdc++: More efficient masked inc-/decrement implementation
    
    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/108856
            * include/experimental/bits/simd_builtin.h
            (_SimdImplBuiltin::_S_masked_unary): More efficient
            implementation of masked inc-/decrement for integers and floats
            without AVX2.
            * include/experimental/bits/simd_x86.h
            (_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
            builtins for masked inc-/decrement.
    
    (cherry picked from commit 6ce55180d494b616e2e3e68ffedfe9007e42ca06)

Comment 5 Matthias Kretz (Vir) 2023-05-25 07:07:44 UTC

Resolved on all branches.