void foo (float * __restrict x, int *flag)
{
  for (int i = 0; i < 512; ++i)
    {
      float a = x[2*i+0] + 3.f;
      float b = x[2*i+1] + 177.f;
      if (flag[i])
        {
          x[2*i+0] = a;
          x[2*i+1] = b;
        }
    }
}
fails to vectorize on x86_64 with -march=znver4 (it needs masked stores,
which that tuning enables). This is because we do not support
VMAT_CONTIGUOUS_PERMUTE for either .MASK_LOAD or .MASK_STORE. Simply
enabling that shows we fail to properly handle the mask part.
The proper solution is to handle these in SLP, which is not supported either.
I have a patch.
The master branch has been updated by Richard Biener <email@example.com>:
Author: Richard Biener <firstname.lastname@example.org>
Date: Wed Aug 23 14:28:26 2023 +0200
tree-optimization/111115 - SLP of masked stores
The following adds the capability to do SLP on .MASK_STORE; I do not
plan to add interleaving support.
* tree-vectorizer.h (vect_slp_child_index_for_operand): New.
* tree-vect-data-refs.cc (can_group_stmts_p): Also group .MASK_STORE.
* tree-vect-slp.cc (arg3_arg2_map): New.
(vect_get_operand_map): Handle IFN_MASK_STORE.
(vect_slp_child_index_for_operand): New function.
(vect_build_slp_tree_1): Handle statements with no LHS,
masked store ifns.
* tree-vect-stmts.cc (vect_check_store_rhs): Lookup the
SLP child corresponding to the ifn value index.
(vectorizable_store): Likewise for the mask index. Support masked stores.
(vectorizable_load): Lookup the SLP child corresponding to the
ifn mask index.
* lib/target-supports.exp (check_effective_target_vect_masked_store):
Supported with check_avx_available.
* gcc.dg/vect/slp-mask-store-1.c: New testcase.
Fixed for GCC 14.