Bug 111115 - Failure to vectorize conditional grouped store
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 normal
Target Milestone: 14.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2023-08-23 09:50 UTC by Richard Biener
Modified: 2023-08-24 09:40 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-08-23 00:00:00


Description Richard Biener 2023-08-23 09:50:37 UTC
void foo (float * __restrict x, int *flag)
{
  for (int i = 0; i < 512; ++i)
    {
      if (flag[i])
        {
          float a = x[2*i+0] + 3.f;
          float b = x[2*i+1] + 177.f;
          x[2*i+0] = a;
          x[2*i+1] = b;
        }
    }
}

fails to vectorize on x86_64 with -march=znver4 (vectorizing it requires
masked stores to be enabled by the tuning).  This is because we do not support
VMAT_CONTIGUOUS_PERMUTE for either .MASK_LOAD or .MASK_STORE.  Simply enabling
that shows we fail to properly handle the mask part.

The proper solution is to handle them in SLP, which does not support them either.
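
To make "the mask part" concrete, here is a minimal scalar sketch of what a masked, grouped vector step has to do for this loop: each flag[i] guards a group of two adjacent elements, so the flag must be replicated across both lanes of the group before it can be used as the load/store mask.  This is only an illustration of the intended semantics, not the code GCC generates.  The names foo_scalar, foo_masked_sketch and masked_sketch_matches_scalar, and the vectorization factor VF = 4, are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define N 512    /* trip count from the testcase */
#define VF 4     /* assumed: loop iterations covered by one vector step */
#define GROUP 2  /* stores per iteration: x[2*i+0] and x[2*i+1] */

/* Scalar reference, the same loop as in the testcase. */
static void foo_scalar(float *restrict x, const int *flag)
{
  for (int i = 0; i < N; ++i)
    if (flag[i])
      {
        float a = x[2*i+0] + 3.f;
        float b = x[2*i+1] + 177.f;
        x[2*i+0] = a;
        x[2*i+1] = b;
      }
}

/* Lane-by-lane emulation of a masked vector step (hypothetical sketch). */
static void foo_masked_sketch(float *restrict x, const int *flag)
{
  static const float addend[GROUP] = { 3.f, 177.f };
  assert(N % VF == 0);  /* the sketch has no epilogue loop */
  for (int i = 0; i < N; i += VF)
    {
      int mask[VF * GROUP];
      float v[VF * GROUP];
      /* Replicate each flag across the GROUP lanes it guards. */
      for (int l = 0; l < VF * GROUP; ++l)
        mask[l] = flag[i + l / GROUP] != 0;
      /* Masked load: inactive lanes are not read from memory. */
      for (int l = 0; l < VF * GROUP; ++l)
        v[l] = mask[l] ? x[GROUP * i + l] : 0.f;
      /* Per-lane addends alternate between 3.f and 177.f. */
      for (int l = 0; l < VF * GROUP; ++l)
        v[l] += addend[l % GROUP];
      /* Masked store: inactive lanes leave memory untouched. */
      for (int l = 0; l < VF * GROUP; ++l)
        if (mask[l])
          x[GROUP * i + l] = v[l];
    }
}

/* Returns 1 if the sketch and the scalar loop produce identical arrays. */
static int masked_sketch_matches_scalar(void)
{
  static float a[GROUP * N], b[GROUP * N];
  static int flag[N];
  for (int i = 0; i < N; ++i)
    flag[i] = (i % 3) == 0;
  for (int i = 0; i < GROUP * N; ++i)
    a[i] = b[i] = (float) i;
  foo_scalar(a, flag);
  foo_masked_sketch(b, flag);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling masked_sketch_matches_scalar() compares the two variants on a flag pattern where every third iteration is active.  The point of the sketch is that the mask is a per-group operand that must be widened to the element lanes, which is the part that has to be handled for .MASK_LOAD and .MASK_STORE.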
Comment 1 Richard Biener 2023-08-23 12:35:14 UTC
I have a patch.
Comment 2 GCC Commits 2023-08-24 09:40:34 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a1558e9ad856938f165f838733955b331ebbec09

commit r14-3441-ga1558e9ad856938f165f838733955b331ebbec09
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Aug 23 14:28:26 2023 +0200

    tree-optimization/111115 - SLP of masked stores
    
    The following adds the capability to do SLP on .MASK_STORE, I do not
    plan to add interleaving support.
    
            PR tree-optimization/111115
    gcc/
            * tree-vectorizer.h (vect_slp_child_index_for_operand): New.
            * tree-vect-data-refs.cc (can_group_stmts_p): Also group
            .MASK_STORE.
            * tree-vect-slp.cc (arg3_arg2_map): New.
            (vect_get_operand_map): Handle IFN_MASK_STORE.
            (vect_slp_child_index_for_operand): New function.
            (vect_build_slp_tree_1): Handle statements with no LHS,
            masked store ifns.
            (vect_remove_slp_scalar_calls): Likewise.
            * tree-vect-stmts.cc (vect_check_store_rhs): Lookup the
            SLP child corresponding to the ifn value index.
            (vectorizable_store): Likewise for the mask index.  Support
            masked stores.
            (vectorizable_load): Lookup the SLP child corresponding to the
            ifn mask index.
    
    gcc/testsuite/
            * lib/target-supports.exp (check_effective_target_vect_masked_store):
            Supported with check_avx_available.
            * gcc.dg/vect/slp-mask-store-1.c: New testcase.
Comment 3 Richard Biener 2023-08-24 09:40:50 UTC
Fixed for GCC 14.