[gcc/devel/omp/gcc-14] [og10] vect: Add target hook to prefer gather/scatter instructions

Fri Jun 28 09:49:08 GMT 2024

https://gcc.gnu.org/g:4abc54b6d6c3129cf4233e49231b1255b236c2be

commit 4abc54b6d6c3129cf4233e49231b1255b236c2be
Author: Julian Brown <julian@codesourcery.com>
Date:   Wed Nov 25 09:08:01 2020 -0800

    [og10] vect: Add target hook to prefer gather/scatter instructions
    
    For AMD GCN, the instructions available for loading/storing vectors are
    always scatter/gather operations (i.e. there are separate addresses for
    each vector lane), so the current heuristic to avoid gather/scatter
    operations with too many elements in get_group_load_store_type is
    counterproductive. Avoiding such operations in that function can
    subsequently lead to a missed vectorization opportunity whereby later
    analyses in the vectorizer try to use a very wide array type which is
    not available on this target, and thus it bails out.
    
    This patch adds a target hook that overrides the "single_element_p"
    heuristic in that function, and enables it for GCN. This allows much
    better code to be generated for affected loops.
    
    2021-01-13  Julian Brown  <julian@codesourcery.com>
    
    gcc/
            * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
            documentation hook.
            * doc/tm.texi: Regenerate.
            * target.def (prefer_gather_scatter): Add target hook under vectorizer.
            * tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
            gather/scatter instructions to scalar/elementwise fallback.
            * config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
            hook.

Diff:
---
 gcc/ChangeLog.omp      | 11 +++++++++++
 gcc/config/gcn/gcn.cc  |  2 ++
 gcc/doc/tm.texi        |  5 +++++
 gcc/doc/tm.texi.in     |  2 ++
 gcc/target.def         |  8 ++++++++
 gcc/tree-vect-stmts.cc |  9 +++++++--
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 06ee9d83b27..e8ff6483444 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,14 @@
+2021-01-13  Julian Brown  <julian@codesourcery.com>
+
+	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
+	documentation hook.
+	* doc/tm.texi: Regenerate.
+	* target.def (prefer_gather_scatter): Add target hook under vectorizer.
+	* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
+	gather/scatter instructions to scalar/elementwise fallback.
+	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
+	hook.
+
 2021-01-13  Julian Brown  <julian@codesourcery.com>
 
 	* omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190..a247eecd8e8 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -8059,6 +8059,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef  TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b24..e64c7541f60 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6482,6 +6482,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 658e1e63371..645950b12d7 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4309,6 +4309,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e..e4b26a7df3e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2044,6 +2044,14 @@ all zeros.  GCC can then try to branch around the instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
+we cannot use a contiguous access.  */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f8d8636b139..a7e33120eda 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,9 +2217,14 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 it probably isn't a win to use separate strided accesses based
 	 on nearby locations.  Or, even if it's a win over scalar code,
 	 it might not be a win over vectorizing at a lower VF, if that
-	 allows us to use contiguous accesses.  */
+	 allows us to use contiguous accesses.
+
+	 On some targets (e.g. AMD GCN), always use gather/scatter accesses
+	 here since those are the only types of vector loads/stores available,
+	 and the fallback case of using elementwise accesses is very
+	 inefficient.  */
       if (*memory_access_type == VMAT_ELEMENTWISE
-	  && single_element_p
+	  && (targetm.vectorize.prefer_gather_scatter || single_element_p)
 	  && loop_vinfo
 	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
 						 masked_p, gs_info))
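
For readers who want to see the effect of the tree-vect-stmts.cc change in isolation, the following is a minimal sketch of the condition before and after the patch. It is not GCC code: the AccessInfo struct, its field names, and the two function names are hypothetical stand-ins for the vectorizer state that the real condition consults (VMAT_ELEMENTWISE, single_element_p, loop_vinfo, and the result of vect_use_strided_gather_scatters_p).

```cpp
#include <cassert>

// Hypothetical stand-in for the state read by the real condition in
// get_group_load_store_type.  None of these names exist in GCC.
struct AccessInfo {
  bool elementwise;        // models *memory_access_type == VMAT_ELEMENTWISE
  bool single_element_p;   // models the single_element_p flag
  bool in_loop;            // models loop_vinfo being non-null
  bool strided_gs_usable;  // models vect_use_strided_gather_scatters_p (...)
};

// Condition as it stood before the patch: gather/scatter replaces the
// elementwise fallback only when single_element_p is set.
constexpr bool use_gather_scatter_before(const AccessInfo &a) {
  return a.elementwise && a.single_element_p && a.in_loop
         && a.strided_gs_usable;
}

// Condition after the patch: a target that sets prefer_gather_scatter
// bypasses the single_element_p restriction.
constexpr bool use_gather_scatter_after(const AccessInfo &a,
                                        bool prefer_gather_scatter) {
  return a.elementwise && (prefer_gather_scatter || a.single_element_p)
         && a.in_loop && a.strided_gs_usable;
}

// A group access for which single_element_p is false.
constexpr AccessInfo multi_element{true, false, true, true};

static_assert(!use_gather_scatter_before(multi_element),
              "old heuristic keeps the elementwise fallback");
static_assert(use_gather_scatter_after(multi_element, true),
              "a target preferring gather/scatter takes the new path");
static_assert(!use_gather_scatter_after(multi_element, false),
              "targets leaving the hook at its default are unaffected");
```

Because the hook defaults to false, the second function reduces to the first for every target that does not define TARGET_VECTORIZE_PREFER_GATHER_SCATTER, which is why only the GCN back end changes behavior in this commit.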