Re: Generalise gather and scatter optabs


On Wed, Nov 6, 2019 at 5:06 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> The gather and scatter optabs required the vector offset to be
> the integer equivalent of the vector mode being loaded or stored.
> This patch generalises them so that the two vectors can have different
> element sizes, although they still need to have the same number of
> elements.
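> 
> For example, with the optabs turned into conversion optabs the
> pattern name encodes both vector modes (the mode pairs here are
> purely illustrative):
> 
>   gather_loadvnx4sivnx4si  -- 32-bit elements, 32-bit offsets (as before)
>   gather_loadvnx8hivnx8si  -- 16-bit elements, 32-bit offsets (new)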
>
> One consequence of this is that it's possible (if unlikely)
> for two IFN_GATHER_LOADs to have the same arguments but different
> return types.  E.g. the same scalar base and vector of 32-bit offsets
> could be used to load 8-bit elements and to load 16-bit elements.
> From just looking at the arguments, we could wrongly deduce that
> they're equivalent.
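> 
> A sketch of the problem in gimple form (SSA names invented for
> illustration):
> 
>   vector(8) char _1 = IFN_GATHER_LOAD (base_2, offsets_3, 1);
>   vector(8) short int _4 = IFN_GATHER_LOAD (base_2, offsets_3, 1);
> 
> The argument lists are identical, so hashing on the arguments alone
> would treat the two calls as interchangeable.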
>
> I know we saw this happen at one point with IFN_WHILE_ULT,
> and we dealt with it there by passing a zero of the return type
> as an extra argument.  Doing the same here also makes the load
> and store functions have the same argument assignment.
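> 
> With the zero passed as a trailing argument, the sketch above would
> instead read:
> 
>   vector(8) char _1 = IFN_GATHER_LOAD (base_2, offsets_3, 1,
>                                        (vector(8) char) { 0, ... });
>   vector(8) short int _4 = IFN_GATHER_LOAD (base_2, offsets_3, 1,
>                                             (vector(8) short int) { 0, ... });
> 
> The zero operands have different types, so the calls now differ in
> their arguments as well as in their return types.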
>
> For now this patch should be a no-op, but later SVE patches take
> advantage of the new flexibility.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-11-06  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * optabs.def (gather_load_optab, mask_gather_load_optab)
>         (scatter_store_optab, mask_scatter_store_optab): Turn into
>         conversion optabs, with the offset mode given explicitly.
>         * doc/md.texi: Update accordingly.
>         * config/aarch64/aarch64-sve-builtins-base.cc
>         (svld1_gather_impl::expand): Likewise.
>         (svst1_scatter_impl::expand): Likewise.
>         * internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
>         (expand_scatter_store_optab_fn): Likewise.
>         (direct_gather_load_optab_supported_p): Likewise.
>         (direct_scatter_store_optab_supported_p): Likewise.
>         (expand_gather_load_optab_fn): Likewise.  Expect the mask argument
>         to be argument 4.
>         (internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
>         (internal_gather_scatter_fn_supported_p): Replace the offset sign
>         argument with the offset vector type.  Require the two vector
>         types to have the same number of elements but allow their element
>         sizes to be different.  Treat the optabs as conversion optabs.
>         * internal-fn.h (internal_gather_scatter_fn_supported_p): Update
>         prototype accordingly.
>         * optabs-query.c (supports_at_least_one_mode_p): Replace with...
>         (supports_vec_convert_optab_p): ...this new function.
>         (supports_vec_gather_load_p): Update accordingly.
>         (supports_vec_scatter_store_p): Likewise.
>         * tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
>         Replace the offset sign and bits parameters with a scalar type tree.
>         * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
>         Pass back the offset vector type instead of the scalar element type.
>         Allow the offset to be wider than the memory elements.  Search for
>         an offset type that the target supports, stopping once we've
>         reached the maximum of the element size and pointer size.
>         Update call to internal_gather_scatter_fn_supported_p.
>         (vect_check_gather_scatter): Update calls accordingly.
>         When testing a new scale before knowing the final offset type,
>         check whether the scale is supported for any signed or unsigned
>         offset type.  Check whether the target supports the source and
>         target types of a conversion before deciding whether to look
>         through the conversion.  Record the chosen offset_vectype.
>         * tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
>         (vect_recog_gather_scatter_pattern): Get the scalar offset type
>         directly from the gs_info's offset_vectype instead.  Pass a zero
>         of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
>         * tree-vect-stmts.c (check_load_store_masking): Update call to
>         internal_gather_scatter_fn_supported_p, passing the offset vector
>         type recorded in the gs_info.
>         (vect_truncate_gather_scatter_offset): Update call to
>         vect_check_gather_scatter, leaving it to search for a valid
>         offset vector type.
>         (vect_use_strided_gather_scatters_p): Convert the offset to the
>         element type of the gs_info's offset_vectype.
>         (vect_get_gather_scatter_ops): Get the offset vector type directly
>         from the gs_info.
>         (vect_get_strided_load_store_ops): Likewise.
>         (vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
>         and IFN_MASK_GATHER_LOAD.
>         * config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to...
>         (gather_load<mode><v_int_equiv>): ...this.
>         (mask_gather_load<mode>): Rename to...
>         (mask_gather_load<mode><v_int_equiv>): ...this.
>         (scatter_store<mode>): Rename to...
>         (scatter_store<mode><v_int_equiv>): ...this.
>         (mask_scatter_store<mode>): Rename to...
>         (mask_scatter_store<mode><v_int_equiv>): ...this.
>
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2019-09-30 17:55:27.403766854 +0100
> +++ gcc/optabs.def      2019-11-06 16:03:37.368360019 +0000
> @@ -91,6 +91,10 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
>  OPTAB_CD(maskload_optab, "maskload$a$b")
>  OPTAB_CD(maskstore_optab, "maskstore$a$b")
> +OPTAB_CD(gather_load_optab, "gather_load$a$b")
> +OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
> +OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
>
> @@ -425,11 +429,6 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>  OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
> -OPTAB_D (gather_load_optab, "gather_load$a")
> -OPTAB_D (mask_gather_load_optab, "mask_gather_load$a")
> -OPTAB_D (scatter_store_optab, "scatter_store$a")
> -OPTAB_D (mask_scatter_store_optab, "mask_scatter_store$a")
> -
>  OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
>  OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>  OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2019-11-06 12:29:15.562690117 +0000
> +++ gcc/doc/md.texi     2019-11-06 16:03:37.364360047 +0000
> @@ -4959,12 +4959,12 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}
>
>  This pattern is not allowed to @code{FAIL}.
>
> -@cindex @code{gather_load@var{m}} instruction pattern
> -@item @samp{gather_load@var{m}}
> +@cindex @code{gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{gather_load@var{m}@var{n}}
>  Load several separate memory locations into a vector of mode @var{m}.
> -Operand 1 is a scalar base address and operand 2 is a vector of
> -offsets from that base.  Operand 0 is a destination vector with the
> -same number of elements as the offset.  For each element index @var{i}:
> +Operand 1 is a scalar base address and operand 2 is a vector of mode @var{n}
> +containing offsets from that base.  Operand 0 is a destination vector with
> +the same number of elements as @var{n}.  For each element index @var{i}:
>
>  @itemize @bullet
>  @item
> @@ -4981,20 +4981,20 @@ load the value at that address into elem
>  The value of operand 3 does not matter if the offsets are already
>  address width.
>
> -@cindex @code{mask_gather_load@var{m}} instruction pattern
> -@item @samp{mask_gather_load@var{m}}
> -Like @samp{gather_load@var{m}}, but takes an extra mask operand as
> +@cindex @code{mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>
> -@cindex @code{scatter_store@var{m}} instruction pattern
> -@item @samp{scatter_store@var{m}}
> +@cindex @code{scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> -Operand 0 is a scalar base address and operand 1 is a vector of offsets
> -from that base.  Operand 4 is the vector of values that should be stored,
> -which has the same number of elements as the offset.  For each element
> -index @var{i}:
> +Operand 0 is a scalar base address and operand 1 is a vector of mode
> +@var{n} containing offsets from that base.  Operand 4 is the vector of
> +values that should be stored, which has the same number of elements as
> +@var{n}.  For each element index @var{i}:
>
>  @itemize @bullet
>  @item
> @@ -5011,9 +5011,9 @@ store element @var{i} of operand 4 to th
>  The value of operand 2 does not matter if the offsets are already
>  address width.
>
> -@cindex @code{mask_scatter_store@var{m}} instruction pattern
> -@item @samp{mask_scatter_store@var{m}}
> -Like @samp{scatter_store@var{m}}, but takes an extra mask operand as
> +@cindex @code{mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>
> Index: gcc/config/aarch64/aarch64-sve-builtins-base.cc
> ===================================================================
> --- gcc/config/aarch64/aarch64-sve-builtins-base.cc     2019-10-29 08:59:18.407479604 +0000
> +++ gcc/config/aarch64/aarch64-sve-builtins-base.cc     2019-11-06 16:03:37.348360159 +0000
> @@ -1076,7 +1076,9 @@ public:
>      /* Put the predicate last, as required by mask_gather_load_optab.  */
>      e.rotate_inputs_left (0, 5);
>      machine_mode mem_mode = e.memory_vector_mode ();
> -    insn_code icode = direct_optab_handler (mask_gather_load_optab, mem_mode);
> +    machine_mode int_mode = aarch64_sve_int_mode (mem_mode);
> +    insn_code icode = convert_optab_handler (mask_gather_load_optab,
> +                                            mem_mode, int_mode);
>      return e.use_exact_insn (icode);
>    }
>  };
> @@ -2043,8 +2045,10 @@ public:
>      e.prepare_gather_address_operands (1);
>      /* Put the predicate last, as required by mask_scatter_store_optab.  */
>      e.rotate_inputs_left (0, 6);
> -    insn_code icode = direct_optab_handler (mask_scatter_store_optab,
> -                                           e.memory_vector_mode ());
> +    machine_mode mem_mode = e.memory_vector_mode ();
> +    machine_mode int_mode = aarch64_sve_int_mode (mem_mode);
> +    insn_code icode = convert_optab_handler (mask_scatter_store_optab,
> +                                            mem_mode, int_mode);
>      return e.use_exact_insn (icode);
>    }
>  };
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2019-09-12 10:59:55.139303681 +0100
> +++ gcc/internal-fn.c   2019-11-06 16:03:37.368360019 +0000
> @@ -103,11 +103,11 @@ #define not_direct { -2, -2, false }
>  #define mask_load_direct { -1, 2, false }
>  #define load_lanes_direct { -1, -1, false }
>  #define mask_load_lanes_direct { -1, -1, false }
> -#define gather_load_direct { -1, -1, false }
> +#define gather_load_direct { 3, 1, false }
>  #define mask_store_direct { 3, 2, false }
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
> -#define scatter_store_direct { 3, 3, false }
> +#define scatter_store_direct { 3, 1, false }
>  #define unary_direct { 0, 0, true }
>  #define binary_direct { 0, 0, true }
>  #define ternary_direct { 0, 0, true }
> @@ -2785,7 +2785,8 @@ expand_scatter_store_optab_fn (internal_
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
>      }
>
> -  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
> +                                          TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
>  }
>
> @@ -2813,11 +2814,12 @@ expand_gather_load_optab_fn (internal_fn
>    create_integer_operand (&ops[i++], scale_int);
>    if (optab == mask_gather_load_optab)
>      {
> -      tree mask = gimple_call_arg (stmt, 3);
> +      tree mask = gimple_call_arg (stmt, 4);
>        rtx mask_rtx = expand_normal (mask);
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
>      }
> -  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)));
> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
> +                                          TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
>  }
>
> @@ -3084,11 +3086,11 @@ #define direct_cond_ternary_optab_suppor
>  #define direct_mask_load_optab_supported_p direct_optab_supported_p
>  #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
> -#define direct_gather_load_optab_supported_p direct_optab_supported_p
> +#define direct_gather_load_optab_supported_p convert_optab_supported_p
>  #define direct_mask_store_optab_supported_p direct_optab_supported_p
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
> -#define direct_scatter_store_optab_supported_p direct_optab_supported_p
> +#define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_while_optab_supported_p convert_optab_supported_p
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
> @@ -3513,8 +3515,6 @@ internal_fn_mask_index (internal_fn fn)
>        return 2;
>
>      case IFN_MASK_GATHER_LOAD:
> -      return 3;
> -
>      case IFN_MASK_SCATTER_STORE:
>        return 4;
>
> @@ -3546,27 +3546,30 @@ internal_fn_stored_value_index (internal
>     IFN.  For loads, VECTOR_TYPE is the vector type of the load result,
>     while for stores it is the vector type of the stored data argument.
>     MEMORY_ELEMENT_TYPE is the type of the memory elements being loaded
> -   or stored.  OFFSET_SIGN is the sign of the offset argument, which is
> -   only relevant when the offset is narrower than an address.  SCALE is
> -   the amount by which the offset should be multiplied *after* it has
> -   been extended to address width.  */
> +   or stored.  OFFSET_VECTOR_TYPE is the vector type that holds the
> +   offset from the shared base address of each loaded or stored element.
> +   SCALE is the amount by which these offsets should be multiplied
> +   *after* they have been extended to address width.  */
>
>  bool
>  internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
>                                         tree memory_element_type,
> -                                       signop offset_sign, int scale)
> +                                       tree offset_vector_type, int scale)
>  {
>    if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)),
>                            TYPE_SIZE (memory_element_type)))
>      return false;
> +  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vector_type),
> +               TYPE_VECTOR_SUBPARTS (offset_vector_type)))
> +    return false;
>    optab optab = direct_internal_fn_optab (ifn);
> -  insn_code icode = direct_optab_handler (optab, TYPE_MODE (vector_type));
> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type),
> +                                          TYPE_MODE (offset_vector_type));
>    int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
> +  bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (offset_vector_type));
>    return (icode != CODE_FOR_nothing
> -         && insn_operand_matches (icode, 2 + output_ops,
> -                                  GEN_INT (offset_sign == UNSIGNED))
> -         && insn_operand_matches (icode, 3 + output_ops,
> -                                  GEN_INT (scale)));
> +         && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
> +         && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
>  }
>
>  /* Expand STMT as though it were a call to internal function FN.  */
> Index: gcc/internal-fn.h
> ===================================================================
> --- gcc/internal-fn.h   2019-03-08 18:14:26.725006353 +0000
> +++ gcc/internal-fn.h   2019-11-06 16:03:37.368360019 +0000
> @@ -220,7 +220,7 @@ extern bool internal_gather_scatter_fn_p
>  extern int internal_fn_mask_index (internal_fn);
>  extern int internal_fn_stored_value_index (internal_fn);
>  extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
> -                                                   tree, signop, int);
> +                                                   tree, tree, int);
>
>  extern void expand_internal_call (gcall *);
>  extern void expand_internal_call (internal_fn, gcall *);
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2019-11-06 14:02:26.000000000 +0000
> +++ gcc/optabs-query.c  2019-11-06 16:03:37.368360019 +0000
> @@ -698,14 +698,18 @@ lshift_cheap_p (bool speed_p)
>    return cheap[speed_p];
>  }
>
> -/* Return true if optab OP supports at least one mode.  */
> +/* Return true if vector conversion optab OP supports at least one mode,
> +   given that the second mode is always an integer vector.  */
>
>  static bool
> -supports_at_least_one_mode_p (optab op)
> +supports_vec_convert_optab_p (optab op)
>  {
>    for (int i = 0; i < NUM_MACHINE_MODES; ++i)
> -    if (direct_optab_handler (op, (machine_mode) i) != CODE_FOR_nothing)
> -      return true;
> +    if (VECTOR_MODE_P ((machine_mode) i))
> +      for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
> +       if (convert_optab_handler (op, (machine_mode) i,
> +                                  (machine_mode) j) != CODE_FOR_nothing)
> +         return true;
>
>    return false;
>  }
> @@ -722,7 +726,7 @@ supports_vec_gather_load_p ()
>    this_fn_optabs->supports_vec_gather_load_cached = true;
>
>    this_fn_optabs->supports_vec_gather_load
> -    = supports_at_least_one_mode_p (gather_load_optab);
> +    = supports_vec_convert_optab_p (gather_load_optab);
>
>    return this_fn_optabs->supports_vec_gather_load;
>  }
> @@ -739,7 +743,7 @@ supports_vec_scatter_store_p ()
>    this_fn_optabs->supports_vec_scatter_store_cached = true;
>
>    this_fn_optabs->supports_vec_scatter_store
> -    = supports_at_least_one_mode_p (scatter_store_optab);
> +    = supports_vec_convert_optab_p (scatter_store_optab);
>
>    return this_fn_optabs->supports_vec_scatter_store;
>  }
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-06 14:02:26.000000000 +0000
> +++ gcc/tree-vectorizer.h       2019-11-06 16:03:37.372359991 +0000
> @@ -1678,8 +1678,8 @@ extern opt_result vect_verify_datarefs_a
>  extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
>  extern opt_result vect_analyze_data_ref_accesses (vec_info *);
>  extern opt_result vect_prune_runtime_alias_test_list (loop_vec_info);
> -extern bool vect_gather_scatter_fn_p (bool, bool, tree, tree, unsigned int,
> -                                     signop, int, internal_fn *, tree *);
> +extern bool vect_gather_scatter_fn_p (vec_info *, bool, bool, tree, tree,
> +                                     tree, int, internal_fn *, tree *);
>  extern bool vect_check_gather_scatter (stmt_vec_info, loop_vec_info,
>                                        gather_scatter_info *);
>  extern opt_result vect_find_stmt_data_reference (loop_p, gimple *,
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2019-11-06 12:28:22.000000000 +0000
> +++ gcc/tree-vect-data-refs.c   2019-11-06 16:03:37.368360019 +0000
> @@ -3660,28 +3660,22 @@ vect_prune_runtime_alias_test_list (loop
>  /* Check whether we can use an internal function for a gather load
>     or scatter store.  READ_P is true for loads and false for stores.
>     MASKED_P is true if the load or store is conditional.  MEMORY_TYPE is
> -   the type of the memory elements being loaded or stored.  OFFSET_BITS
> -   is the number of bits in each scalar offset and OFFSET_SIGN is the
> -   sign of the offset.  SCALE is the amount by which the offset should
> +   the type of the memory elements being loaded or stored.  OFFSET_TYPE
> +   is the type of the offset that is being applied to the invariant
> +   base address.  SCALE is the amount by which the offset should
>     be multiplied *after* it has been converted to address width.
>
> -   Return true if the function is supported, storing the function
> -   id in *IFN_OUT and the type of a vector element in *ELEMENT_TYPE_OUT.  */
> +   Return true if the function is supported, storing the function id in
> +   *IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT.  */
>
>  bool
> -vect_gather_scatter_fn_p (bool read_p, bool masked_p, tree vectype,
> -                         tree memory_type, unsigned int offset_bits,
> -                         signop offset_sign, int scale,
> -                         internal_fn *ifn_out, tree *element_type_out)
> +vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
> +                         tree vectype, tree memory_type, tree offset_type,
> +                         int scale, internal_fn *ifn_out,
> +                         tree *offset_vectype_out)
>  {
>    unsigned int memory_bits = tree_to_uhwi (TYPE_SIZE (memory_type));
>    unsigned int element_bits = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> -  if (offset_bits > element_bits)
> -    /* Internal functions require the offset to be the same width as
> -       the vector elements.  We can extend narrower offsets, but it isn't
> -       safe to truncate wider offsets.  */
> -    return false;
> -
>    if (element_bits != memory_bits)
>      /* For now the vector elements must be the same width as the
>         memory elements.  */
> @@ -3694,14 +3688,28 @@ vect_gather_scatter_fn_p (bool read_p, b
>    else
>      ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE;
>
> -  /* Test whether the target supports this combination.  */
> -  if (!internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type,
> -                                              offset_sign, scale))
> -    return false;
> +  for (;;)
> +    {
> +      tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
> +      if (!offset_vectype)
> +       return false;
>
> -  *ifn_out = ifn;
> -  *element_type_out = TREE_TYPE (vectype);
> -  return true;
> +      /* Test whether the target supports this combination.  */
> +      if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type,
> +                                                 offset_vectype, scale))
> +       {
> +         *ifn_out = ifn;
> +         *offset_vectype_out = offset_vectype;
> +         return true;
> +       }
> +
> +      if (TYPE_PRECISION (offset_type) >= POINTER_SIZE
> +         && TYPE_PRECISION (offset_type) >= element_bits)
> +       return false;
> +
> +      offset_type = build_nonstandard_integer_type
> +       (TYPE_PRECISION (offset_type) * 2, TYPE_UNSIGNED (offset_type));
> +    }
>  }
>
>  /* STMT_INFO is a call to an internal gather load or scatter store function.
> @@ -3744,7 +3752,7 @@ vect_check_gather_scatter (stmt_vec_info
>    machine_mode pmode;
>    int punsignedp, reversep, pvolatilep = 0;
>    internal_fn ifn;
> -  tree element_type;
> +  tree offset_vectype;
>    bool masked_p = false;
>
>    /* See whether this is already a call to a gather/scatter internal function.
> @@ -3905,13 +3913,18 @@ vect_check_gather_scatter (stmt_vec_info
>             {
>               int new_scale = tree_to_shwi (op1);
>               /* Only treat this as a scaling operation if the target
> -                supports it.  */
> +                supports it for at least some offset type.  */
>               if (use_ifn_p
> -                 && !vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p,
> -                                               vectype, memory_type, 1,
> -                                               TYPE_SIGN (TREE_TYPE (op0)),
> +                 && !vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
> +                                               masked_p, vectype, memory_type,
> +                                               signed_char_type_node,
> +                                               new_scale, &ifn,
> +                                               &offset_vectype)
> +                 && !vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
> +                                               masked_p, vectype, memory_type,
> +                                               unsigned_char_type_node,
>                                                 new_scale, &ifn,
> -                                               &element_type))
> +                                               &offset_vectype))
>                 break;
>               scale = new_scale;
>               off = op0;
> @@ -3925,6 +3938,16 @@ vect_check_gather_scatter (stmt_vec_info
>           if (!POINTER_TYPE_P (TREE_TYPE (op0))
>               && !INTEGRAL_TYPE_P (TREE_TYPE (op0)))
>             break;
> +
> +         /* Don't include the conversion if the target is happy with
> +            the current offset type.  */
> +         if (use_ifn_p
> +             && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
> +                                          masked_p, vectype, memory_type,
> +                                          TREE_TYPE (off), scale, &ifn,
> +                                          &offset_vectype))
> +           break;
> +
>           if (TYPE_PRECISION (TREE_TYPE (op0))
>               == TYPE_PRECISION (TREE_TYPE (off)))
>             {
> @@ -3932,14 +3955,6 @@ vect_check_gather_scatter (stmt_vec_info
>               continue;
>             }
>
> -         /* The internal functions need the offset to be the same width
> -            as the elements of VECTYPE.  Don't include operations that
> -            cast the offset from that width to a different width.  */
> -         if (use_ifn_p
> -             && (int_size_in_bytes (TREE_TYPE (vectype))
> -                 == int_size_in_bytes (TREE_TYPE (off))))
> -           break;
> -
>           if (TYPE_PRECISION (TREE_TYPE (op0))
>               < TYPE_PRECISION (TREE_TYPE (off)))
>             {
> @@ -3966,10 +3981,9 @@ vect_check_gather_scatter (stmt_vec_info
>
>    if (use_ifn_p)
>      {
> -      if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype,
> -                                    memory_type, TYPE_PRECISION (offtype),
> -                                    TYPE_SIGN (offtype), scale, &ifn,
> -                                    &element_type))
> +      if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p,
> +                                    vectype, memory_type, offtype, scale,
> +                                    &ifn, &offset_vectype))
>         return false;
>      }
>    else
> @@ -3989,7 +4003,8 @@ vect_check_gather_scatter (stmt_vec_info
>         return false;
>
>        ifn = IFN_LAST;
> -      element_type = TREE_TYPE (vectype);
> +      /* The offset vector type will be read from DECL when needed.  */
> +      offset_vectype = NULL_TREE;
>      }
>
>    info->ifn = ifn;
> @@ -3997,9 +4012,9 @@ vect_check_gather_scatter (stmt_vec_info
>    info->base = base;
>    info->offset = off;
>    info->offset_dt = vect_unknown_def_type;
> -  info->offset_vectype = NULL_TREE;
> +  info->offset_vectype = offset_vectype;
>    info->scale = scale;
> -  info->element_type = element_type;
> +  info->element_type = TREE_TYPE (vectype);
>    info->memory_type = memory_type;
>    return true;
>  }
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2019-11-06 14:02:26.000000000 +0000
> +++ gcc/tree-vect-patterns.c    2019-11-06 16:03:37.372359991 +0000
> @@ -4498,28 +4498,6 @@ vect_get_load_store_mask (stmt_vec_info
>    gcc_unreachable ();
>  }
>
> -/* Return the scalar offset type that an internal gather/scatter function
> -   should use.  GS_INFO describes the gather/scatter operation.  */
> -
> -static tree
> -vect_get_gather_scatter_offset_type (gather_scatter_info *gs_info)
> -{
> -  tree offset_type = TREE_TYPE (gs_info->offset);
> -  unsigned int element_bits = tree_to_uhwi (TYPE_SIZE (gs_info->element_type));
> -
> -  /* Enforced by vect_check_gather_scatter.  */
> -  unsigned int offset_bits = TYPE_PRECISION (offset_type);
> -  gcc_assert (element_bits >= offset_bits);
> -
> -  /* If the offset is narrower than the elements, extend it according
> -     to its sign.  */
> -  if (element_bits > offset_bits)
> -    return build_nonstandard_integer_type (element_bits,
> -                                          TYPE_UNSIGNED (offset_type));
> -
> -  return offset_type;
> -}
> -
>  /* Return MASK if MASK is suitable for masking an operation on vectors
>     of type VECTYPE, otherwise convert it into such a form and return
>     the result.  Associate any conversion statements with STMT_INFO's
> @@ -4604,7 +4582,7 @@ vect_recog_gather_scatter_pattern (stmt_
>    /* Get the invariant base and non-invariant offset, converting the
>       latter to the same width as the vector elements.  */
>    tree base = gs_info.base;
> -  tree offset_type = vect_get_gather_scatter_offset_type (&gs_info);
> +  tree offset_type = TREE_TYPE (gs_info.offset_vectype);
>    tree offset = vect_add_conversion_to_pattern (offset_type, gs_info.offset,
>                                                 stmt_info);
>
> @@ -4613,12 +4591,13 @@ vect_recog_gather_scatter_pattern (stmt_
>    gcall *pattern_stmt;
>    if (DR_IS_READ (dr))
>      {
> +      tree zero = build_zero_cst (gs_info.element_type);
>        if (mask != NULL)
> -       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base,
> -                                                  offset, scale, mask);
> +       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base,
> +                                                  offset, scale, zero, mask);
>        else
> -       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 3, base,
> -                                                  offset, scale);
> +       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base,
> +                                                  offset, scale, zero);
>        tree load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL);
>        gimple_call_set_lhs (pattern_stmt, load_lhs);
>      }
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-06 14:02:26.000000000 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-06 16:03:37.372359991 +0000
> @@ -1910,10 +1910,9 @@ check_load_store_masking (loop_vec_info
>        internal_fn ifn = (is_load
>                          ? IFN_MASK_GATHER_LOAD
>                          : IFN_MASK_SCATTER_STORE);
> -      tree offset_type = TREE_TYPE (gs_info->offset);
>        if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
>                                                    gs_info->memory_type,
> -                                                  TYPE_SIGN (offset_type),
> +                                                  gs_info->offset_vectype,
>                                                    gs_info->scale))
>         {
>           if (dump_enabled_p ())
> @@ -2046,35 +2045,33 @@ vect_truncate_gather_scatter_offset (stm
>        if (!wi::multiple_of_p (wi::to_widest (step), scale, SIGNED, &factor))
>         continue;
>
> -      /* See whether we can calculate (COUNT - 1) * STEP / SCALE
> -        in OFFSET_BITS bits.  */
> +      /* Determine the minimum precision of (COUNT - 1) * STEP / SCALE.  */
>        widest_int range = wi::mul (count, factor, SIGNED, &overflow);
>        if (overflow)
>         continue;
>        signop sign = range >= 0 ? UNSIGNED : SIGNED;
> -      if (wi::min_precision (range, sign) > element_bits)
> -       {
> -         overflow = wi::OVF_UNKNOWN;
> -         continue;
> -       }
> +      unsigned int min_offset_bits = wi::min_precision (range, sign);
>
> -      /* See whether the target supports the operation.  */
> +      /* Find the narrowest viable offset type.  */
> +      unsigned int offset_bits = 1U << ceil_log2 (min_offset_bits);
> +      tree offset_type = build_nonstandard_integer_type (offset_bits,
> +                                                        sign == UNSIGNED);
> +
> +      /* See whether the target supports the operation with an offset
> +        no narrower than OFFSET_TYPE.  */
>        tree memory_type = TREE_TYPE (DR_REF (dr));
> -      if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype,
> -                                    memory_type, element_bits, sign, scale,
> -                                    &gs_info->ifn, &gs_info->element_type))
> +      if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p,
> +                                    vectype, memory_type, offset_type, scale,
> +                                    &gs_info->ifn, &gs_info->offset_vectype))
>         continue;
>
> -      tree offset_type = build_nonstandard_integer_type (element_bits,
> -                                                        sign == UNSIGNED);
> -
>        gs_info->decl = NULL_TREE;
>        /* Logically the sum of DR_BASE_ADDRESS, DR_INIT and DR_OFFSET,
>          but we don't need to store that here.  */
>        gs_info->base = NULL_TREE;
> +      gs_info->element_type = TREE_TYPE (vectype);
>        gs_info->offset = fold_convert (offset_type, step);
>        gs_info->offset_dt = vect_constant_def;
> -      gs_info->offset_vectype = NULL_TREE;
>        gs_info->scale = scale;
>        gs_info->memory_type = memory_type;
>        return true;
> @@ -2104,22 +2101,12 @@ vect_use_strided_gather_scatters_p (stmt
>      return vect_truncate_gather_scatter_offset (stmt_info, loop_vinfo,
>                                                 masked_p, gs_info);
>
> -  scalar_mode element_mode = SCALAR_TYPE_MODE (gs_info->element_type);
> -  unsigned int element_bits = GET_MODE_BITSIZE (element_mode);
> -  tree offset_type = TREE_TYPE (gs_info->offset);
> -  unsigned int offset_bits = TYPE_PRECISION (offset_type);
> -
> -  /* Enforced by vect_check_gather_scatter.  */
> -  gcc_assert (element_bits >= offset_bits);
> +  tree old_offset_type = TREE_TYPE (gs_info->offset);
> +  tree new_offset_type = TREE_TYPE (gs_info->offset_vectype);
>
> -  /* If the elements are wider than the offset, convert the offset to the
> -     same width, without changing its sign.  */
> -  if (element_bits > offset_bits)
> -    {
> -      bool unsigned_p = TYPE_UNSIGNED (offset_type);
> -      offset_type = build_nonstandard_integer_type (element_bits, unsigned_p);
> -      gs_info->offset = fold_convert (offset_type, gs_info->offset);
> -    }
> +  gcc_assert (TYPE_PRECISION (new_offset_type)
> +             >= TYPE_PRECISION (old_offset_type));
> +  gs_info->offset = fold_convert (new_offset_type, gs_info->offset);
>
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
> @@ -2963,7 +2950,6 @@ vect_get_gather_scatter_ops (class loop
>                              gather_scatter_info *gs_info,
>                              tree *dataref_ptr, tree *vec_offset)
>  {
> -  vec_info *vinfo = stmt_info->vinfo;
>    gimple_seq stmts = NULL;
>    *dataref_ptr = force_gimple_operand (gs_info->base, &stmts, true, NULL_TREE);
>    if (stmts != NULL)
> @@ -2973,10 +2959,8 @@ vect_get_gather_scatter_ops (class loop
>        new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
>        gcc_assert (!new_bb);
>      }
> -  tree offset_type = TREE_TYPE (gs_info->offset);
> -  tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
>    *vec_offset = vect_get_vec_def_for_operand (gs_info->offset, stmt_info,
> -                                             offset_vectype);
> +                                             gs_info->offset_vectype);
>  }
>
>  /* Prepare to implement a grouped or strided load or store using
> @@ -3009,8 +2993,7 @@ vect_get_strided_load_store_ops (stmt_ve
>    /* The offset given in GS_INFO can have pointer type, so use the element
>       type of the vector instead.  */
>    tree offset_type = TREE_TYPE (gs_info->offset);
> -  tree offset_vectype = get_vectype_for_scalar_type (loop_vinfo, offset_type);
> -  offset_type = TREE_TYPE (offset_vectype);
> +  offset_type = TREE_TYPE (gs_info->offset_vectype);
>
>    /* Calculate X = DR_STEP / SCALE and convert it to the appropriate type.  */
>    tree step = size_binop (EXACT_DIV_EXPR, DR_STEP (dr),
> @@ -3019,7 +3002,7 @@ vect_get_strided_load_store_ops (stmt_ve
>    step = force_gimple_operand (step, &stmts, true, NULL_TREE);
>
>    /* Create {0, X, X*2, X*3, ...}.  */
> -  *vec_offset = gimple_build (&stmts, VEC_SERIES_EXPR, offset_vectype,
> +  *vec_offset = gimple_build (&stmts, VEC_SERIES_EXPR, gs_info->offset_vectype,
>                               build_zero_cst (offset_type), step);
>    if (stmts)
>      gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
> @@ -9442,16 +9425,17 @@ vectorizable_load (stmt_vec_info stmt_in
>
>                     if (memory_access_type == VMAT_GATHER_SCATTER)
>                       {
> +                       tree zero = build_zero_cst (vectype);
>                         tree scale = size_int (gs_info.scale);
>                         gcall *call;
>                         if (loop_masks)
>                           call = gimple_build_call_internal
> -                           (IFN_MASK_GATHER_LOAD, 4, dataref_ptr,
> -                            vec_offset, scale, final_mask);
> +                           (IFN_MASK_GATHER_LOAD, 5, dataref_ptr,
> +                            vec_offset, scale, zero, final_mask);
>                         else
>                           call = gimple_build_call_internal
> -                           (IFN_GATHER_LOAD, 3, dataref_ptr,
> -                            vec_offset, scale);
> +                           (IFN_GATHER_LOAD, 4, dataref_ptr,
> +                            vec_offset, scale, zero);
>                         gimple_call_set_nothrow (call, true);
>                         new_stmt = call;
>                         data_ref = NULL_TREE;
> Index: gcc/config/aarch64/aarch64-sve.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-sve.md   2019-10-29 17:01:12.639889324 +0000
> +++ gcc/config/aarch64/aarch64-sve.md   2019-11-06 16:03:37.352360131 +0000
> @@ -1336,7 +1336,7 @@ (define_insn "@aarch64_ldnt1<mode>"
>  ;; -------------------------------------------------------------------------
>
>  ;; Unpredicated gather loads.
> -(define_expand "gather_load<mode>"
> +(define_expand "gather_load<mode><v_int_equiv>"
>    [(set (match_operand:SVE_SD 0 "register_operand")
>         (unspec:SVE_SD
>           [(match_dup 5)
> @@ -1354,7 +1354,7 @@ (define_expand "gather_load<mode>"
>
>  ;; Predicated gather loads for 32-bit elements.  Operand 3 is true for
>  ;; unsigned extension and false for signed extension.
> -(define_insn "mask_gather_load<mode>"
> +(define_insn "mask_gather_load<mode><v_int_equiv>"
>    [(set (match_operand:SVE_S 0 "register_operand" "=w, w, w, w, w, w")
>         (unspec:SVE_S
>           [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
> @@ -1376,7 +1376,7 @@ (define_insn "mask_gather_load<mode>"
>
>  ;; Predicated gather loads for 64-bit elements.  The value of operand 3
>  ;; doesn't matter in this case.
> -(define_insn "mask_gather_load<mode>"
> +(define_insn "mask_gather_load<mode><v_int_equiv>"
>    [(set (match_operand:SVE_D 0 "register_operand" "=w, w, w, w")
>         (unspec:SVE_D
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
> @@ -1395,7 +1395,7 @@ (define_insn "mask_gather_load<mode>"
>  )
>
>  ;; Likewise, but with the offset being sign-extended from 32 bits.
> -(define_insn "*mask_gather_load<mode>_sxtw"
> +(define_insn "*mask_gather_load<mode><v_int_equiv>_sxtw"
>    [(set (match_operand:SVE_D 0 "register_operand" "=w, w")
>         (unspec:SVE_D
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
> @@ -1417,7 +1417,7 @@ (define_insn "*mask_gather_load<mode>_sx
>  )
>
>  ;; Likewise, but with the offset being zero-extended from 32 bits.
> -(define_insn "*mask_gather_load<mode>_uxtw"
> +(define_insn "*mask_gather_load<mode><v_int_equiv>_uxtw"
>    [(set (match_operand:SVE_D 0 "register_operand" "=w, w")
>         (unspec:SVE_D
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
> @@ -2054,7 +2054,7 @@ (define_insn "@aarch64_stnt1<mode>"
>  ;; -------------------------------------------------------------------------
>
>  ;; Unpredicated scatter stores.
> -(define_expand "scatter_store<mode>"
> +(define_expand "scatter_store<mode><v_int_equiv>"
>    [(set (mem:BLK (scratch))
>         (unspec:BLK
>           [(match_dup 5)
> @@ -2072,7 +2072,7 @@ (define_expand "scatter_store<mode>"
>
>  ;; Predicated scatter stores for 32-bit elements.  Operand 2 is true for
>  ;; unsigned extension and false for signed extension.
> -(define_insn "mask_scatter_store<mode>"
> +(define_insn "mask_scatter_store<mode><v_int_equiv>"
>    [(set (mem:BLK (scratch))
>         (unspec:BLK
>           [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
> @@ -2094,7 +2094,7 @@ (define_insn "mask_scatter_store<mode>"
>
>  ;; Predicated scatter stores for 64-bit elements.  The value of operand 2
>  ;; doesn't matter in this case.
> -(define_insn "mask_scatter_store<mode>"
> +(define_insn "mask_scatter_store<mode><v_int_equiv>"
>    [(set (mem:BLK (scratch))
>         (unspec:BLK
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
> @@ -2113,7 +2113,7 @@ (define_insn "mask_scatter_store<mode>"
>  )
>
>  ;; Likewise, but with the offset being sign-extended from 32 bits.
> -(define_insn_and_rewrite "*mask_scatter_store<mode>_sxtw"
> +(define_insn_and_rewrite "*mask_scatter_store<mode><v_int_equiv>_sxtw"
>    [(set (mem:BLK (scratch))
>         (unspec:BLK
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
> @@ -2139,7 +2139,7 @@ (define_insn_and_rewrite "*mask_scatter_
>  )
>
>  ;; Likewise, but with the offset being zero-extended from 32 bits.
> -(define_insn "*mask_scatter_store<mode>_uxtw"
> +(define_insn "*mask_scatter_store<mode><v_int_equiv>_uxtw"
>    [(set (mem:BLK (scratch))
>         (unspec:BLK
>           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")

