Re: Add support for masked load/store_lanes


On 11/17/2017 02:36 AM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch adds support for vectorising groups of IFN_MASK_LOADs
>> and IFN_MASK_STOREs using conditional load/store-lanes instructions.
>> This requires new internal functions to represent the result
>> (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
>>
>> The normal IFN_{LOAD,STORE}_LANES functions are const operations
>> that logically just perform the permute: the load or store is
>> encoded as a MEM operand to the call statement.  In contrast,
>> the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
>> interface as IFN_MASK_{LOAD,STORE}, since the memory is only
>> conditionally accessed.
>>
>> The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
>>
>> Tested on aarch64-linux-gnu (both with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.  OK to install?
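
For concreteness, here is a minimal sketch of the kind of loop this is
aimed at (a hypothetical example, not taken from the patch's testsuite --
the real coverage is in the new sve_mask_struct_* tests).  The two
interleaved fields of SRC are only read when the condition holds, so an
unconditional load-lanes would be unsafe; after if-conversion the two
field loads become a group of IFN_MASK_LOADs, which is exactly the case
the patch handles:

  struct pair { int a, b; };

  void
  f (int *__restrict dest, struct pair *__restrict src,
     int *__restrict cond, int n)
  {
    for (int i = 0; i < n; ++i)
      if (cond[i])
        dest[i] = src[i].a + src[i].b;
  }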
> 
> Here's an updated (and much simpler) version that applies on top of the
> series I just posted to remove vectorizable_mask_load_store.  Tested as
> before.
> 
> Thanks,
> Richard
> 
> 
> 2017-11-17  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
> 	(vec_mask_store_lanes@var{m}@var{n}): Likewise.
> 	* optabs.def (vec_mask_load_lanes_optab): New optab.
> 	(vec_mask_store_lanes_optab): Likewise.
> 	* internal-fn.def (MASK_LOAD_LANES): New internal function.
> 	(MASK_STORE_LANES): Likewise.
> 	* internal-fn.c (mask_load_lanes_direct): New macro.
> 	(mask_store_lanes_direct): Likewise.
> 	(expand_mask_load_optab_fn): Handle masked operations.
> 	(expand_mask_load_lanes_optab_fn): New macro.
> 	(expand_mask_store_optab_fn): Handle masked operations.
> 	(expand_mask_store_lanes_optab_fn): New macro.
> 	(direct_mask_load_lanes_optab_supported_p): Likewise.
> 	(direct_mask_store_lanes_optab_supported_p): Likewise.
> 	* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
> 	parameter.
> 	(vect_load_lanes_supported): Likewise.
> 	* tree-vect-data-refs.c (strip_conversion): New function.
> 	(can_group_stmts_p): Likewise.
> 	(vect_analyze_data_ref_accesses): Use it instead of checking
> 	for a pair of assignments.
> 	(vect_store_lanes_supported): Take a masked_p parameter.
> 	(vect_load_lanes_supported): Likewise.
> 	* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
> 	vect_store_lanes_supported and vect_load_lanes_supported.
> 	* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> 	* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
> 	parameter.  Don't allow gaps for masked accesses.
> 	Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
> 	and vect_load_lanes_supported.
> 	(get_load_store_type): Take a masked_p parameter and update
> 	call to get_group_load_store_type.
> 	(vectorizable_store): Update call to get_load_store_type.
> 	Handle IFN_MASK_STORE_LANES.
> 	(vectorizable_load): Update call to get_load_store_type.
> 	Handle IFN_MASK_LOAD_LANES.
> 
> gcc/testsuite/
> 	* gcc.dg/vect/vect-ooo-group-1.c: New test.
> 	* gcc.target/aarch64/sve_mask_struct_load_1.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_1_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_2.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_2_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_3.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_3_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_4.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_5.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_6.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_7.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_load_8.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_1.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_1_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_2.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_2_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_3.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_3_run.c: Likewise.
> 	* gcc.target/aarch64/sve_mask_struct_store_4.c: Likewise.
> 
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi	2017-11-17 09:06:19.783260344 +0000
> +++ gcc/doc/md.texi	2017-11-17 09:35:23.400133274 +0000
> @@ -4855,6 +4855,23 @@ loads for vectors of mode @var{n}.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_load_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_load_lanes@var{m}@var{n}}
> +Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2) that specifies which elements of the destination
> +vectors should be loaded.  Other elements of the destination
> +vectors are set to zero.  The operation is equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> +  if (operand2[j])
> +    for (i = 0; i < c; i++)
> +      operand0[i][j] = operand1[j * c + i];
> +@end smallexample
Don't you need to set operand0[i][j] to zero if operand2[j] is zero for
this to be correct?  And if that's the case, don't you need to expose
the set to zero as a side effect?
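
I.e. (just sketching the suggestion, not wording from the patch) something
along the lines of:

  int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
  for (j = 0; j < GET_MODE_NUNITS (n); j++)
    for (i = 0; i < c; i++)
      operand0[i][j] = operand2[j] ? operand1[j * c + i] : 0;

so that the zeroing of the unselected elements is part of the documented
semantics rather than only mentioned in the prose.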



> +@cindex @code{vec_mask_store_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_store_lanes@var{m}@var{n}}
> +Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2) that specifies which elements of the source
> +vectors should be stored.  The operation is equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> +  if (operand2[j])
> +    for (i = 0; i < c; i++)
> +      operand0[j * c + i] = operand1[i][j];
> +@end smallexample
> +
> +This pattern is not allowed to @code{FAIL}.
Is the asymmetry between loads and stores intentional?  In particular,
for loads the documentation says "Other elements of the destination
vectors are set to zero", but there is no corresponding statement about
the unselected elements for stores.



> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.085133247 +0000
> +++ gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.404633274 +0000
> @@ -2791,6 +2791,62 @@ dr_group_sort_cmp (const void *dra_, con
>    return cmp;
>  }
>  
> +/* If OP is the result of a conversion, return the unconverted value,
> +   otherwise return null.  */
> +
> +static tree
> +strip_conversion (tree op)
> +{
> +  if (TREE_CODE (op) != SSA_NAME)
> +    return NULL_TREE;
> +  gimple *stmt = SSA_NAME_DEF_STMT (op);
> +  if (!is_gimple_assign (stmt)
> +      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
> +    return NULL_TREE;
> +  return gimple_assign_rhs1 (stmt);
> +}
Do you have any desire to walk back through multiple conversions?  The
result is only used when comparing whether two masks are the same, so it
probably doesn't matter much in practice whether we handle chains of
conversions.

Somehow I know we've got to have an equivalent of this routine lying
around somewhere :-)  Though I don't think it's worth the time to find it.
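
Just to illustrate the idea, a variant that strips a whole chain of
conversions could look something like this (a sketch only, reusing the
predicates the patch already uses; strip_conversions is a made-up name):

  /* If OP is the result of one or more conversions, return the
     innermost unconverted value, otherwise return null.  */

  static tree
  strip_conversions (tree op)
  {
    tree res = NULL_TREE;
    while (TREE_CODE (op) == SSA_NAME)
      {
        gimple *stmt = SSA_NAME_DEF_STMT (op);
        if (!is_gimple_assign (stmt)
            || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
          break;
        res = op = gimple_assign_rhs1 (stmt);
      }
    return res;
  }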

Not an ACK or NAK.  I'm a bit hung up on the doc issue and how it
potentially impacts how we support this capability.

jeff

