This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: Add support for fully-predicated loops

From: James Greenhalgh <james dot greenhalgh at arm dot com>
To: Jeff Law <law at redhat dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "richard dot sandiford at linaro dot org" <richard dot sandiford at linaro dot org>, <nd at arm dot com>
Date: Sun, 7 Jan 2018 17:08:53 +0000
Subject: Re: Add support for fully-predicated loops
References: <87po8hymvt.fsf@linaro.org> <a2b1204e-f4a5-3f1a-0184-3ca82d788a6c@redhat.com>

On Mon, Dec 18, 2017 at 07:40:00PM +0000, Jeff Law wrote:
> On 11/17/2017 07:56 AM, Richard Sandiford wrote:
> > This patch adds support for using a single fully-predicated loop instead
> > of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
> > the predicate for each iteration of the loop, given the current scalar
> > iv value and the loop bound.  This operation is wrapped up in a new internal
> > function called WHILE_ULT.  E.g.:
> > 
> >    WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
> >    WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
> > 
> > The third WHILE_ULT argument is needed to make the operation
> > unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
> > seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
> > different numbers of elements.
> > 
> > Note that the patch uses "mask" and "fully-masked" instead of
> > "predicate" and "fully-predicated", to follow existing GCC terminology.
> > 
> > This patch just handles the simple cases, punting for things like
> > reductions and live-out values.  Later patches remove most of these
> > restrictions.
> > 
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu.  OK to install?
> > 
> > Richard
> > 
> > 
> > 2017-11-17  Richard Sandiford  <richard.sandiford@linaro.org>
> > 	    Alan Hayward  <alan.hayward@arm.com>
> > 	    David Sherwood  <david.sherwood@arm.com>
> > 
> > gcc/
> > 	* optabs.def (while_ult_optab): New optab.
> > 	* doc/md.texi (while_ult@var{m}@var{n}): Document.
> > 	* internal-fn.def (WHILE_ULT): New internal function.
> > 	* internal-fn.h (direct_internal_fn_supported_p): New overload
> > 	that takes two types as arguments.
> > 	* internal-fn.c (while_direct): New macro.
> > 	(expand_while_optab_fn): New function.
> > 	(convert_optab_supported_p): Likewise.
> > 	(direct_while_optab_supported_p): New macro.
> > 	* wide-int.h (wi::udiv_ceil): New function.
> > 	* tree-vectorizer.h (rgroup_masks): New structure.
> > 	(vec_loop_masks): New typedef.
> > 	(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
> > 	and fully_masked_p.
> > 	(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
> > 	(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
> > 	(vect_max_vf): New function.
> > 	(slpeel_make_loop_iterate_ntimes): Delete.
> > 	(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
> > 	(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
> > 	(vect_record_loop_mask, vect_get_loop_mask): Likewise.
> > 	* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
> > 	internal-fn.h, stor-layout.h and optabs-query.h.
> > 	(vect_set_loop_mask): New function.
> > 	(add_preheader_seq): Likewise.
> > 	(add_header_seq): Likewise.
> > 	(vect_maybe_permute_loop_masks): Likewise.
> > 	(vect_set_loop_masks_directly): Likewise.
> > 	(vect_set_loop_condition_masked): Likewise.
> > 	(vect_set_loop_condition_unmasked): New function, split out from
> > 	slpeel_make_loop_iterate_ntimes.
> > 	(slpeel_make_loop_iterate_ntimes): Rename to...
> > 	(vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
> > 	for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
> > 	(vect_do_peeling): Update call accordingly.
> > 	(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
> > 	loops.
> > 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> > 	mask_compare_type, can_fully_mask_p and fully_masked_p.
> > 	(release_vec_loop_masks): New function.
> > 	(_loop_vec_info): Use it to free the loop masks.
> > 	(can_produce_all_loop_masks_p): New function.
> > 	(vect_get_max_nscalars_per_iter): Likewise.
> > 	(vect_verify_full_masking): Likewise.
> > 	(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
> > 	retries, and free the mask rgroups before retrying.  Check loop-wide
> > 	reasons for disallowing fully-masked loops.  Make the final decision
> > 	about whether to use a fully-masked loop or not.
> > 	(vect_estimate_min_profitable_iters): Do not assume that peeling
> > 	for the number of iterations will be needed for fully-masked loops.
> > 	(vectorizable_reduction): Disable fully-masked loops.
> > 	(vectorizable_live_operation): Likewise.
> > 	(vect_halve_mask_nunits): New function.
> > 	(vect_double_mask_nunits): Likewise.
> > 	(vect_record_loop_mask): Likewise.
> > 	(vect_get_loop_mask): Likewise.
> > 	(vect_transform_loop): Handle the case in which the final loop
> > 	iteration might handle a partial vector.  Call vect_set_loop_condition
> > 	instead of slpeel_make_loop_iterate_ntimes.
> > 	* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
> > 	(check_load_store_masking): New function.
> > 	(prepare_load_store_mask): Likewise.
> > 	(vectorizable_store): Handle fully-masked loops.
> > 	(vectorizable_load): Likewise.
> > 	(supportable_widening_operation): Use vect_halve_mask_nunits for
> > 	booleans.
> > 	(supportable_narrowing_operation): Likewise vect_double_mask_nunits.
> > 	(vect_gen_while): New function.
> > 	* config/aarch64/aarch64.md (umax<mode>3): New expander.
> > 	(aarch64_uqdec<mode>): New insn.
> > 	* config/aarch64/aarch64-sve.md (<perm_optab>_<mode>)
> > 	(*aarch64_sve_<perm_insn><perm_hilo><mode>): New predicate patterns.
> > 
> > gcc/testsuite/
> > 	* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
> > 	* gcc.dg/tree-ssa/peel1.c: Likewise.
> > 	* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
> > 	variable-length vectors.
> > 	* gcc.target/aarch64/sve_vcond_6.c: XFAIL test for AND.
> > 	* gcc.target/aarch64/sve_vec_bool_cmp_1.c: Expect BIC instead of NOT.
> > 	* gcc.target/aarch64/sve_slp_1.c: Check for a fully-masked loop.
> > 	* gcc.target/aarch64/sve_slp_2.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_3.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_4.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_6.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_8.c: New test.
> > 	* gcc.target/aarch64/sve_slp_8_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_9.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_9_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_10.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_10_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_11.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_11_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_12.c: Likewise.
> > 	* gcc.target/aarch64/sve_slp_12_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_ld1r_2.c: Likewise.
> > 	* gcc.target/aarch64/sve_ld1r_2_run.c: Likewise.
> > 	* gcc.target/aarch64/sve_while_1.c: Likewise.
> > 	* gcc.target/aarch64/sve_while_2.c: Likewise.
> > 	* gcc.target/aarch64/sve_while_3.c: Likewise.
> > 	* gcc.target/aarch64/sve_while_4.c: Likewise.
> Like other SVE related patches, I haven't looked at the aarch64 specific
> bits, just the generic bits.
> 
> Sadly, I'm totally lost on this one....   I understand at a 30000ft
> level what you're trying to do and many of the low level primitives made
> sense.  But I wasn't able to go from those primitives to the higher
> level implementation details, even though the higher level
> implementation details didn't seem all that large.
> 
> I trust your judgment on this stuff.
> 
> OK for the trunk.

The AArch64 bits are OK.

Thanks,
James
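
For readers following the quoted description, here is a minimal scalar
sketch of the WHILE_ULT semantics and of a fully-masked loop built on it.
It is not code from the patch: NUNITS, model_while_ult and masked_add_one
are hypothetical names, the lane count is fixed at 4 purely for
illustration (SVE vector lengths are not known at compile time), and the
mask is a plain bool array rather than a predicate register.  The model
follows the two examples in the description, so lane i is active iff
start + i < end, evaluated without wrap-around.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical fixed lane count, chosen only for this illustration.  */
#define NUNITS 4

/* Scalar model of WHILE_ULT (START, END, ...): lane I is active iff
   START + I < END, computed as if in infinite precision so that a
   START close to UINT_MAX does not wrap round to small values.  */
static void
model_while_ult (unsigned int start, unsigned int end, bool mask[NUNITS])
{
  unsigned int remaining = start < end ? end - start : 0;
  for (unsigned int i = 0; i < NUNITS; ++i)
    mask[i] = i < remaining;
}

/* Fully-masked form of:

     for (i = 0; i < n; ++i)
       dst[i] = src[i] + 1;

   Every iteration handles up to NUNITS elements under a mask, so no
   scalar tail loop is needed.  Assumption: N is small enough that
   I += NUNITS cannot wrap.  */
static void
masked_add_one (int *dst, const int *src, unsigned int n)
{
  bool mask[NUNITS];
  unsigned int i = 0;

  model_while_ult (i, n, mask);
  while (mask[0])
    {
      /* Stands in for a masked load, add and masked store.  */
      for (unsigned int lane = 0; lane < NUNITS; ++lane)
        if (mask[lane])
          dst[i + lane] = src[i + lane] + 1;

      i += NUNITS;
      model_while_ult (i, n, mask);
    }
}
```

With n == 7 the first iteration runs with all four lanes active and the
second with only three, which is the partial final vector that a separate
scalar tail would otherwise handle.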

Follow-Ups:
- Re: Add support for fully-predicated loops
  - From: Christophe Lyon
