Handle peeling for alignment with masking

Sun Jan 7 20:54:00 GMT 2018

On Thu, Dec 14, 2017 at 12:12:01AM +0000, Jeff Law wrote:
> On 11/17/2017 08:13 AM, Richard Sandiford wrote:
> > This patch adds support for aligning vectors by using a partial
> > first iteration.  E.g. if the start pointer is 3 elements beyond
> > an aligned address, the first iteration will have a mask in which
> > the first three elements are false.
> > 
> > On SVE, the optimisation is only useful for vector-length-specific
> > code.  Vector-length-agnostic code doesn't try to align vectors
> > since the vector length might not be a power of 2.
> > 
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu.  OK to install?
> > 
> > Richard
> > 
> > 
> > 2017-11-17  Richard Sandiford  <richard.sandiford@linaro.org>
> > 	    Alan Hayward  <alan.hayward@arm.com>
> > 	    David Sherwood  <david.sherwood@arm.com>
> > 
> > gcc/
> > 	* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
> > 	(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
> > 	(vect_use_loop_mask_for_alignment_p): New function.
> > 	(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
> > 	* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
> > 	niters_skip argument.  Make sure that the first niters_skip elements
> > 	of the first iteration are inactive.
> > 	(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
> > 	Update call to vect_set_loop_masks_directly.
> > 	(get_misalign_in_elems): New function, split out from...
> > 	(vect_gen_prolog_loop_niters): ...here.
> > 	(vect_update_init_of_dr): Take a code argument that specifies whether
> > 	the adjustment should be added or subtracted.
> > 	(vect_update_init_of_drs): Likewise.
> > 	(vect_prepare_for_masked_peels): New function.
> > 	(vect_do_peeling): Skip prologue peeling if we're using a mask
> > 	instead.  Update call to vect_update_inits_of_drs.
> > 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> > 	mask_skip_niters.
> > 	(vect_analyze_loop_2): Allow fully-masked loops with peeling for
> > 	alignment.  Do not include the number of peeled iterations in
> > 	the minimum threshold in that case.
> > 	(vectorizable_induction): Adjust the start value down by
> > 	LOOP_VINFO_MASK_SKIP_NITERS iterations.
> > 	(vect_transform_loop): Call vect_prepare_for_masked_peels.
> > 	Take the number of skipped iterations into account when calculating
> > 	the loop bounds.
> > 	* tree-vect-stmts.c (vect_gen_while_not): New function.
> OK.
> jeff
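
For anyone following the thread, here is a rough scalar model of the
masked first iteration described in the patch summary above.  It is only
a sketch and not code from the patch: VF, masked_increment and the
lane-by-lane inner loop are hypothetical stand-ins for a fixed-length
vector and its predicate.  The start pointer is assumed to be "skip"
elements beyond an aligned address, so the first vector starts at the
aligned address and its first "skip" lanes are inactive.

```c
#include <stddef.h>

/* Hypothetical fixed vector length, in elements.  */
#define VF 4

/* Add 1 to each of the N elements starting at P, where P is SKIP
   elements beyond a VF-aligned address (0 <= SKIP < VF).  Instead of
   peeling SKIP scalar iterations, the first "vector" iteration starts
   at the aligned address and masks off its first SKIP lanes.
   Returns the number of vector iterations executed.  */
size_t
masked_increment (int *p, size_t n, size_t skip)
{
  size_t vector_iters = 0;

  /* BASE is the index, relative to P, of lane 0 of the current vector.
     It starts at -SKIP so that every vector access is aligned.  */
  for (ptrdiff_t base = -(ptrdiff_t) skip; base < (ptrdiff_t) n; base += VF)
    {
      for (int lane = 0; lane < VF; lane++)
        {
          ptrdiff_t i = base + lane;
          /* The lane is active only inside [0, N): this masks the
             leading SKIP lanes of the first iteration and the unused
             trailing lanes of the last one.  */
          int active = (i >= 0 && i < (ptrdiff_t) n);
          if (active)
            p[i] += 1;
        }
      vector_iters++;
    }

  return vector_iters;
}
```

With skip == 3 and n == 9 this runs three vector iterations, and only
the nine requested elements are written.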

The AArch64 tests are OK, but:

> > Index: gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c
> > ===================================================================
> > --- /dev/null	2017-11-14 14:28:07.424493901 +0000
> > +++ gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c	2017-11-17 15:11:51.121849349 +0000
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run { target aarch64_sve_hw } } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx" } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> > +

I'd also add the comment from sve_peel_ind_2.c explaining why we use
-mtune=thunderx here.

Thanks,
James