Handle peeling for alignment with masking
James Greenhalgh
james.greenhalgh@arm.com
Sun Jan 7 20:54:00 GMT 2018
On Thu, Dec 14, 2017 at 12:12:01AM +0000, Jeff Law wrote:
> On 11/17/2017 08:13 AM, Richard Sandiford wrote:
> > This patch adds support for aligning vectors by using a partial
> > first iteration. E.g. if the start pointer is 3 elements beyond
> > an aligned address, the first iteration will have a mask in which
> > the first three elements are false.
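The idea in the paragraph above can be modelled in plain C. This is only an illustration under stated assumptions, not GCC code. VF is a hypothetical fixed vector length of 8 int elements, and the vector loop is emulated lane by lane. misalign_in_elems and add_one_masked are hypothetical names. Each emulated vector iteration starts at an address aligned to VF elements. The lanes that fall before the real start pointer (the first `skip` lanes of the first iteration) or past the end are inactive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vector length, in elements, for this sketch.  */
#define VF 8

/* Number of elements by which P lies beyond a VF-element boundary.
   Assumes VF * sizeof (int) is a power of 2.  */
static size_t
misalign_in_elems (const int *p)
{
  uintptr_t align_bytes = VF * sizeof (int);
  return ((uintptr_t) p & (align_bytes - 1)) / sizeof (int);
}

/* Add 1 to each of the N elements at A, emulating a fully-masked loop
   whose first iteration starts at the preceding aligned address.  */
static void
add_one_masked (int *a, size_t n)
{
  size_t skip = misalign_in_elems (a);
  assert (skip < VF);
  /* Iterations needed to cover SKIP inactive lanes plus N real elements.  */
  size_t niters = (n + skip + VF - 1) / VF;
  for (size_t iv = 0; iv < niters; iv++)
    for (size_t lane = 0; lane < VF; lane++)
      {
        /* Position relative to the aligned base (a - skip); the pointer
           a - skip itself is never formed.  */
        size_t pos = iv * VF + lane;
        int active = pos >= skip && pos - skip < n;
        if (active)
          a[pos - skip] += 1;
      }
}
```

With a start pointer 3 elements past an aligned address, the first emulated iteration leaves lanes 0 to 2 untouched, which matches the example in the text.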
> >
> > On SVE, the optimisation is only useful for vector-length-specific
> > code. Vector-length-agnostic code doesn't try to align vectors
> > since the vector length might not be a power of 2.
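The following sketch illustrates why the alignment trick depends on the vector length. It uses hypothetical helper names and is not GCC code. The misalignment is computed with a bit mask, which is only meaningful when the vector size in bytes is a power of 2. Take a vector length that is not a power of 2, such as a hypothetical 48-byte (384-bit) vector. No power-of-2 alignment boundary matches that vector size, so the sketch declines to compute a peel count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if X is a nonzero power of 2.  */
static bool
is_pow2 (size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Return the misalignment of P, in elements of ELEM_SIZE bytes, relative
   to a vector of VL_BYTES bytes.  Return (size_t) -1 if VL_BYTES is not
   a power of 2, since then no power-of-2 alignment boundary matches the
   vector size and this sketch does not try to peel.  */
static size_t
vector_misalign_in_elems (const void *p, size_t vl_bytes, size_t elem_size)
{
  assert (elem_size != 0);
  if (!is_pow2 (vl_bytes))
    return (size_t) -1;
  return ((uintptr_t) p & (vl_bytes - 1)) / elem_size;
}
```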
> >
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu. OK to install?
> >
> > Richard
> >
> >
> > 2017-11-17 Richard Sandiford <richard.sandiford@linaro.org>
> > Alan Hayward <alan.hayward@arm.com>
> > David Sherwood <david.sherwood@arm.com>
> >
> > gcc/
> > * tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
> > (LOOP_VINFO_MASK_SKIP_NITERS): New macro.
> > (vect_use_loop_mask_for_alignment_p): New function.
> > (vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
> > * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
> > niters_skip argument. Make sure that the first niters_skip elements
> > of the first iteration are inactive.
> > (vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
> > Update call to vect_set_loop_masks_directly.
> > (get_misalign_in_elems): New function, split out from...
> > (vect_gen_prolog_loop_niters): ...here.
> > (vect_update_init_of_dr): Take a code argument that specifies whether
> > the adjustment should be added or subtracted.
> > (vect_update_inits_of_drs): Likewise.
> > (vect_prepare_for_masked_peels): New function.
> > (vect_do_peeling): Skip prologue peeling if we're using a mask
> > instead. Update call to vect_update_inits_of_drs.
> > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> > mask_skip_niters.
> > (vect_analyze_loop_2): Allow fully-masked loops with peeling for
> > alignment. Do not include the number of peeled iterations in
> > the minimum threshold in that case.
> > (vectorizable_induction): Adjust the start value down by
> > LOOP_VINFO_MASK_SKIP_NITERS iterations.
> > (vect_transform_loop): Call vect_prepare_for_masked_peels.
> > Take the number of skipped iterations into account when calculating
> > the loop bounds.
> > * tree-vect-stmts.c (vect_gen_while_not): New function.
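Several of the ChangeLog entries describe the same arithmetic from different angles:

- The first iteration's mask has its first LOOP_VINFO_MASK_SKIP_NITERS lanes false, built with vect_gen_while_not.
- Inductions start that many steps earlier.
- The skipped lanes count toward the loop bound.

The C sketch below models that arithmetic only and is not the GCC implementation. VF, while_ult, while_not, induction_lane and masked_vector_iters are hypothetical names. while_ult assumes WHILE_ULT-style semantics, in which lane i is active iff start + i < end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 8  /* Hypothetical vector length in elements.  */

/* Model of a WHILE_ULT-style mask: lane I is active iff START + I < END.  */
static void
while_ult (bool mask[VF], size_t start, size_t end)
{
  for (size_t i = 0; i < VF; i++)
    mask[i] = start + i < end;
}

/* Model of the "while not" mask for the first iteration: the inverse
   of WHILE_ULT (0, SKIP), so the first SKIP lanes are false.  */
static void
while_not (bool mask[VF], size_t skip)
{
  while_ult (mask, 0, skip);
  for (size_t i = 0; i < VF; i++)
    mask[i] = !mask[i];
}

/* Value seen by lane LANE of vector iteration IV for a scalar induction
   that starts at INIT with step STEP.  The first SKIP lanes of iteration
   0 are masked off.  The start value is moved down by SKIP steps so that
   the first active lane sees INIT.  */
static long
induction_lane (long init, long step, size_t skip, size_t iv, size_t lane)
{
  long start = init - (long) skip * step;
  return start + (long) (iv * VF + lane) * step;
}

/* Number of vector iterations for NITERS scalar iterations when the
   first SKIP lanes are skipped: the skipped lanes count toward the
   bound.  */
static size_t
masked_vector_iters (size_t niters, size_t skip)
{
  assert (skip < VF);
  return (niters + skip + VF - 1) / VF;
}
```

For example, with VF = 8 and skip = 3, the first active lane (lane 3 of iteration 0) sees the induction's original start value. Twenty scalar iterations then need three vector iterations rather than the ceiling of 20 / 8 alone.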
> OK.
> jeff
The AArch64 tests are OK, but:
> > Index: gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c
> > ===================================================================
> > --- /dev/null 2017-11-14 14:28:07.424493901 +0000
> > +++ gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c 2017-11-17 15:11:51.121849349 +0000
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run { target aarch64_sve_hw } } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx" } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> > +
I'd copy the comment from sve_peel_ind_2.c explaining why we use
-mtune=thunderx here too.
Thanks,
James