This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
- From: Richard Biener <rguenther at suse dot de>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: Marc Glisse <marc dot glisse at inria dot fr>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 5 Jul 2016 12:13:42 +0200 (CEST)
- Subject: Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
- Authentication-results: sourceware.org; auth=none
- References: <5761570E.2020906@foss.arm.com> <alpine.DEB.2.20.1606152318260.6282@laptop-mg.saclay.inria.fr> <57626C14.3050903@foss.arm.com> <alpine.LSU.2.11.1606280952420.29772@t29.fhfr.qr> <5774D79F.3090101@foss.arm.com> <alpine.LSU.2.11.1607011348480.29772@t29.fhfr.qr> <577B7212.6080809@foss.arm.com>
On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>
> On 01/07/16 13:02, Richard Biener wrote:
> > On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
> >
> > > On 28/06/16 08:54, Richard Biener wrote:
> > > > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > > >
> > > > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > > >
> > > > > > > This is a respin of
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > > > > > feedback.
> > > > > > > I've changed the code to cast the operand to an unsigned type
> > > > > > > before
> > > > > > > applying the multiplication algorithm
> > > > > > > and cast it back to the signed type at the end.
> > > > > > > Whether to perform the cast is now determined by the function
> > > > > > > cast_mult_synth_to_unsigned in which I've implemented
> > > > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > > > if there are any other cases that need to be handled.
> > > > > > Ah, I never meant those cases as an exhaustive list, I was just
> > > > > > looking for examples showing that the transformation was unsafe,
> > > > > > and those two came to mind:
> > > > > >
> > > > > > - x*15 -> x*16-x: the second one can overflow even when the
> > > > > > first one doesn't.
> > > > > >
> > > > > > - x*-2 -> -(x*2): can overflow when the result is INT_MIN (maybe
> > > > > > that's redundant with the negate_variant check?)
> > > > > >
> > > > > > On the other hand, as long as we remain in the 'positive'
> > > > > > operations, turning x*3 into (x<<1)+x seems perfectly safe. And
> > > > > > even x*30 to (x*16-x)*2 cannot cause spurious overflows. But I
> > > > > > didn't look at the algorithm closely enough to characterize the
> > > > > > safe cases. Now if you have done it, that's good :-) Otherwise,
> > > > > > we might want to err on the side of caution.
> > > > > >
> > > > > I'll be honest, I didn't give it much thought beyond convincing
> > > > > myself that the two cases you listed are legitimate.
> > > > > Looking at expand_mult_const in expmed.c can be helpful (where it
> > > > > updates val_so_far for checking purposes) to see the different
> > > > > algorithm cases. I think the only steps that could cause overflow
> > > > > are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor, or when the
> > > > > final step is negate_variant, which are what you listed (and
> > > > > covered in this patch).
> > > > >
> > > > > richi is away on PTO for the time being though, so we have some
> > > > > time to convince ourselves :)
> > > > ;) I think the easiest way would be to always use unsigned arithmetic.
> > > >
> > > > While VRP doesn't do anything on vector operations we still have some
> > > > match.pd patterns that rely on correct overflow behavior and those
> > > > may be enabled for vector types as well.
> > > That's fine with me.
> > > Here's the patch that performs the casts to unsigned and back when
> > > the input type doesn't wrap on overflow.
> > >
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > >
> > > How's this?
> > +static bool
> > +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> > + tree scaltype)
> > +{
> > ...
> > + tree vectype = get_vectype_for_scalar_type (scaltype);
> > +
> > + if (!vectype)
> > + return false;
> > +
> > + /* All synthesis algorithms require shifts, so bail out early if
> > + target cannot vectorize them.
> > + TODO: If the target doesn't support vector shifts but supports
> > + vector addition we could synthesize shifts that way.
> > + vect_synth_mult_by_constant would need to be updated to do that. */
> > + if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> > + return false;
> >
> > I think these tests should be done in the caller before calling
> > synth_mult (and vectype be passed down accordingly). Also I wonder
> > if synth_mult for a * 2 produces a << 1 or a + a - the case of
> > a * 2 -> a + a was what Marc's patch handled. Guarding off LSHIFT_EXPR
> > support that early will make that fail on targets w/o vector shift.
>
> I believe it depends on the relative rtx costs (which of course are
> not relevant at the vector gimple level). Anyway, I've moved the check
> outside.
>
> I've also added code to synthesise the shifts by additions when vector shift
> is not available (the new function is synth_lshift_by_additions).
>
> > + bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> > + bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> > +
> > + if (var == negate_variant
> > + && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> > + return false;
> >
> > I think we don't have any targets that support PLUS but not MINUS
> > or targets that do not support NEGATE. OTOH double-checking doesn't
> > matter.
> >
> > +apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
> > + stmt_vec_info stmt_vinfo)
> > +{
> > + if (TREE_INT_CST_LOW (op2) == 0
> >
> > integer_zerop (op2)
>
> Ok
>
> > + && (code == LSHIFT_EXPR
> > + || code == PLUS_EXPR))
> >
> > + tree itype = TREE_TYPE (op2);
> >
> > I think it's dangerous to use the type of op2 given you synthesize
> > shifts as well. Why not use the type of op1?
>
> Yeah, we should be taking the type of op1. Fixed.
>
> > + /* TODO: Targets that don't support vector shifts could synthesize
> > + them using vector additions. target_supports_mult_synth_alg would
> > + need to be updated to allow this. */
> > + switch (alg.op[i])
> > + {
> >
> > so I suppose one could at least special-case << 1 and always use
> > PLUS for that.
>
> As said above, I added code to synthesize all constant shifts using
> additions if the target doesn't support vector shifts. Note that on
> targets that do support vector shifts we will still generate a shift.
> I think that's fair enough, but if you really want to special-case
> this to always generate an addition I suppose I can do that.
>
> > Otherwise looks ok to me. There is a PR with a testcase for the
> > x * 2 issue, so please add that if it is fixed, or amend the fix if
> > not ;)
>
> Thanks for the review. I've added the * 2 and * 4 cases.
> As for testing, I've bootstrapped and tested the patch on aarch64 and
> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
> always true, to exercise the paths that synthesize the shift by
> additions. Marc, could you test this on the sparc targets you care
> about?
Looks good.
Thanks,
Richard.
> Thanks,
> Kyrill
>
> 2016-07-05 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR target/65951
> * tree-vect-patterns.c: Include mult-synthesis.h.
> (target_supports_mult_synth_alg): New function.
> (synth_lshift_by_additions): Likewise.
> (apply_binop_and_append_stmt): Likewise.
> (vect_synth_mult_by_constant): Likewise.
> (target_has_vecop_for_code): Likewise.
> (vect_recog_mult_pattern): Use above functions to synthesize vector
> multiplication by integer constants.
>
> 2016-07-05 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR target/65951
> * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
> * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
> * gcc.dg/vect/pr65951.c: Likewise.
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)