This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114


Ok....well, I see a path forward, somewhere there....

however (bah), I can't push that subset of patches - I came back from a week's
holiday and misremembered - the AArch64 changes depend upon the introduction of
the _scal_optabs, not just the tree changes  :( .

I'll try to post optab migration patches next week for x86, rs6000 (mostly, I
haven't figured out paired.md yet), ARM, and IA64 (fwiw; has only v2sf
reductions). Having looked at MIPS/Loongson I'm feeling a bit bewildered and not
sure how to proceed, so I think I must ask the MIPS maintainers (CC'd) for
assistance: how can one add a vec_extract, to produce a scalar result, to the
end of each reduc_ optab ?

--Alan


Richard Biener wrote:
> > On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> >> Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and
>> >> 6,
>> >> which have been previously approved, in that sequence. (Of those, all bar
>> >> patch
>> >> 2 are AArch64 only.) I think this is better than maintaining an
>> >> ever-expanding
>> >> patch series.
> >
> > Agreed.
> >
>> >> Then I'll get to work on migrating all backends to the new _scal_ optab (and
>> >> removing the vector optab). Certainly I'd like to replace vec_shr/l with
>> >> vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching!
> >
> > I suppose we all are.  It will last until end of October at least
> > (stage1 of gcc 4.9
> > ended Nov 22th, certainly a bit late).
> >
> > I do expect we will continue merging already developed / posted stuff through
> > stage3 (as usual).
> >
> > That said, it would be really nice to get rid of VEC_RSHIFT_EXPR.
> >
> > Thanks,
> > Richard.
> >
>> >> --Alan
>> >>
>> >>
>> >>
>> >>
>> >> Richard Biener wrote:
>>> >>> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <alan.lawrence@arm.com>
>>> >>> wrote:
>>>> >>>> The end goal here is to remove this code from tree-vect-loop.c
>>>> >>>> (vect_create_epilog_for_reduction):
>>>> >>>>
>>>> >>>>       if (BYTES_BIG_ENDIAN)
>>>> >>>>         bitpos = size_binop (MULT_EXPR,
>>>> >>>>                              bitsize_int (TYPE_VECTOR_SUBPARTS (vectype)
>>>> >>>> -
>>>> >>>> 1),
>>>> >>>>                              TYPE_SIZE (scalar_type));
>>>> >>>>       else
>>>> >>>>
>>>> >>>> as this is the root cause of PR/61114 (see testcase there, failing on all
>>>> >>>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard
>>>> >>>> Biener,
>>>> >>>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
>>>> >>>> suspicious". The code snippet above is used on two paths:
>>>> >>>>
>>>> >>>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
>>>> >>>> reduc_[us](plus|min|max)_optab.
>>>> >>>> The optab is documented as "the scalar result is stored in the least
>>>> >>>> significant bits of operand 0", but the tree code as "the first element
>>>> >>>> in
>>>> >>>> the vector holding the result of the reduction of all elements of the
>>>> >>>> operand". This mismatch means that when the tree code is folded, the code
>>>> >>>> snippet above reads the result from the wrong end of the vector.
>>>> >>>>
>>>> >>>> The strategy (as per
>>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define
>>>> >>>> new
>>>> >>>> tree codes and optabs that produce scalar results directly; this seems
>>>> >>>> better than tying (the element of the vector into which the result is
>>>> >>>> placed) to (the endianness of the target), and avoids generating extra
>>>> >>>> moves
>>>> >>>> on current bigendian targets. However, the previous optabs are retained
>>>> >>>> for
>>>> >>>> now as a migration strategy so as not to break existing backends; moving
>>>> >>>> individual platforms over will follow.
>>>> >>>>
>>>> >>>> A complication here is on AArch64, where we directly generate
>>>> >>>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
>>>> >>>> remove this folding in order to decouple the midend and AArch64 backend.
>>> >>>
>>> >>> Sounds fine.  I hope we can transition all backends for 5.0 and remove
>>> >>> the vector variant optabs (maybe renaming the scalar ones).
>>> >>>
>>>> >>>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
>>>> >>>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the
>>>> >>>> optab
>>>> >>>> is defined in an endianness-dependent way, leading to significant
>>>> >>>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab
>>>> >>>> is
>>>> >>>> never used!). Few platforms appear to handle vec_shr_optab (and fewer
>>>> >>>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
>>>> >>>> the existing optab to be endianness-neutral.
>>>> >>>>
>>>> >>>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
>>>> >>>> updates that implementation to fit the new endianness-neutral
>>>> >>>> specification,
>>>> >>>> serving as a guide for other existing backends. Patches/RFCs 15 and 16
>>>> >>>> are
>>>> >>>> equivalents for MIPS and PowerPC; I haven't tested these but hope they
>>>> >>>> act
>>>> >>>> as useful pointers for the port maintainers.
>>>> >>>>
>>>> >>>> Finally patch 14 cleans up the affected part of tree-vect-loop.c
>>>> >>>> (vect_create_epilog_for_reduction).
>>> >>>
>>> >>> As said during the individual patches review I'd like the vectorizer to
>>> >>> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
>>> >>> only whole-element amounts).  This means we can remove
>>> >>> VEC_RSHIFT_EXPR.  It also means that if the backend defines
>>> >>> vec_perm_const (which it really should) it can handle the special
>>> >>> permutes that boil down to a possibly more efficient vector shift
>>> >>> there (a good optimization anyway).  Until it does that all backends
>>> >>> would at least create correct code (with the endian dependent
>>> >>> vec_shr removed).
>>> >>>
>>> >>> Richard.
>>> >>>
>>>> >>>> --Alan
>>>> >>>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]