This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, Marcus Shawcroft <marcus dot shawcroft at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 06 Oct 2014 18:31:07 +0100
- Subject: Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
- Authentication-results: sourceware.org; auth=none
- References: <541AC4D2 dot 9040901 at arm dot com> <CAFiYyc0wF_ep0sdTiqgn8KV5Jxzfg4BxmkXJPN7wKJ91Qotakw at mail dot gmail dot com>
Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and 6,
which have been previously approved, in that sequence. (Of those, all bar patch
2 are AArch64 only.) I think this is better than maintaining an ever-expanding
Then I'll get to work on migrating all backends to the new _scal_ optab (and
removing the vector optab). Certainly I'd like to replace vec_shr/l with
vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching!
Richard Biener wrote:
On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <firstname.lastname@example.org> wrote:
The end goal here is to remove this code from tree-vect-loop.c
bitpos = size_binop (MULT_EXPR,
bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) -
as this is the root cause of PR/61114 (see testcase there, failing on all
bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
"all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
suspicious". The code snippet above is used on two paths:
(Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
The optab is documented as "the scalar result is stored in the least
significant bits of operand 0", but the tree code as "the first element in
the vector holding the result of the reduction of all elements of the
operand". This mismatch means that when the tree code is folded, the code
snippet above reads the result from the wrong end of the vector.
The strategy (as per
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
tree codes and optabs that produce scalar results directly; this seems
better than tying (the element of the vector into which the result is
placed) to (the endianness of the target), and avoids generating extra moves
on current bigendian targets. However, the previous optabs are retained for
now as a migration strategy so as not to break existing backends; moving
individual platforms over will follow.
A complication here is on AArch64, where we directly generate
REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
remove this folding in order to decouple the midend and AArch64 backend.
Sounds fine. I hope we can transition all backends for 5.0 and remove
the vector variant optabs (maybe renaming the scalar ones).
(Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
is defined in an endianness-dependent way, leading to significant
complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is
never used!). Few platforms appear to handle vec_shr_optab (and fewer
bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
the existing optab to be endianness-neutral.
Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
updates that implementation to fit the new endianness-neutral specification,
serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
equivalents for MIPS and PowerPC; I haven't tested these but hope they act
as useful pointers for the port maintainers.
Finally patch 14 cleans up the affected part of tree-vect-loop.c
As said during the individual patches review I'd like the vectorizer to
use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
only whole-element amounts). This means we can remove
VEC_RSHIFT_EXPR. It also means that if the backend defines
vec_perm_const (which it really should) it can handle the special
permutes that boil down to a possibly more efficient vector shift
there (a good optimization anyway). Until it does that all backends
would at least create correct code (with the endian dependent