[PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
Mon Sep 22 11:21:00 GMT 2014
On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <email@example.com> wrote:
> The end goal here is to remove this code from tree-vect-loop.c
> if (BYTES_BIG_ENDIAN)
> bitpos = size_binop (MULT_EXPR,
> bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) -
> TYPE_SIZE (scalar_type));
> as this is the root cause of PR/61114 (see testcase there, failing on all
> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
> suspicious". The code snippet above is used on two paths:
> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
> The optab is documented as "the scalar result is stored in the least
> significant bits of operand 0", but the tree code as "the first element in
> the vector holding the result of the reduction of all elements of the
> operand". This mismatch means that when the tree code is folded, the code
> snippet above reads the result from the wrong end of the vector.
> The strategy (as per
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
> tree codes and optabs that produce scalar results directly; this seems
> better than tying (the element of the vector into which the result is
> placed) to (the endianness of the target), and avoids generating extra moves
> on current bigendian targets. However, the previous optabs are retained for
> now as a migration strategy so as not to break existing backends; moving
> individual platforms over will follow.
> A complication here is on AArch64, where we directly generate
> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
> remove this folding in order to decouple the midend and AArch64 backend.
Sounds fine. I hope we can transition all backends for 5.0 and remove
the vector variant optabs (maybe renaming the scalar ones).
> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
> is defined in an endianness-dependent way, leading to significant
> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is
> never used!). Few platforms appear to handle vec_shr_optab (and fewer
> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
> the existing optab to be endianness-neutral.
> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
> updates that implementation to fit the new endianness-neutral specification,
> serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
> equivalents for MIPS and PowerPC; I haven't tested these but hope they act
> as useful pointers for the port maintainers.
> Finally patch 14 cleans up the affected part of tree-vect-loop.c
As said during the individual patches review I'd like the vectorizer to
use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
only whole-element amounts). This means we can remove
VEC_RSHIFT_EXPR. It also means that if the backend defines
vec_perm_const (which it really should) it can handle the special
permutes that boil down to a possibly more efficient vector shift
there (a good optimization anyway). Until it does that all backends
would at least create correct code (with the endian dependent
More information about the Gcc-patches