[PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
Richard Biener
richard.guenther@gmail.com
Mon Sep 22 10:34:00 GMT 2014
On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>
> These are presently documented as producing a vector with the result in
> element 0, and this is inconsistent with their use in tree-vect-loop.c
> (which on bigendian targets pulls the bits out of the wrong end of the
> vector result). This leads to bugs on bigendian targets - see also
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.
>
> I discounted "fixing" the vectorizer (to read from element 0) and then
> making bigendian targets (whose architectural insn produces the result in
> lane N-1) permute the result vector: RTL optimization of vectors seems
> unlikely to remove such a permute, so it would cause a performance
> regression.
>
> Instead it seems more natural for the tree code to produce a scalar result
> (producing a vector with the result in lane 0 has already caused confusion,
> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>
> However, this patch preserves the meaning of the optab (producing a result
> in lane 0 on little-endian architectures or N-1 on bigendian), thus
> generally avoiding the need to change backends. Thus, expr.c extracts an
> endianness-dependent element from the optab result to give the result
> expected for the tree code.
>
> Previously posted as an RFC
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
> VIEW_CONVERT_EXPR if the types of the reduction/result do not match.
Huh. Does that ever happen? Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.
Ok with that change.
Thanks,
Richard.
> Testing:
> x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
> aarch64-none-linux-gnu: bootstrap
> aarch64-none-elf: check-gcc, check-g++
> arm-none-eabi: check-gcc
>
> aarch64_be-none-elf: check-gcc, showing
> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
> Passes the (previously-failing) reduced testcase on
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>
> Have also tested that testcase on PowerPC (stage-1 compiler, inspecting
> the assembler output); it is fixed there too.
> gcc/ChangeLog:
>
> * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
> extract_bit_field around optab result.
>
> * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce
> scalar not vector.
>
> * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type
> for REDUC_{MIN,MAX,PLUS}_EXPR.
>
> * tree-vect-loop.c (vect_analyze_loop): Update comment.
> (vect_create_epilog_for_reduction): For direct vector reduction, use
> result of tree code directly without extract_bit_field.
>
> * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
> comment.