This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[autovect] [patch] vectorize reduction without specialized target support





The epilog code of a vectorized reduction computation reduces a vector of
partial results into a single scalar result. Currently we do that using a
"REDUC_op" stmt followed by a BIT_FIELD expression to extract the scalar
result from the vector register:
      va_4 = reduc_op <va_3>
      a_5 = bit_field_ref_expr <va_4, bitpos>
This scheme is only applicable for targets that support these REDUC_op
operations.

This patch adds support for two additional ways to generate the reduction
epilog code, neither of which requires a specialized REDUC_op:

(alternative 1) sequentially compute the reduction on the scalar elements
(the partial results) of the vector.
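
To illustrate alternative 1, here is a minimal C sketch of the scalar
fallback, not the code the vectorizer emits. It assumes a vector of four
32-bit ints whose elements have already been extracted into an array. The
names reduce_sequential, scalar_op, op_add and op_max are made up for this
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the scalar form of the reduction operation. */
typedef int32_t (*scalar_op)(int32_t, int32_t);

static int32_t op_add(int32_t a, int32_t b) { return a + b; }
static int32_t op_max(int32_t a, int32_t b) { return a > b ? a : b; }

/* Fold the partial results one element at a time.  Each partial[i]
   plays the role of one bit_field_ref extraction from the vector.  */
static int32_t reduce_sequential(const int32_t *partial, size_t nelems,
                                 scalar_op op)
{
    int32_t acc = partial[0];
    for (size_t i = 1; i < nelems; i++)
        acc = op(acc, partial[i]);
    return acc;
}
```

For a vector of N elements this costs N extractions and N-1 scalar
operations, which is why it is the last resort in the overall scheme.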

(alternative 2) For targets that support whole vector shifts, we generate
the following:
   for (offset = VS/2; offset >= element_size; offset/=2) {
       va' = vec_shift_left <va, offset>
       va = vop <va, va'>
   }
   a = bit_field_ref_expr <va, bitpos>
For example, when VS=16 bytes and the data type operated upon is ints, this
will be generated:
      va_4 = vec_shift_left <va_3, 8>
      va_5 = vop <va_3, va_4>
      va_6 = vec_shift_left <va_5, 4>
      va_7 = vop <va_5, va_6>
      a_5 = bit_field_ref_expr <va_7, bitpos>
This scheme requires adding new tree-codes for the vector shifts. I
introduced the simplest vector shift possible: the shift amount is in
bytes, and is a constant (immediate value). This has the potential to be
easily expanded to the most efficient code, and to be applicable to as many
targets as possible.
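
The shift-based loop can be simulated in plain C to see why log2(N) steps
are enough. This is a sketch under assumptions, not the generated code:
VS is fixed at 16 bytes, the element type is a 32-bit int, vop is addition,
the shift moves data toward byte 0 and fills with zeros, and the result is
read from element 0. On a real target the shift direction and bitpos
depend on the target. The names vec128, vec_shift_bytes, vec_add_i32 and
reduce_add_by_shifts are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define VS 16  /* vector size in bytes (assumed) */

typedef struct { uint8_t bytes[VS]; } vec128;

/* Whole-vector shift by a constant number of bytes.  Byte i of the
   result is byte i+offset of the input; vacated bytes are zero.  */
static vec128 vec_shift_bytes(vec128 v, unsigned offset)
{
    vec128 r;
    memset(r.bytes, 0, VS);
    memcpy(r.bytes, v.bytes + offset, VS - offset);
    return r;
}

/* Element-wise add of four 32-bit ints (the "vop" of the mail).
   The add is done in unsigned arithmetic so it wraps on overflow.  */
static vec128 vec_add_i32(vec128 a, vec128 b)
{
    vec128 r;
    for (unsigned i = 0; i < VS; i += sizeof(int32_t)) {
        uint32_t x, y, s;
        memcpy(&x, a.bytes + i, sizeof x);
        memcpy(&y, b.bytes + i, sizeof y);
        s = x + y;
        memcpy(r.bytes + i, &s, sizeof s);
    }
    return r;
}

/* for (offset = VS/2; offset >= element_size; offset /= 2) ...  */
static int32_t reduce_add_by_shifts(vec128 va)
{
    for (unsigned offset = VS / 2; offset >= sizeof(int32_t); offset /= 2) {
        vec128 shifted = vec_shift_bytes(va, offset);
        va = vec_add_i32(va, shifted);
    }
    int32_t result;
    memcpy(&result, va.bytes, sizeof result);  /* bit_field_ref, element 0 */
    return result;
}
```

With elements {e0, e1, e2, e3}, the shift by 8 leaves e0+e2 and e1+e3 in
the first two elements, and the shift by 4 then leaves e0+e1+e2+e3 in
element 0. The other elements hold garbage and are ignored.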

The overall scheme is:
if (HAVE_REDUC_op)
  generate current scheme using REDUC_op
else if (HAVE_whole-vector-shift)
  generate alternative 2
else
  generate alternative 1
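
As a sketch of the selection logic only: the C fragment below models the
three-way choice with two boolean capability flags. The struct, enum and
function names are hypothetical. In the patch itself the checks would be
made against the optabs listed in the ChangeLog for the vector mode in
question, which is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical summary of what the target supports for one vector mode. */
struct target_caps {
    bool have_reduc_op;            /* a REDUC_op pattern exists */
    bool have_whole_vector_shift;  /* a whole-vector shift pattern exists */
};

enum epilog_kind {
    EPILOG_REDUC_OP,        /* current scheme */
    EPILOG_VECTOR_SHIFT,    /* alternative 2 */
    EPILOG_SCALAR_SEQUENCE  /* alternative 1 */
};

/* Prefer the most specialized support that is available. */
static enum epilog_kind choose_epilog(const struct target_caps *caps)
{
    if (caps->have_reduc_op)
        return EPILOG_REDUC_OP;
    if (caps->have_whole_vector_shift)
        return EPILOG_VECTOR_SHIFT;
    return EPILOG_SCALAR_SEQUENCE;
}
```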

This allows vectorizing reductions without special target support. The
testcases are changed accordingly (enabled for all relevant targets). I also
removed the now unnecessary reduc_op patterns from altivec.md (these were
implemented using vector shifts).

Bootstrapped and tested on powerpc-darwin and i686-pc-linux-gnu, committed
to autovect branch.

dorit

Changelog:

        * genopinit.c (vec_shli_optab, vec_shri_optab): Initialize new
        optabs.
        (reduc_plus_optab): Removed.  Replaced with...
        (reduc_splus_optab, reduc_uplus_optab): Initialize new optabs.
        * optabs.c (optab_for_tree_code): Return reduc_splus_optab or
        reduc_uplus_optab instead of reduc_plus_optab.
        (expand_vec_shift_expr): New function.
        (init_optabs): Initialize new optabs. Remove initialization of
        reduc_plus_optab.
        * optabs.h (OTI_reduc_plus): Removed. Replaced with...
        (OTI_reduc_splus, OTI_reduc_uplus): New.
        (reduc_plus_optab): Removed.  Replaced with...
        (reduc_splus_optab, reduc_uplus_optab): New optabs.
        (vec_shli_optab, vec_shri_optab): New optabs.
        (expand_vec_shift_expr): New function declaration.

        * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): New tree-codes.
        * tree-inline.c (estimate_num_insns_1): Handle new tree-codes.
        * expr.c (expand_expr_real_1): Handle new tree-codes.
        * tree-pretty-print.c (dump_generic_node, op_symbol, op_prio):
        Likewise.

        * tree-vect-transform.c (vect_create_epilog_for_reduction): Add two
        alternatives for generating reduction epilog code.
        (vectorizable_reduction): Don't fail if direct reduction support is
        not available.
        (vectorizable_target_reduction_pattern): Likewise.

        * config/rs6000/altivec.md (reduc_smax_v4si, reduc_smax_v4sf,
        reduc_umax_v4si, reduc_smin_v4si, reduc_smin_v4sf, reduc_umin_v4si,
        reduc_plus_v4si, reduc_plus_v4sf): Removed.
        (vec_shli_<mode>, vec_shri_<mode>, altivec_vsumsws_nomode,
        reduc_splus_<mode>, reduc_uplus_v16qi): New.

testsuite/Changelog:

        * gcc.dg/vect/no_version/vect-reduc-1.c: Vectorizable on all
        relevant platforms - remove "target powerpc*-*-*" restriction.
        * gcc.dg/vect/no_version/vect-reduc-2.c: Likewise.
        * gcc.dg/vect/no_version/vect-reduc-3.c: Likewise.
        * gcc.dg/vect/no_version/vect-reduc-2short.c: New test.
        * gcc.dg/vect/no_version/vect-reduc-2char.c: New test.
        * gcc.dg/vect/no_version/vect-reduc-3short.c: New test.
        * gcc.dg/vect/no_version/vect-reduc-3char.c: New test.

patch:

(See attached file: autovect.june15)

Attachment: autovect.june15
Description: Binary data

