This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch] (4.1 stage 2 projects): vectorize reduction, part 3/n





Hi,

This patch allows vectorizing reductions without specialized target support
by adding two further ways to generate the reduction epilog code, neither
of which requires a specialized REDUC_op:

(alternative 1) sequentially compute the reduction on the scalar elements
(the partial results) of the vector.

(alternative 2) For targets that support whole vector shifts, we generate
the following:
   for (offset = VS/2; offset >= element_size; offset/=2) {
       va' = vec_shift <va, offset>
       va = vop <va, va'>
   }
   va = bit_field_ref_expr <va, bitpos>
For example, when VS=128 bits and the data type operated upon is int,
the following will be generated for little-endian targets (for big-endian
we generate the same with vec_shift_right instead of vec_shift_left):
      va_4 = vec_shift_left <va_3, 64>
      va_5 = vop <va_3, va_4>
      va_6 = vec_shift_left <va_5, 32>
      va_7 = vop <va_5, va_6>
      a_5 = bit_field_ref_expr <va_7, bitpos>
This scheme requires adding new tree-codes for the vector-shifts. I
introduced vector-shifts that take the shift amount as a constant
(immediate value), and in bits.
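
The two epilog alternatives described above can be modelled in plain C. This
is only an illustrative sketch, not code from the patch: the vec4i struct and
the helpers whole_vector_shift, vop_plus, reduce_sequential and
reduce_by_shifts are hypothetical names, the vop is assumed to be addition,
int is assumed to be 32 bits wide, and the result is assumed to end up in
element 0. On a real target, the shift direction and the bitpos of the result
depend on endianness, which this model ignores.

```c
#include <assert.h>
#include <string.h>

#define NELEMS 4   /* models a 128-bit vector of 32-bit ints (assumption) */

/* Hypothetical stand-in for a vector register.  */
typedef struct { int e[NELEMS]; } vec4i;

/* Model of a whole-vector shift by a constant number of bits: every
   element moves toward element 0, vacated elements become zero.  */
static vec4i
whole_vector_shift (vec4i v, unsigned bits)
{
  unsigned n = bits / (8 * sizeof (int));   /* shift amount in elements */
  vec4i r;
  memset (&r, 0, sizeof r);
  for (unsigned i = 0; i + n < NELEMS; i++)
    r.e[i] = v.e[i + n];
  return r;
}

/* Element-wise vector operation; addition is assumed here.  */
static vec4i
vop_plus (vec4i a, vec4i b)
{
  vec4i r;
  for (unsigned i = 0; i < NELEMS; i++)
    r.e[i] = a.e[i] + b.e[i];
  return r;
}

/* Alternative 1: reduce the partial results one scalar at a time.  */
static int
reduce_sequential (vec4i v)
{
  int acc = v.e[0];
  for (unsigned i = 1; i < NELEMS; i++)
    acc += v.e[i];
  return acc;
}

/* Alternative 2: halve the shift offset each step, so log2(NELEMS)
   shift/vop pairs are needed instead of NELEMS - 1 scalar operations.  */
static int
reduce_by_shifts (vec4i va)
{
  unsigned vs = 8 * sizeof (vec4i);          /* vector size in bits */
  unsigned element_size = 8 * sizeof (int);  /* element size in bits */

  assert ((NELEMS & (NELEMS - 1)) == 0);     /* power-of-two element count */
  for (unsigned offset = vs / 2; offset >= element_size; offset /= 2)
    va = vop_plus (va, whole_vector_shift (va, offset));
  return va.e[0];                            /* models bit_field_ref_expr */
}
```

For the partial results {1, 2, 3, 4}, both functions return 10; the shift
version gets there with two shift/add pairs (offsets 64 and 32), matching the
unrolled sequence shown in the example above.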

The overall scheme is:
if (HAVE_REDUC_op)
  generate current scheme using REDUC_op
else if (HAVE_whole-vector-shift)
  generate alternative 2
else
  generate alternative 1
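
As a rough C sketch of this selection order (the enum and function names are
hypothetical, and the two boolean parameters stand in for whatever target
queries the patch really performs, which are not shown in this message):

```c
#include <stdbool.h>

/* Hypothetical labels for the three epilog strategies.  */
enum epilog_kind
{
  EPILOG_REDUC_OP,         /* current scheme, using a target REDUC_op */
  EPILOG_VECTOR_SHIFTS,    /* alternative 2 */
  EPILOG_SCALAR_ELEMENTS   /* alternative 1 */
};

/* Prefer direct reduction support, then whole-vector shifts, and fall
   back to the element-by-element reduction when neither is available.  */
static enum epilog_kind
choose_reduction_epilog (bool have_reduc_op, bool have_whole_vector_shift)
{
  if (have_reduc_op)
    return EPILOG_REDUC_OP;
  if (have_whole_vector_shift)
    return EPILOG_VECTOR_SHIFTS;
  return EPILOG_SCALAR_ELEMENTS;
}
```

The ordering means alternative 1 is only ever a fallback, so every target
gets some reduction epilog even with no special support.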

The testcases are changed appropriately (enabled for all relevant targets).
Also removed the now unnecessary reduc_op patterns from altivec.md (these
were implemented using vector shifts), and the vect_reduction keyword from
target-supports.exp. Added floating point tests and tests for more data
types. Some of these testcases require additional flags (e.g., -ftrapv,
-ffast-math) - I handled that in vect.exp.

One thing in the patch that would need to be handled differently is this
bit:

Index: optabs.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.281
diff -u -3 -p -r1.281 optabs.c
--- optabs.c    18 Jun 2005 13:18:37 -0000      1.281
+++ optabs.c    19 Jun 2005 16:26:29 -0000
@@ -301,7 +301,15 @@ optab_for_tree_code (enum tree_code code
       return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab;

     case REDUC_PLUS_EXPR:
-      return reduc_plus_optab;
+      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+
+    case VEC_LSHIFT_EXPR:
+      /* FIXME: this optab is appropriate only if second argument is constant.  */
+      return vec_shli_optab;
+
+    case VEC_RSHIFT_EXPR:
+      /* FIXME: this optab is appropriate only if second argument is constant.  */
+      return vec_shri_optab;

     default:
       break;

I added that to work around the vec_lower pass lowering the vector shifts;
please see http://gcc.gnu.org/ml/gcc/2005-06/msg00834.html for more detail.

Bootstrapped and tested on powerpc-darwin and i686-pc-linux-gnu.

The max/min reductions do not get vectorized on i686 for the following
cases:
In vect-reduc-1.c/vect-reduc-1short.c: the unsigned int and unsigned short
max/min loops.
In vect-reduc-2.c/vect-reduc-2short.c: the signed int and signed char
max/min loops.
(indeed I see only a gen_umaxv16qi3, and gen_smaxv8hi3).
Shall I introduce a target keyword for each of these, or is it
possible/easier to implement the missing patterns? (I also don't know what
happens on other platforms.)

ok for mainline (after figuring out what to do with the tests and the
optab_for_tree_code issue)?

This is probably the last reduction patch that will make it for stage 2 of
4.1 - I will be away for a month as of Tuesday.

thanks,

dorit

Changelog:

        * genopinit.c (vec_shli_optab, vec_shri_optab): Initialize new
        optabs.
        (reduc_plus_optab): Removed.  Replaced with...
        (reduc_splus_optab, reduc_uplus_optab): Initialize new optabs.
        * optabs.c (optab_for_tree_code): Return reduc_splus_optab or
        reduc_uplus_optab instead of reduc_plus_optab.
        (expand_vec_shift_expr): New function.
        (init_optabs): Initialize new optabs. Remove initialization of
        reduc_plus_optab.
        (optab_for_tree_code): Temporarily return
        vec_shli_optab/vec_shri_optab for VEC_LSHIFT_EXPR/VEC_RSHIFT_EXPR.
        * optabs.h (OTI_reduc_plus): Removed. Replaced with...
        (OTI_reduc_splus, OTI_reduc_uplus): New.
        (reduc_plus_optab): Removed.  Replaced with...
        (reduc_splus_optab, reduc_uplus_optab): New optabs.
        (vec_shli_optab, vec_shri_optab): New optabs.
        (expand_vec_shift_expr): New function declaration.

        * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): New tree-codes.
        * tree-inline.c (estimate_num_insns_1): Handle new tree-codes.
        * expr.c (expand_expr_real_1): Handle new tree-codes.
        * tree-pretty-print.c (dump_generic_node, op_symbol, op_prio):
        Likewise.

        * tree-vect-transform.c (vect_create_epilog_for_reduction): Add two
        alternatives for generating reduction epilog code.
        (vectorizable_reduction): Don't fail if direct reduction support is
        not available.
        (vectorizable_target_reduction_pattern): Likewise.

        * config/rs6000/altivec.md (reduc_smax_v4si, reduc_smax_v4sf,
        reduc_umax_v4si, reduc_smin_v4si, reduc_smin_v4sf, reduc_umin_v4si,
        reduc_plus_v4si, reduc_plus_v4sf): Removed.
        (vec_shli_<mode>, vec_shri_<mode>, altivec_vsumsws_nomode,
        reduc_splus_<mode>, reduc_uplus_v16qi): New.

testsuite/Changelog:

        * lib/target-supports.exp (check_effective_target_vect_reduction):
        Remove.
        * gcc.dg/vect/vect.exp: Run tests with additional flags separately.
        * gcc.dg/vect/vect-reduc-1.c: Vectorizable on all
        relevant platforms - remove vect_reduction target keyword.
        * gcc.dg/vect/vect-reduc-2.c: Likewise.
        * gcc.dg/vect/vect-reduc-3.c: Likewise.
        * gcc.dg/vect/vect-reduc-2short.c: New test.
        * gcc.dg/vect/vect-reduc-2char.c: New test.
        * gcc.dg/vect/vect-reduc-3short.c: New test.
        * gcc.dg/vect/vect-reduc-3char.c: New test.
        * gcc.dg/vect/vect-reduc-6.c: New test.
        * gcc.dg/vect/trapv-vect-reduc-4.c: New test.
        * gcc.dg/vect/fast-math-vect-reduc-5.c: New test.

Patch and new tests:


(See attached file: reduc3.june19) (See attached file: reduc_tests.tar.gz)

Attachment: reduc3.june19
Description: Binary data

Attachment: reduc_tests.tar.gz
Description: Binary data

