
RFC and [autovect patch]: supporting reduction patterns





I committed support for additional reduction forms to the autovect branch.
There are a couple of alternatives for how to represent these operations -
I'd like to consult on which seems to be the more suitable way. Comments
appreciated.

The general reduction form we support now is:
      a = op < y , a >
such that: (1) the reduction variable 'a' is the last (second) argument,
and (2) the type of the reduction variable matches the type of the rest of
the arguments (y).
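
For example (a made-up scalar loop, just to illustrate the form - it is not
taken from the patch), a plain summation whose accumulator has the same type
as the elements fits this scheme, with 'sum' as the reduction variable in
the last-argument position:

      int
      sum_array (int *x, int n)
      {
        int i, sum = 0;
        for (i = 0; i < n; i++)
          sum = x[i] + sum;   /* sum = PLUS_EXPR < x[i], sum >  */
        return sum;
      }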

There are important reduction idioms that don't fit into this general form
- summation of elements into an accumulator of a wider type, and similarly
summation of products (dot product) or of absolute differences into an
accumulator of a wider type. These idioms are very common (especially in
multimedia workloads), and also often have specialized vector support, at
least for some data-types.
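
To make the idioms concrete, here are made-up scalar versions (the array
names, N, and the exact accumulator type are just illustrative assumptions)
of widening summation, dot product, and sum of absolute differences, each
accumulating narrow elements into a wider accumulator:

      #define N 256
      unsigned char x[N], y[N];

      int
      widening_sum (void)
      {
        int i, sum = 0;
        for (i = 0; i < N; i++)
          sum += x[i];                /* QI elements, SI accumulator  */
        return sum;
      }

      int
      dot_product (void)
      {
        int i, sum = 0;
        for (i = 0; i < N; i++)
          sum += x[i] * y[i];         /* products of QI, summed into SI  */
        return sum;
      }

      int
      sum_abs_diff (void)
      {
        int i, sum = 0;
        for (i = 0; i < N; i++)
          sum += x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
        return sum;
      }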

(Sometimes these computations can be vectorized even if the pattern is not
detected (i.e. stmt by stmt), but much less efficiently, due to the type
conversions that require packing/unpacking of data between vectors. Even
if/when these type-conversions are supported (by the target and
vectorizer), the vectorizer might decide not to transform the loop due to
the overheads related to packing/unpacking).

So we want to look for these patterns from within the vectorizer (the
vectorizer is probably the only optimization that can benefit from
detecting these patterns?), and the question is whether to introduce these
patterns as generic operations or to do this pattern recognition on a
target-specific basis:

==> Option 1: Generic:
* pattern detection: the code that detects these idioms will be within the
generic pattern detection code in the vectorizer.
* representation: when a pattern is detected, a new operation that can
replace the pattern is inserted; we will introduce new tree-codes for this
purpose, e.g. WIDEN_SUM(op0,op1), MULSUM(op0,op1,op2), etc., along with new
optabs, e.g. widen_sum_optab, mulsum_optab.
* vectorization: these operations will be classified by the vectorizer as
"reduction_pattern" operations, all of which are expected to have the
following form:
      a = reduc-pattern-op <w,x,y,...,a>
such that: (1) the reduction variable 'a' is the last argument, and (2) the
type of the reduction variable can be wider than the type of the rest of
the arguments (w,x,y,...).
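
For example (just a sketch of the proposed representation, using the
WIDEN_SUM tree-code named above; the SSA names are made up), a summation of
QI elements into an HI accumulator would be detected from the two scalar
stmts and represented by a single pattern stmt:

      /* Original scalar stmts (QI data, HI accumulator):  */
      x_hi = (short int) x_qi
      sum_1 = x_hi + sum_0

      /* Replacement stmt, in the reduc-pattern form above:  */
      sum_1 = WIDEN_SUM <x_qi, sum_0>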

==> Option 2: Target specific:
* pattern detection: the code that detects these idioms will be inside each
port that supports these operations, activated from the vectorizer by a
target hook.
* representation: when a pattern is detected, a new operation that can
replace the pattern is inserted; a target will introduce new
target-builtins for this purpose, e.g.
target.builtin.widening_summation(op0,op1), and create a call to that
builtin.
* vectorization: these operations will be classified by the vectorizer as
"target_reduction_pattern" operations, all of which are expected to have
the following form:
      a = call <target_builtin <w,x,y,...,a> >
such that: (1) the reduction variable 'a' is the last argument, and (2) the
type of the reduction variable can be wider than the type of the rest of
the arguments (w,x,y,...).
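
For example (again just a sketch; the builtin name and the vector types are
the ones used in the attached rs6000 patch, the SSA names are made up), the
vectorized widening summation on Altivec would end up as a call of type
v4si_ftype_v8hi_v4si, with the reduction variable last:

      vsum_1 = __builtin_altivec_widening_summation (vx_v8hi, vsum_0)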


I think option 1 (generic) is suitable here because the idioms in question
are general and pretty common, and this way we can avoid code duplication
between different targets. (Also, at present calls to target-builtins are
not optimized very well, so it might be better to avoid them.) On the
other hand, these operations are not widely supported for all datatypes
(usually just for a small subset). I implemented option 2 (target-specific)
for now, but switching to option 1 would be very simple.

If we go with option 1 (generic), there are still a couple of alternatives
for how to define the semantics of the new optabs:

- option 1.1: the type-size of the reduction variable (which is also the
result of the computation) is exactly double the type-size of the reduction
arguments. I.e., we can express summation of QI into HI, but we can't
express summation of QI into SI. We can solve this either by introducing an
additional tree-code & optab for "wider_widen_sum" (for which the type-size
of the reduction variable is 4 times the type-size of the other arguments),
or by leaving the wider reduction forms to target specific builtins.
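
To illustrate option 1.1 with the WIDEN_SUM tree-code (sketch only, made-up
names):

      /* QI into HI - expressible, since HI is exactly 2*QI:  */
      sum_hi_1 = WIDEN_SUM <x_qi, sum_hi_0>

      /* QI into SI - not expressible, since SI is 4*QI; would need either
         a "wider_widen_sum" tree-code/optab or a target specific builtin:  */
      sum_si_1 = WIDEN_SUM <x_qi, sum_si_0>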

- option 1.2: the type of the reduction variable is always X (some default
predefined by each target), e.g. always sum into 32-bit accumulators (if
the target defines X to be 32). This may not be suitable for targets that
have multiple accumulation sizes; however, one could often support the
smaller-sized accumulations by truncating the final result produced by
wider-sized accumulations, so this could potentially suffice to cover all
reduction forms a target supports. If not, then we could resort to target
specific builtins for the cases we can't express with these optabs.
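
For example (sketch only), if the target defines X to be 32 bits, a
QI-into-HI summation could still be covered by accumulating into an SI
temporary and truncating after the loop, which gives the same wrap-around
result as a genuine HI accumulation:

      /* Inside the loop - always accumulate into 32 bits:  */
      sum_si_1 = WIDEN_SUM <x_qi, sum_si_0>

      /* After the loop - truncate to the 16-bit type the source asked for:  */
      sum_hi = (short int) sum_si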

Here's the patch committed to the autovect branch (implementing the
target-specific approach), along with a new testcase.

thanks,

dorit

ChangeLog:

        * defaults.h (TARGET_VECT_NUM_PATTERNS): New.
        * target-def.h (TARGET_VECTORIZE_BUILTIN_VECT_PATTERN_RECOG): New.
        * target.h (builtin_vect_pattern_recog): New.
        * tree-vect-analyze.c (target.h): Include.
        (vect_pattern_recog_1): Removed static qualifier. Declaration moved
        to tree-vectorizer.h.
        (vect_recog_unsigned_subsat_pattern): Takes additional argument.
        Removed dump print. Handle target-reduction-pattern case.
        (vect_pattern_recog): Call
        targetm.vectorize.builtin_vect_pattern_recog.
        (vect_analyze_loop): Move call to vect_pattern_recog to after the
        call to vect_analyze_scalar_cycles.
        (vect_determine_vectorization_factor): Change skip test from
        checking for reduction, to checking for STMT_VINFO_LIVE_P. Handle
        the case of vectype being already initialized.
        (vect_analyze_operations): Call
        vectorizable_target_reduction_pattern.
        (vect_mark_relevant): When dealing with the last stmt in a sequence
        that was recognized as a certain idiom - use the pattern-stmt that
        replaces the sequence.
        (vect_mark_stmts_to_be_vectorized): Remove special handling of
        STMT_VINFO_IN_PATTERN_P.
        * tree-vect-transform.c (vect_create_epilog_for_reduction): New
        function. Contains functionality that was factored out of
        vectorizable_reduction.
        (vectorizable_target_reduction_pattern): New function.
        (vectorizable_reduction): Epilog creation code was factored out
        into a new function, vect_create_epilog_for_reduction.
        (vect_transform_stmt): Call vectorizable_target_reduction_pattern.
        * tree-vectorizer.h (target_reduc_pattern_vec_info_type): New enum
        value.
        (vect_recog_unsigned_subsat_pattern): Takes additional argument.
        (_recog_func_ptr): Takes additional argument.
        (vectorizable_target_reduction_pattern): New function declaration.
        (vect_pattern_recog_func): Rename to vect_pattern_recog_funcs.

        * tree-inline.c (estimate_num_insns_1): Add missing cases -
        REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR.
        * tree-ssa-operands.c (get_expr_operands): Remove redundant cases -
        REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR.
        * tree-vect-transform.c (get_initial_def_for_reduction): Remove
        case that is not supported yet.
        * tree.def (REDUC_BIT_AND_EXPR, REDUC_BIT_IOR_EXPR,
        REDUC_BIT_XOR_EXPR, REDUC_MULT_EXPR): Remove tree-codes that are
        not yet supported.
        * tree-vectorizer.c (reduction_code_for_scalar_code): Remove cases
        that are not yet supported.
        (vect_is_simple_reduction): Fix comment.
        * tree-vect-analyze.c (vect_analyze_operations): Minor changes to
        print messages.
        (vect_mark_stmts_to_be_vectorized): Likewise.
        (vect_determine_vectorization_factor): Likewise.

        * config/rs6000/rs6000.c (tree-flow.h, tree-data-ref.h)
        (tree-vectorizer.h): Include.
        (altivec_builtin_widening_summation): New global variable to hold
        the decl of the builtin widening_summation.
        (target_vect_recog_widening_summation_pattern): New pattern
        recognition function.
        (target_vect_pattern_recog_funcs): New global array to hold
        pointers to pattern-recognition functions.
        (rs6000_builtin_vect_pattern_recog): New function. Implements the
        target builtin vect_pattern_recog.
        (rs6000_expand_builtin): Support ALTIVEC_BUILTIN_WIDENING_SUMMATION
        case.
        (altivec_init_builtins): Add new function type
        v4si_ftype_v8hi_v4si. Initialize a new target builtin
        __builtin_altivec_widening_summation.
        Set altivec_builtin_widening_summation.
        * config/rs6000/rs6000.c (TARGET_VECT_NUM_PATTERNS): New define.
        (ALTIVEC_BUILTIN_WIDENING_SUMMATION): New builtin.
        * config/rs6000/t-rs6000 (tree-flow.h, tree-data-ref.h)
        (tree-vectorizer.h): Add dependency.


(See attached file: diff.april3)


