This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
I committed support for additional reduction forms to autovect. There are a couple of alternatives for how to represent these operations - I'd like to consult on what seems to be the more suitable way. Comments appreciated.

The general reduction form we support now is:

    a = op <y, a>

such that: (1) the reduction variable 'a' is the last (second) argument, and (2) the type of the reduction variable matches the type of the rest of the arguments (y).

There are important reduction idioms that don't fit into this general form - summation of elements into an accumulator of a wider type, and similarly summation of products (dot product) or of absolute differences into an accumulator of a wider type. These idioms are very common (especially in multimedia workloads), and also often have specialized vector support, at least for some data-types.

(Sometimes these computations can be vectorized even if the pattern is not detected (i.e. stmt by stmt), but much less efficiently, due to the type conversions that require packing/unpacking of data between vectors. Even if/when these type-conversions are supported (by the target and vectorizer), the vectorizer might decide not to transform the loop due to the overheads related to packing/unpacking).

So we want to look for these patterns from within the vectorizer (the vectorizer is probably the only optimization that can benefit from detecting these patterns?), and the question is whether to introduce these patterns as generic operations or to do this pattern recognition on a target-specific basis:

==> Option 1: Generic:

* pattern detection: the code that detects these idioms will be within the generic pattern detection code in the vectorizer.
* representation: when a pattern is detected, a new operation that can replace the pattern is inserted; we will introduce new tree-codes for this purpose, e.g. WIDEN_SUM(op0,op1), MULSUM(op0,op1,op2), etc, along with new optabs, e.g. widen_sum_optab, mulsum_optab.
* vectorization: these operations will be classified by the vectorizer as "reduction_pattern" operations, all of which are expected to have the following form:

    a = reduc-pattern-op <w,x,y,...,a>

  such that: (1) the reduction variable 'a' is the last argument, and (2) the type of the reduction variable can be wider than the type of the rest of the arguments (w,x,y,...).

==> Option 2: Target specific:

* pattern detection: the code that detects these idioms will be inside each port that supports these operations, activated from the vectorizer by a target hook.
* representation: when a pattern is detected, a new operation that can replace the pattern is inserted; a target will introduce new target-builtins for this purpose, e.g. target.builtin.widenning_summation(op0,op1), and create a call to that builtin.
* vectorization: these operations will be classified by the vectorizer as "target_reduction_pattern" operations, all of which are expected to have the following form:

    a = call <target_builtin <w,x,y,...,a> >

  such that: (1) the reduction variable 'a' is the last argument, and (2) the type of the reduction variable can be wider than the type of the rest of the arguments (w,x,y,...).

I think option 1 (generic) is suitable here because the idioms in question are general and pretty common, and this way we can avoid code duplication between different targets. (Also, at present calls to target-builtins are not optimized very well, so it might be better to try to avoid them). On the other hand, these operations are not widely supported for all datatypes (usually just for a small subset). I implemented option 2 (target-specific) for now, but switching to option 1 would be very simple.

If we go with option 1 (generic), there are still a couple of alternatives for how to define the semantics of the new optabs:

- option 1.1: the type-size of the reduction variable (which is also the result of the computation) is exactly double the type-size of the reduction arguments.
  i.e., we can express summation of QI into HI, but we can't express summation of QI into SI. We can solve this by either introducing an additional tree-code and optab for "wider_widen_sum" (for which the type-size of the reduction variable is 4 times the type-size of the other arguments), or leaving the wider reduction forms to target-specific builtins.
- option 1.2: the type of the reduction variable is always X (some default predefined by each target), e.g., always sum into 32bit accumulators (if the target defines X to be 32). This may not be suitable for targets that have multiple accumulation sizes; however, one could often support the smaller-sized accumulations by truncating the final result produced by wider-sized accumulations, so this could potentially suffice to cover all reduction forms a target supports. If not, then we could resort to target-specific builtins for the cases we can't express with these optabs.

Here's the patch committed to autovect (implementing the target-specific approach), along with a new testcase.

thanks,
dorit

Changelog:

        * defaults.h (TARGET_VECT_NUM_PATTERNS): New.
        * target-def.h (TARGET_VECTORIZE_BUILTIN_VECT_PATTERN_RECOG): New.
        * target.h (builtin_vect_pattern_recog): New.
        * tree-vect-analyze.c (target.h): Include.
        (vect_pattern_recog_1): Removed static qualifier. Declaration moved to tree-vectorizer.h.
        (vect_recog_unsigned_subsat_pattern): Takes additional argument. Removed dump print. Handle target-reduction-pattern case.
        (vect_pattern_recog): Call targetm.vectorize.builtin_vect_pattern_recog.
        (vect_analyze_loop): Move call to vect_pattern_recog to after the call to vect_analyze_scalar_cycles.
        (vect_determine_vectorization_factor): Change skip test from checking for reduction, to checking for STMT_VINFO_LIVE_P. Handle the case of vectype being already initialized.
        (vect_analyze_operations): Call vectorizable_target_reduction_pattern.
        (vect_mark_relevant): When dealing with the last stmt in a sequence that was recognized as a certain idiom - use the pattern-stmt that replaces the sequence.
        (vect_mark_stmts_to_be_vectorized): Remove special handling of STMT_VINFO_IN_PATTERN_P.
        * tree-vect-transform.c (vect_create_epilog_for_reduction): New function. Contains functionality that was factored out of vectorizable_reduction.
        (vectorizable_target_reduction_pattern): New function.
        (vectorizable_reduction): Epilog creation code was factored out into a new function vect_create_epilog_for_reduction.
        (vect_transform_stmt): Call vectorizable_target_reduction_pattern.
        * tree-vectorizer.h (target_reduc_pattern_vec_info_type): New enum value.
        (vect_recog_unsigned_subsat_pattern): Takes additional argument.
        (_recog_func_ptr): Takes additional argument.
        (vectorizable_target_reduction_pattern): New function declaration.
        (vect_pattern_recog_func): Rename to vect_pattern_recog_funcs.
        * tree-inline.c (estimate_num_insns_1): Add missing cases - REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR.
        * tree-ssa-operands.c (get_expr_operands): Remove redundant cases - REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR.
        * tree-vect-transform.c (get_initial_def_for_reduction): Remove case that is not supported yet.
        * tree.def (REDUC_BIT_AND_EXPR, REDUC_BIT_IOR_EXPR, REDUC_BIT_XOR_EXPR)
        (REDUC_MULT_EXPR): Remove tree-codes that are not yet supported.
        * tree-vectorizer.c (reduction_code_for_scalar_code): Remove cases that are not yet supported.
        (vect_is_simple_reduction): Fix comment.
        * tree-vect-analyze.c (vect_analyze_operations): Minor changes to print messages.
        (vect_mark_stmts_to_be_vectorized): Likewise.
        (vect_determine_vectorization_factor): Likewise.
        * config/rs6000/rs6000.c (tree-flow.h, tree-data-ref.h)
        (tree-vectorizer.h): Include.
        (altivec_builtin_widening_summation): New global variable to hold the decl of the builtin widening_summation.
        (target_vect_recog_widening_summation_pattern): New pattern recognition function.
        (target_vect_pattern_recog_funcs): New global array to hold pointers to pattern-recognition functions.
        (rs6000_builtin_vect_pattern_recog): New function. Implements the target builtin vect_pattern_recog.
        (rs6000_expand_builtin): Support ALTIVEC_BUILTIN_WIDENING_SUMMATION case.
        (altivec_init_builtins): Add new function type v4si_ftype_v8hi_v4si. Initialize a new target builtin __builtin_altivec_widening_summation. Set altivec_builtin_widening_summation.
        * config/rs6000/rs6000.c (TARGET_VECT_NUM_PATTERNS): New define.
        (ALTIVEC_BUILTIN_WIDENING_SUMMATION): New builtin.
        * config/rs6000/t-rs6000 (tree-flow.h, tree-data-ref.h)
        (tree-vectorizer.h): Add dependency.

(See attached file: diff.april3)
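
For readers who want something concrete, the reduction idioms discussed in the message (widening summation, dot product, sum of absolute differences) might look roughly like the scalar C loops below. This is only an illustrative sketch: the function names are made up for this example and are not taken from the patch or its testcase, and the element/accumulator widths are assumptions chosen to mirror the QI/HI/SI cases mentioned above. The last two functions illustrate the option 1.2 remark that a narrower (unsigned) accumulation can be obtained by truncating a wider one.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening summation: 16-bit elements summed into a 32-bit accumulator.
   (Hypothetical name; inputs are assumed small enough not to overflow.) */
int32_t widen_sum(const int16_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];                      /* element is widened before the add */
    return acc;
}

/* Dot product: products of 16-bit elements summed into 32 bits. */
int32_t dot_prod(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)y[i];
    return acc;
}

/* Sum of absolute differences: 8-bit elements summed into 32 bits. */
uint32_t sum_abs_diff(const uint8_t *x, const uint8_t *y, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)x[i] - (int)y[i];
        acc += (uint32_t)(d < 0 ? -d : d);
    }
    return acc;
}

/* 8-bit elements summed directly into a 16-bit accumulator (wraps mod 2^16). */
uint16_t sum_u8_into_u16_direct(const uint8_t *x, size_t n)
{
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint16_t)(acc + x[i]);
    return acc;
}

/* Same result obtained by summing into 32 bits and truncating at the end,
   as suggested for option 1.2 (unsigned arithmetic assumed). */
uint16_t sum_u8_into_u16_via_u32(const uint8_t *x, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return (uint16_t)acc;                 /* truncate the wider result */
}
```

In option 1.1 terms, widen_sum and dot_prod above double the element size (16 into 32 bits), while sum_abs_diff quadruples it (8 into 32 bits), which is the case that would need either a "wider" tree-code or a target builtin.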
Attachment:
diff.april3
Description: Binary data