[patch] Support vectorization of min/max location pattern - take 2
Richard Guenther
richard.guenther@gmail.com
Mon Aug 9 11:01:00 GMT 2010
On Mon, Aug 9, 2010 at 12:53 PM, Ira Rosen <IRAR@il.ibm.com> wrote:
>
>
> Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 12:50:14 PM:
>> > I implemented VEC_COND_EXPR extension in the attached patch.
>> >
>> > For reduction epilogue I defined new tree codes
>> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>>
>> Why do you need new tree codes here?
>
> After the vector loop we have two vectors: one with four minimums and the
> second with the four corresponding array indexes. The extraction of the
> correct index out of the four can be done differently on each platform
> (including problematic vector comparisons).
So the tree code is just to tie those two operations together?
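
For readers following along, here is a minimal C sketch of the situation
described above. It is a scalar emulation written for illustration, not code
from the patch: the function names, the fixed width of four lanes, and the
"first location" tie-break are assumptions. The loop keeps one running
minimum and one index per lane, and the epilogue then has to pick a single
index out of the four, which is the step the new reduction tree codes are
meant to express.

```c
#include <assert.h>

#define LANES 4  /* assumed vector width: four floats, as in the thread */

/* Scalar reference: index of the first minimum in a[0..n-1]. */
int scalar_min_first_loc(const float *a, int n)
{
    int loc = 0;
    for (int i = 1; i < n; i++)
        if (a[i] < a[loc])
            loc = i;
    return loc;
}

/* Lane-wise emulation of the vectorized loop followed by the epilogue.
   Assumes n is a positive multiple of LANES (no peeling is modelled). */
int lanes_min_first_loc(const float *a, int n)
{
    float vmin[LANES];  /* per-lane running minimum */
    int vidx[LANES];    /* per-lane index of that minimum */

    assert(n >= LANES && n % LANES == 0);

    for (int l = 0; l < LANES; l++) {
        vmin[l] = a[l];
        vidx[l] = l;
    }

    /* "Vector loop": each lane only sees elements l, l + 4, l + 8, ... */
    for (int i = LANES; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            if (a[i + l] < vmin[l]) {
                vmin[l] = a[i + l];
                vidx[l] = i + l;
            }

    /* Epilogue: smallest value wins; equal values are resolved in favour
       of the smaller index, giving the first location. */
    int best = 0;
    for (int l = 1; l < LANES; l++)
        if (vmin[l] < vmin[best]
            || (vmin[l] == vmin[best] && vidx[l] < vidx[best]))
            best = l;
    return vidx[best];
}
```

A "last location" variant would differ only in the tie-break of the epilogue
and in using a non-strict comparison inside the loop.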
>> They btw need
>> documentation - just stating the new operand is a vector isn't
>> very informative. They need documentation in generic.texi.
>
> Sorry about that, I'll add documentation for both.
Thanks.
>>
>> Likewise the new RTX codes (what are they for??)
>
> Probably there is a better way to do that, but I needed to map new vector
> comparison instructions that compare floats and return ints.
So you just need this at expansion time then and the RTXen
will never appear in RTL code? Why not use a target hook for
expanding those comparisons then? Btw, my GSoC student
implemented lowering of generic vector comparisons resulting
in a mask in tree-vect-generic.c using a target hook that eventually
uses target specific builtins. I attached the latest patch for that.
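
As an illustration of what "a vector comparison resulting in a mask" means
here, the following C sketch emulates it lane by lane. It is not taken from
either patch: the function names and the four-lane width are assumptions.
The comparison takes floats and produces an integer mask per lane, and the
mask then selects between two integer vectors with the usual
(X & M) | (Y & ~M) idiom.

```c
#include <stdint.h>

#define LANES 4  /* assumed vector width */

/* Lane-wise a < b on floats; each result lane is all ones or all zeros. */
void cmp_lt_mask(const float a[LANES], const float b[LANES],
                 uint32_t mask[LANES])
{
    for (int l = 0; l < LANES; l++)
        mask[l] = (a[l] < b[l]) ? UINT32_MAX : 0u;
}

/* Lane-wise select: x where the mask is set, y elsewhere. */
void select_by_mask(const uint32_t mask[LANES], const uint32_t x[LANES],
                    const uint32_t y[LANES], uint32_t out[LANES])
{
    for (int l = 0; l < LANES; l++)
        out[l] = (x[l] & mask[l]) | (y[l] & ~mask[l]);
}
```

On targets with a blend or conditional-move instruction the select step would
map to a single instruction rather than the three logical operations shown.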
>> need documentation
>> in rtl.texi.
>>
>> Btw, you still don't adjust if-conversion to fold the COND_EXPR
>> it generates - that would generate the MIN/MAX expressions
>> directly and you wouldn't have to pattern match the COND_EXPR.
>
> I don't see how it can help to avoid pattern matching. We will still need
> to match MIN/MAX's arguments with the COND_EXPR arguments.
True, but you need to match MIN/MAX instead. Well, my point
is that if-convert shouldn't create a COND_EXPR in that case.
Richard.
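
A small C sketch of the two forms under discussion, for illustration only.
The function names are hypothetical and min_f merely stands in for MIN_EXPR,
ignoring NaNs. In the first form both statements are conditional selects; in
the second the minimum is written as a MIN operation, and the select for the
location still has to compare the same two operands that MIN takes.

```c
/* Hypothetical stand-in for MIN_EXPR on floats (no NaN handling). */
float min_f(float x, float y)
{
    return x < y ? x : y;
}

/* Form 1: minimum and location are both conditional selects. */
void minloc_cond(const float *a, int n, float *min_out, int *loc_out)
{
    float min = a[0];
    int loc = 0;
    for (int i = 1; i < n; i++) {
        loc = (a[i] < min) ? i : loc;     /* uses the old value of min */
        min = (a[i] < min) ? a[i] : min;
    }
    *min_out = min;
    *loc_out = loc;
}

/* Form 2: the minimum is folded to a MIN operation. */
void minloc_minexpr(const float *a, int n, float *min_out, int *loc_out)
{
    float min = a[0];
    int loc = 0;
    for (int i = 1; i < n; i++) {
        loc = (a[i] < min) ? i : loc;     /* operands match those of MIN */
        min = min_f(a[i], min);
    }
    *min_out = min;
    *loc_out = loc;
}
```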
> Thanks,
> Ira
>
>>
>> Richard.
>>
>> > Bootstrapped and tested on powerpc64-suse-linux.
>> > OK for mainline?
>> >
>> > Thanks,
>> > Ira
>> >
>> > ChangeLog:
>> >
>> > * tree-pretty-print.c (dump_generic_node): Handle new codes.
>> > * optabs.c (optab_for_tree_code): Likewise.
>> > (init_optabs): Initialize new optabs.
>> > (get_vcond_icode): Handle vector condition with different types
>> > of comparison and then/else operands.
>> > (expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
>> > (get_vec_reduc_minloc_expr_icode): New function.
>> > (expand_vec_reduc_minloc_expr): New function.
>> > * optabs.h (enum convert_optab_index): Add new optabs.
>> > (vcondc_optab): Define.
>> > (vcondcu_optab, reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>> > reduc_max_last_loc_optab): Likewise.
>> > (expand_vec_cond_expr_p): Add arguments.
>> > (get_vec_reduc_minloc_expr_code): Declare.
>> > (expand_vec_reduc_minloc_expr): Declare.
>> > * genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
>> > reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>> > reduc_max_last_loc_optab.
>> > * rtl.def (GEF): New rtx.
>> > (GTF, LEF, LTF, EQF, NEQF): Likewise.
>> > * jump.c (reverse_condition): Handle new rtx.
>> > (swap_condition): Likewise.
>> > * expr.c (expand_expr_real_2): Expand new reduction tree codes.
>> > * gimple-pretty-print.c (dump_binary_rhs): Print new codes.
>> > * tree-vectorizer.h (enum vect_compound_pattern): New.
>> > (struct _stmt_vec_info): Add new field compound_pattern. Add macro
>> > to access it.
>> > (is_pattern_stmt_p): Return true for compound pattern.
>> > (get_minloc_reduc_epilogue_code): New.
>> > (vectorizable_condition): Add arguments.
>> > (vect_recog_compound_func_ptr): New function-pointer type.
>> > (NUM_COMPOUND_PATTERNS): New.
>> > (vect_compound_pattern_recog): Declare.
>> > * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>> > for compound patterns.
>> > (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>> > patterns. Update comment.
>> > (vect_analyze_scalar_cycles): Update comment.
>> > (destroy_loop_vec_info): Update def stmt for the original pattern
>> > statement.
>> > (vect_is_simple_reduction_1): Skip compound pattern statements in
>> > uses check. Add spaces. Skip commutativity and type checks for
>> > minimum location statement. Fix printings.
>> > (vect_model_reduction_cost): Add min/max location pattern cost
>> > computation.
>> > (vect_create_epilog_for_reduction): Don't retrieve the original
>> > statement for compound pattern. Fix comment accordingly. Get tree
>> > code for reduction epilogue of min/max location computation
>> > according to the comparison operation. Don't expect to find an
>> > exit phi node for min/max statement.
>> > (vectorizable_reduction): Skip check for uses in loop for compound
>> > patterns. Don't retrieve the original statement for compound pattern.
>> > Call vectorizable_condition () with additional parameters. Skip
>> > reduction code check for compound patterns. Prepare operands for
>> > min/max location statement vectorization and pass them to
>> > vectorizable_condition ().
>> > (vectorizable_live_operation): Return TRUE for compound patterns.
>> > * tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
>> > (REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
>> > REDUC_MAX_LAST_LOC_EXPR): Likewise.
>> > * cfgexpand.c (expand_debug_expr): Handle new tree codes.
>> > * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>> > (vect_recog_compound_func_ptrs): Likewise.
>> > (vect_recog_min_max_loc_pattern): New function.
>> > (vect_compound_pattern_recog): Likewise.
>> > * tree-vect-stmts.c (process_use): Mark compound pattern statements
>> > as used by reduction.
>> > (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>> > to be used by reduction.
>> > (vectorizable_condition): Update comment, add arguments. Skip checks
>> > irrelevant for compound pattern. Check that if comparison and
>> > then/else operands are of different types, the size of the types is
>> > equal. Check that reduction epilogue, if needed, is supported. Prepare
>> > operands using new arguments.
>> > (vect_analyze_stmt): Allow nested cycle statements to be used by
>> > reduction. Call vectorizable_condition () with additional arguments.
>> > (vect_transform_stmt): Call vectorizable_condition () with additional
>> > arguments.
>> > (new_stmt_vec_info): Initialize new fields.
>> > * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>> > * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>> > * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>> > * config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
>> > argument. Handle new rtx.
>> > (rs6000_emit_vector_compare): Handle the case of result type
>> > different from the operands, update calls to
>> > rs6000_emit_vector_compare_inner ().
>> > (rs6000_emit_vector_cond_expr): Use new codes in case of different
>> > types.
>> > * config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
>> > (altivec_gefv4sf): New pattern.
>> > (altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
>> > reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
>> > reduc_max_last_loc_v4sfv4si): Likewise.
>> > * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>> > patterns.
>> >
>> > testsuite/ChangeLog:
>> >
>> > * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c.
>> > * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>> > * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>> > * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
>> > gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.
>> >
>> >
>> > (See attached file: minloc.txt)
>> >
>> >>
>> >> I can think of 2 portability problems with your current solution:
>> >>
>> >> (1) SSE4.1 would prefer to use BLEND instructions, which perform
>> >> that entire (X & M) | (Y & ~M) operation in one insn.
>> >>
>> >> (2) The mips C.cond.PS instruction does *not* produce a bitmask
>> >> like altivec or sse do. Instead it sets multiple condition
>> >> codes. One then uses MOV[TF].PS to merge the elements based
>> >> on the individual condition codes. While there's no direct
>> >> corresponding instruction that will operate on integers, I
>> >> don't think it would be too difficult to use MOV[TF].G or
>> >> BC1AND2[FT] instructions to emulate it. In any case, this
>> >> is again a case where you don't want to expose any part of
>> >> the VEC_COND at the gimple level.
>> >>
>> >>
>> >> r~
>
>