[patch] Support vectorization of min/max location pattern - take 2

Richard Guenther richard.guenther@gmail.com
Mon Aug 9 11:01:00 GMT 2010


On Mon, Aug 9, 2010 at 12:53 PM, Ira Rosen <IRAR@il.ibm.com> wrote:
>
>
> Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 12:50:14 PM:
>> > I implemented the VEC_COND_EXPR extension in the attached patch.
>> >
>> > For the reduction epilogue I defined new tree codes
>> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>>
>> Why do you need new tree codes here?
>
> After the vector loop we have two vectors: one with four minimums and the
> second with the four corresponding array indexes. The extraction of the
> correct index out of the four can be done differently on each platform
> (including problematic vector comparisons).

So the tree code is just to tie those two operations together?
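
To make the two-vector situation concrete, here is a minimal C sketch of a
"first minimum location" reduction. It is not taken from the patch: the names
minloc_first_scalar, minloc_first_vec4 and VF are made up for illustration,
the four lanes are emulated with plain arrays instead of real vector
instructions, and NaNs are ignored (as under -ffast-math). The epilogue at the
end is the step that has to look at both vectors at once.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor: four lanes per vector */

/* Scalar reference: index of the first minimum of a[0..n-1], n > 0. */
static size_t minloc_first_scalar(const float *a, size_t n)
{
    size_t loc = 0;
    for (size_t i = 1; i < n; i++)
        if (a[i] < a[loc])
            loc = i;
    return loc;
}

/* Lane-wise emulation of the vectorized loop and its epilogue.
   n must be a nonzero multiple of VF. */
static size_t minloc_first_vec4(const float *a, size_t n)
{
    float  vmin[VF];  /* per-lane running minimum          */
    size_t vloc[VF];  /* per-lane index of that minimum    */

    for (int l = 0; l < VF; l++) {
        vmin[l] = a[l];
        vloc[l] = (size_t)l;
    }

    /* "Vector loop": each lane tracks its own minimum and location.
       The strict '<' keeps the first occurrence within a lane. */
    for (size_t i = VF; i < n; i += VF)
        for (int l = 0; l < VF; l++)
            if (a[i + l] < vmin[l]) {
                vmin[l] = a[i + l];
                vloc[l] = i + l;
            }

    /* Epilogue: reduce four (minimum, index) pairs to a single index.
       Among lanes holding the same minimum, pick the smallest index. */
    float  best = vmin[0];
    size_t loc  = vloc[0];
    for (int l = 1; l < VF; l++)
        if (vmin[l] < best || (vmin[l] == best && vloc[l] < loc)) {
            best = vmin[l];
            loc  = vloc[l];
        }
    return loc;
}
```

For a[] = {3, 1, 4, 1, 5, 9, 2, 6} both functions return 1. A "last location"
variant would only flip the tie-breaks, which is presumably why the
FIRST/LAST variants differ in the epilogue.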

>> They btw need
>> documentation - just stating the new operand is a vector isn't
>> very informative.  They need documentation in generic.texi.
>
> Sorry about that, I'll add documentation for both.

Thanks.

>>
>> Likewise the new RTX codes (what are they for??)
>
> Probably there is a better way to do that, but I needed to map new vector
> comparison instructions that compare floats and return ints.

So you just need this at expansion time then and the RTXen
will never appear in RTL code?  Why not use a target hook for
expanding those comparisons then?  Btw, my GSoC student
implemented lowering of generic vector comparisons resulting
in a mask in tree-vect-generic.c using a target hook that eventually
uses target-specific builtins.  I attached the latest patch for that.
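
As a rough illustration of what "a comparison resulting in a mask" means, the
following C sketch emulates four lanes with plain arrays. It is not code from
either patch: cmp_lt_mask, select_by_mask and LANES are hypothetical names.
The comparison takes floats and yields an integer mask per lane (all ones or
all zeros), and the select is the (X & M) | (Y & ~M) form mentioned in the
quoted portability note further down.

```c
#include <stdint.h>

#define LANES 4  /* assumed: four 32-bit lanes per vector */

/* Compare floats lane by lane and produce an integer mask:
   all ones where a < b, all zeros elsewhere. */
static void cmp_lt_mask(uint32_t *mask, const float *a, const float *b)
{
    for (int l = 0; l < LANES; l++)
        mask[l] = (a[l] < b[l]) ? UINT32_MAX : 0u;
}

/* Per-lane select built from bitwise operations:
   take x where the mask lane is all ones, y where it is all zeros. */
static void select_by_mask(uint32_t *out, const uint32_t *x,
                           const uint32_t *y, const uint32_t *mask)
{
    for (int l = 0; l < LANES; l++)
        out[l] = (x[l] & mask[l]) | (y[l] & ~mask[l]);
}
```

A target with a blend-style instruction would do the select in one step, and a
target that sets condition codes instead of a mask would not materialize
mask[] at all, which is the portability concern raised in the quoted note.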

>> need documentation
>> in rtl.texi.
>>
>> Btw, you still don't adjust if-conversion to fold the COND_EXPR
>> it generates - that would generate the MIN/MAX expressions
>> directly and you wouldn't have to pattern match the COND_EXPR.
>
> I don't see how it can help to avoid pattern matching. We will still need
> to match MIN/MAX's arguments with the COND_EXPR arguments.

True, but you need to match MIN/MAX instead.  Well, my point
is that if-convert shouldn't create a COND_EXPR in that case.

Richard.

> Thanks,
> Ira
>
>>
>> Richard.
>>
>> > Bootstrapped and tested on powerpc64-suse-linux.
>> > OK for mainline?
>> >
>> > Thanks,
>> > Ira
>> >
>> > ChangeLog:
>> >
>> >        * tree-pretty-print.c (dump_generic_node): Handle new codes.
>> >        * optabs.c (optab_for_tree_code): Likewise.
>> >        (init_optabs): Initialize new optabs.
>> >        (get_vcond_icode): Handle vector condition with different types
>> >        of comparison and then/else operands.
>> >        (expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
>> >        (get_vec_reduc_minloc_expr_icode): New function.
>> >        (expand_vec_reduc_minloc_expr): New function.
>> >        * optabs.h (enum convert_optab_index): Add new optabs.
>> >        (vcondc_optab): Define.
>> >        (vcondcu_optab, reduc_min_first_loc_optab,
>> >        reduc_min_last_loc_optab,
>> >        reduc_max_last_loc_optab): Likewise.
>> >        (expand_vec_cond_expr_p): Add arguments.
>> >        (get_vec_reduc_minloc_expr_code): Declare.
>> >        (expand_vec_reduc_minloc_expr): Declare.
>> >        * genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
>> >        reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>> >        reduc_max_last_loc_optab.
>> >        * rtl.def (GEF): New rtx.
>> >        (GTF, LEF, LTF, EQF, NEQF): Likewise.
>> >        * jump.c (reverse_condition): Handle new rtx.
>> >        (swap_condition): Likewise.
>> >        * expr.c (expand_expr_real_2): Expand new reduction tree codes.
>> >        * gimple-pretty-print.c (dump_binary_rhs): Print new codes.
>> >        * tree-vectorizer.h (enum vect_compound_pattern): New.
>> >        (struct _stmt_vec_info): Add new field compound_pattern. Add macro
>> >        to access it.
>> >        (is_pattern_stmt_p): Return true for compound pattern.
>> >        (get_minloc_reduc_epilogue_code): New.
>> >        (vectorizable_condition): Add arguments.
>> >        (vect_recog_compound_func_ptr): New function-pointer type.
>> >        (NUM_COMPOUND_PATTERNS): New.
>> >        (vect_compound_pattern_recog): Declare.
>> >        * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>> >        for compound patterns.
>> >        (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>> >        patterns. Update comment.
>> >        (vect_analyze_scalar_cycles): Update comment.
>> >        (destroy_loop_vec_info): Update def stmt for the original pattern
>> >        statement.
>> >        (vect_is_simple_reduction_1): Skip compound pattern statements in
>> >        uses check. Add spaces. Skip commutativity and type checks for
>> >        minimum location statement. Fix printings.
>> >        (vect_model_reduction_cost): Add min/max location pattern cost
>> >        computation.
>> >        (vect_create_epilog_for_reduction): Don't retrieve the original
>> >        statement for compound pattern. Fix comment accordingly. Get tree
>> >        code for reduction epilogue of min/max location computation
>> >        according to the comparison operation. Don't expect to find an
>> >        exit phi node for min/max statement.
>> >        (vectorizable_reduction): Skip check for uses in loop for compound
>> >        patterns. Don't retrieve the original statement for compound pattern.
>> >        Call vectorizable_condition () with additional parameters. Skip
>> >        reduction code check for compound patterns. Prepare operands for
>> >        min/max location statement vectorization and pass them to
>> >        vectorizable_condition ().
>> >        (vectorizable_live_operation): Return TRUE for compound patterns.
>> >        * tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
>> >        (REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
>> >        REDUC_MAX_LAST_LOC_EXPR): Likewise.
>> >        * cfgexpand.c (expand_debug_expr): Handle new tree codes.
>> >        * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>> >        (vect_recog_compound_func_ptrs): Likewise.
>> >        (vect_recog_min_max_loc_pattern): New function.
>> >        (vect_compound_pattern_recog): Likewise.
>> >        * tree-vect-stmts.c (process_use): Mark compound pattern statements as
>> >        used by reduction.
>> >        (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>> >        to be used by reduction.
>> >        (vectorizable_condition): Update comment, add arguments. Skip checks
>> >        irrelevant for compound pattern. Check that if comparison and then/else
>> >        operands are of different types, the size of the types is equal. Check
>> >        that reduction epilogue, if needed, is supported. Prepare operands
>> >        using new arguments.
>> >        (vect_analyze_stmt): Allow nested cycle statements to be used by
>> >        reduction. Call vectorizable_condition () with additional arguments.
>> >        (vect_transform_stmt): Call vectorizable_condition () with additional
>> >        arguments.
>> >        (new_stmt_vec_info): Initialize new fields.
>> >        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>> >        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>> >        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>> >        * config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
>> >        argument. Handle new rtx.
>> >        (rs6000_emit_vector_compare): Handle the case of result type different
>> >        from the operands, update calls to rs6000_emit_vector_compare_inner ().
>> >        (rs6000_emit_vector_cond_expr): Use new codes in case of different
>> >        types.
>> >        * config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
>> >        (altivec_gefv4sf): New pattern.
>> >        (altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
>> >        reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
>> >        reduc_max_last_loc_v4sfv4si): Likewise.
>> >        * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>> >        patterns.
>> >
>> > testsuite/ChangeLog:
>> >
>> >        * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c.
>> >        * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.
>> >
>> >
>> > (See attached file: minloc.txt)
>> >
>> >>
>> >> I can think of 2 portability problems with your current solution:
>> >>
>> >> (1) SSE4.1 would prefer to use BLEND instructions, which perform
>> >>     that entire (X & M) | (Y & ~M) operation in one insn.
>> >>
>> >> (2) The mips C.cond.PS instruction does *not* produce a bitmask
>> >>     like altivec or sse do.  Instead it sets multiple condition
>> >>     codes.  One then uses MOV[TF].PS to merge the elements based
>> >>     on the individual condition codes.  While there's no direct
>> >>     corresponding instruction that will operate on integers, I
>> >>     don't think it would be too difficult to use MOV[TF].G or
>> >>     BC1AND2[FT] instructions to emulate it.  In any case, this
>> >>     is again a case where you don't want to expose any part of
>> >>     the VEC_COND at the gimple level.
>> >>
>> >>
>> >> r~
>
>


