[patch] Support vectorization of min/max location pattern - take 2

Ira Rosen IRAR@il.ibm.com
Mon Aug 9 10:58:00 GMT 2010



Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 12:50:14 PM:
> > I implemented VEC_COND_EXPR extension in the attached patch.
> >
> > For reduction epilogue I defined new tree codes
> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>
> Why do you need new tree codes here?

After the vector loop we have two vectors: one with four minimums and the
second with the four corresponding array indexes. Extracting the correct
index out of the four can be done differently on each platform (and may
involve problematic vector comparisons).
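
To make the epilogue step concrete, here is a minimal scalar sketch of what a
"min, first location" reduction has to compute from those two vectors. The
function name reduc_min_first_loc and its signature are made up for this
illustration and are not taken from the patch; a real target would do this
with vector instructions rather than a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a "min, first location" epilogue.
   vmin[k] is the minimum seen by lane k; vidx[k] is the array index
   at which lane k saw that minimum.  */
int
reduc_min_first_loc (const float *vmin, const int *vidx, size_t nlanes)
{
  assert (nlanes > 0);
  float best = vmin[0];
  int loc = vidx[0];
  for (size_t k = 1; k < nlanes; k++)
    {
      /* A strictly smaller value always wins.  On a tie, the smaller
         array index wins, so the first occurrence is reported.  */
      if (vmin[k] < best || (vmin[k] == best && vidx[k] < loc))
        {
          best = vmin[k];
          loc = vidx[k];
        }
    }
  return loc;
}
```

A "last location" variant would only flip the tie-break to prefer the larger
index.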

> They btw need
> documentation - just stating the new operand is a vector isn't
> very informative.  They need documentation in generic.texi.

Sorry about that, I'll add documentation for both.

>
> Likewise the new RTX codes (what are they for??)

There is probably a better way to do this, but I needed to map the new
vector comparison instructions, which compare floats and return ints.

> need documentation
> in rtl.texi.
>
> Btw, you still don't adjust if-conversion to fold the COND_EXPR
> it generates - that would generate the MIN/MAX expressions
> directly and you wouldn't have to pattern match the COND_EXPR.

I don't see how that would avoid pattern matching. We would still need
to match the MIN/MAX arguments against the COND_EXPR arguments.
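
For reference, a hand-written sketch of the loop shape under discussion (not
taken from the patch or from real if-conversion output; the function names
are invented for this example). Even if the update of min is folded to a
MIN expression, the index update remains a COND_EXPR whose condition has to
be tied back to the same two operands.

```c
#include <assert.h>
#include <stddef.h>

/* Source-level form: one branch updates both the minimum and its index.  */
int
min_loc_branch (const float *a, size_t n)
{
  assert (n > 0);
  float min = a[0];
  int loc = 0;
  for (size_t i = 1; i < n; i++)
    if (a[i] < min)
      {
        min = a[i];
        loc = (int) i;
      }
  return loc;
}

/* Roughly the if-converted form: two selects sharing one comparison.  */
int
min_loc_ifcvt (const float *a, size_t n)
{
  assert (n > 0);
  float min = a[0];
  int loc = 0;
  for (size_t i = 1; i < n; i++)
    {
      int c = a[i] < min;
      loc = c ? (int) i : loc;  /* stays a COND_EXPR on the index */
      min = c ? a[i] : min;     /* could be folded to MIN (a[i], min) */
    }
  return loc;
}
```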

Thanks,
Ira

>
> Richard.
>
> > Bootstrapped and tested on powerpc64-suse-linux.
> > OK for mainline?
> >
> > Thanks,
> > Ira
> >
> > ChangeLog:
> >
> >        * tree-pretty-print.c (dump_generic_node): Handle new codes.
> >        * optabs.c (optab_for_tree_code): Likewise.
> >        (init_optabs): Initialize new optabs.
> >        (get_vcond_icode): Handle vector condition with different types
> >        of comparison and then/else operands.
> >        (expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
> >        (get_vec_reduc_minloc_expr_icode): New function.
> >        (expand_vec_reduc_minloc_expr): New function.
> >        * optabs.h (enum convert_optab_index): Add new optabs.
> >        (vcondc_optab): Define.
> >        (vcondcu_optab, reduc_min_first_loc_optab, reduc_min_last_loc_optab,
> >        reduc_max_last_loc_optab): Likewise.
> >        (expand_vec_cond_expr_p): Add arguments.
> >        (get_vec_reduc_minloc_expr_code): Declare.
> >        (expand_vec_reduc_minloc_expr): Declare.
> >        * genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
> >        reduc_min_first_loc_optab, reduc_min_last_loc_optab,
> >        reduc_max_last_loc_optab.
> >        * rtl.def (GEF): New rtx.
> >        (GTF, LEF, LTF, EQF, NEQF): Likewise.
> >        * jump.c (reverse_condition): Handle new rtx.
> >        (swap_condition): Likewise.
> >        * expr.c (expand_expr_real_2): Expand new reduction tree codes.
> >        * gimple-pretty-print.c (dump_binary_rhs): Print new codes.
> >        * tree-vectorizer.h (enum vect_compound_pattern): New.
> >        (struct _stmt_vec_info): Add new field compound_pattern. Add macro
> >        to access it.
> >        (is_pattern_stmt_p): Return true for compound pattern.
> >        (get_minloc_reduc_epilogue_code): New.
> >        (vectorizable_condition): Add arguments.
> >        (vect_recog_compound_func_ptr): New function-pointer type.
> >        (NUM_COMPOUND_PATTERNS): New.
> >        (vect_compound_pattern_recog): Declare.
> >        * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
> >        for compound patterns.
> >        (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
> >        patterns. Update comment.
> >        (vect_analyze_scalar_cycles): Update comment.
> >        (destroy_loop_vec_info): Update def stmt for the original pattern
> >        statement.
> >        (vect_is_simple_reduction_1): Skip compound pattern statements in
> >        uses check. Add spaces. Skip commutativity and type checks for
> >        minimum location statement. Fix printings.
> >        (vect_model_reduction_cost): Add min/max location pattern cost
> >        computation.
> >        (vect_create_epilog_for_reduction): Don't retrieve the original
> >        statement for compound pattern. Fix comment accordingly. Get tree
> >        code for reduction epilogue of min/max location computation
> >        according to the comparison operation. Don't expect to find an
> >        exit phi node for min/max statement.
> >        (vectorizable_reduction): Skip check for uses in loop for compound
> >        patterns. Don't retrieve the original statement for compound pattern.
> >        Call vectorizable_condition () with additional parameters. Skip
> >        reduction code check for compound patterns. Prepare operands for
> >        min/max location statement vectorization and pass them to
> >        vectorizable_condition ().
> >        (vectorizable_live_operation): Return TRUE for compound patterns.
> >        * tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
> >        (REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
> >        REDUC_MAX_LAST_LOC_EXPR): Likewise.
> >        * cfgexpand.c (expand_debug_expr): Handle new tree codes.
> >        * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
> >        (vect_recog_compound_func_ptrs): Likewise.
> >        (vect_recog_min_max_loc_pattern): New function.
> >        (vect_compound_pattern_recog): Likewise.
> >        * tree-vect-stmts.c (process_use): Mark compound pattern statements
> >        as used by reduction.
> >        (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
> >        to be used by reduction.
> >        (vectorizable_condition): Update comment, add arguments. Skip checks
> >        irrelevant for compound pattern. Check that if comparison and
> >        then/else operands are of different types, the size of the types is
> >        equal. Check that reduction epilogue, if needed, is supported. Prepare
> >        operands using new arguments.
> >        (vect_analyze_stmt): Allow nested cycle statements to be used by
> >        reduction. Call vectorizable_condition () with additional arguments.
> >        (vect_transform_stmt): Call vectorizable_condition () with additional
> >        arguments.
> >        (new_stmt_vec_info): Initialize new fields.
> >        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
> >        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
> >        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> >        * config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
> >        argument. Handle new rtx.
> >        (rs6000_emit_vector_compare): Handle the case of result type
> >        different from the operands, update calls to
> >        rs6000_emit_vector_compare_inner ().
> >        (rs6000_emit_vector_cond_expr): Use new codes in case of different
> >        types.
> >        * config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
> >        (altivec_gefv4sf): New pattern.
> >        (altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
> >        reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
> >        reduc_max_last_loc_v4sfv4si): Likewise.
> >        * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
> >        patterns.
> >
> > testsuite/ChangeLog:
> >
> >        * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
> >        * lib/target-supports.exp (check_effective_target_vect_cmp): New.
> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.
> >
> >
> > (See attached file: minloc.txt)
> >
> >>
> >> I can think of 2 portability problems with your current solution:
> >>
> >> (1) SSE4.1 would prefer to use BLEND instructions, which perform
> >>     that entire (X & M) | (Y & ~M) operation in one insn.
> >>
> >> (2) The mips C.cond.PS instruction does *not* produce a bitmask
> >>     like altivec or sse do.  Instead it sets multiple condition
> >>     codes.  One then uses MOV[TF].PS to merge the elements based
> >>     on the individual condition codes.  While there's no direct
> >>     corresponding instruction that will operate on integers, I
> >>     don't think it would be too difficult to use MOV[TF].G or
> >>     BC1AND2[FT] instructions to emulate it.  In any case, this
> >>     is again a case where you don't want to expose any part of
> >>     the VEC_COND at the gimple level.
> >>
> >>
> >> r~
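
For readers following the (X & M) | (Y & ~M) point in the quoted text, a
minimal scalar sketch of that mask-based select is below. The name
mask_select and the 32-bit lane type are assumptions for this example only.
It assumes each mask lane is either all ones or all zeros, as a vector
compare on altivec or sse would produce; a BLEND-style instruction performs
the whole expression in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-lane select: out[i] = m[i] ? x[i] : y[i], where each
   m[i] is assumed to be all ones (take x) or all zeros (take y).  */
void
mask_select (uint32_t *out, const uint32_t *x, const uint32_t *y,
             const uint32_t *m, size_t n)
{
  assert (out && x && y && m);
  for (size_t i = 0; i < n; i++)
    out[i] = (x[i] & m[i]) | (y[i] & ~m[i]);
}
```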


