This is the mail archive of the mailing list for the GCC project.

[RFC] [patch] Support vectorization of min/max location pattern


This patch adds support for vectorizing the min/max location pattern:

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

The recognized pattern is composed of two statements (and is therefore
called a compound pattern):

  # pos_22 = PHI <pos_1(4), 1(2)>
  # limit_24 = PHI <limit_4(4), 0(2)>
  pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
  limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;

Both statements must be reductions with cond_expr and must have the same
condition part. The min/max statement is expected to be of the form "x op
y ? x : y" (where op can be >, <, >= or <=), and the location is expected
to be an induction.
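
For illustration only, the source loop can be written so that it mirrors
the two cond_expr statements directly. This is a minimal sketch, not code
from the patch: the function name minloc_scalar and its signature are
hypothetical, and the initial values of limit and pos are supplied by the
caller.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are illustrative only):
   the min-location loop expressed as two conditional reductions that
   share a single condition, mirroring the two cond_expr statements.  */
static float
minloc_scalar (const float *arr, size_t n, float limit, int *pos)
{
  int p = *pos;

  for (size_t i = 0; i < n; i++)
    {
      float x = arr[i];
      int cond = x < limit;          /* shared condition part */

      p = cond ? (int) i + 1 : p;    /* location: the induction i + 1 */
      limit = cond ? x : limit;      /* minimum: "x op y ? x : y" */
    }

  *pos = p;
  return limit;
}
```

Because the comparison is strict, equal elements do not update pos, so the
first occurrence of the minimum is the one reported.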

To vectorize the min/max location pattern we use a technique described in
"Multimedia vectorization of floating-point MIN/MAX reductions" by
A.J.C.Bik, X.Tian and M.B.Girkar.

Vectorized loop (maxloc, first index):
     vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
     vck[0:vl-1:1] = | k |..| k |;  - vector of positions
     ind[0:vl-1:1] = |vl-1|..| 0 |;
     inc[0:vl-1:1] = | vl |..| vl |;
     for (i = 0; i < N; i += vl) {
       msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
       vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
                       (vck[0:vl-1:1] & !msk[0:vl-1:1]);
       vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
       ind[0:vl-1:1] += inc[0:vl-1:1];
     }
     x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
     msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
     vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
                     (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
     k = HMIN(vck[0:vl-1:1]);       - first position extraction
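
The scheme above can be emulated lane by lane in plain C to check its
behaviour. The following is only a sketch under stated assumptions, not
the code the vectorizer generates: VL is fixed at 4, positions are
0-based, n is assumed to be a multiple of VL, and the name
maxloc_first_emulated is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define VL 4   /* assumed number of vector lanes for this sketch */

/* Hypothetical emulation of the vectorized maxloc (first index) scheme.
   x and *k are the initial maximum and position; n must be a multiple
   of VL.  Returns the maximum and stores its first position in *k.  */
static float
maxloc_first_emulated (const float *a, size_t n, float x, int *k)
{
  float vcx[VL];          /* per-lane running maximum */
  int vck[VL], ind[VL];   /* per-lane position and current index */

  for (int l = 0; l < VL; l++)
    {
      vcx[l] = x;
      vck[l] = *k;
      ind[l] = l;
    }

  for (size_t i = 0; i < n; i += VL)
    for (int l = 0; l < VL; l++)
      {
        int msk = a[i + l] > vcx[l];        /* vector compare */

        vck[l] = msk ? ind[l] : vck[l];     /* select under mask */
        vcx[l] = msk ? a[i + l] : vcx[l];   /* VMAX */
        ind[l] += VL;                       /* ind += inc */
      }

  /* HMAX: scalar maximum extraction.  */
  float m = vcx[0];
  for (int l = 1; l < VL; l++)
    if (vcx[l] > m)
      m = vcx[l];

  /* HMIN over the positions of the lanes that hold the maximum.  */
  int first = INT_MAX;
  for (int l = 0; l < VL; l++)
    {
      int cand = (vcx[l] == m) ? vck[l] : INT_MAX;
      if (cand < first)
        first = cand;
    }

  *k = first;
  return m;
}
```

The strict comparison keeps the earliest position within each lane, and
the final HMIN picks the earliest position among the lanes that hold the
maximum, which together give the first index overall.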

Vectorization of minloc is expected to help gas_dyn from Polyhedron, as
discussed in PR 31067.

PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I
commit this patch (after I regtest it again).
Also, the case of pos = i; instead of pos = i+1; is not supported, since in
this case the operands are switched, i.e., we get "x op y ? y : x".

My main question is about the implementation of vector comparisons. I
understand that different targets can return different types of results, so
instead of defining new tree codes I used a target builtin, which also
returns the type of the result.

Other comments are welcome too.

Bootstrapped and tested on powerpc64-suse-linux.



      * doc/tm.texi (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Document.
      * target.h (struct vectorize): Add new target builtin.
      * tree-vectorizer.h (enum vect_compound_pattern): New.
      (struct _stmt_vec_info): Add new fields compound_pattern and
      reduc_scalar_result_stmt. Add macros to access them.
      (is_pattern_stmt_p): Return true for compound pattern.
      (vectorizable_condition): Add arguments.
      (vect_recog_compound_func_ptr): New function-pointer type.
      (vect_compound_pattern_recog): Declare.
      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
      for compound patterns.
      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
      patterns. Update comment.
      (vect_analyze_scalar_cycles): Update comment.
      (destroy_loop_vec_info): Update def stmt for the original pattern
      (vect_is_simple_reduction_1): Skip compound pattern statements in
      uses check. Add spaces. Skip commutativity and type checks for
      minimum location statement. Fix printings.
      (vect_model_reduction_cost): Add min/max location pattern cost
      (vect_create_epilogue_for_compound_pattern): New function.
      (vect_create_epilog_for_reduction): Don't retrieve the original
      statement for compound pattern. Fix comment accordingly. Store the
      result of vector reduction computation in a variable and use it. Call
      vect_create_epilogue_for_compound_pattern (). Check if optab exists
      before using it. Keep the scalar result computation statement. Use
      either exit phi node result or compound pattern result in scalar
      extraction. Don't expect to find an exit phi node for min/max
      (vectorizable_reduction): Skip check for uses in loop for compound
      patterns. Don't retrieve the original statement for compound pattern.
      Call vectorizable_condition () with additional parameters. Skip
      reduction code check for compound patterns. Prepare operands for
      min/max location statement vectorization and pass them to
      vectorizable_condition ().
      (vectorizable_live_operation): Return TRUE for compound patterns.
      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
      (vect_recog_compound_func_ptrs): Likewise.
      (vect_recog_min_max_loc_pattern): New function.
      (vect_compound_pattern_recog): Likewise.
      * target-def.h (TARGET_VECTORIZE_BUILTIN_VEC_CMP): New.
      * tree-vect-stmts.c (process_use): Mark compound pattern statements
      used by reduction.
      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
      to be used by reduction.
      (vectorize_minmax_location_pattern): New function.
      (vectorizable_condition): Update comment, add arguments. Skip checks
      irrelevant for compound pattern. Check that vector comparisons are
      supported by the target. Prepare operands using new arguments. Call
      (vect_analyze_stmt): Allow nested cycle statements to be used by
      reduction. Call vectorizable_condition () with additional arguments.
      (vect_transform_stmt): Call vectorizable_condition () with additional
      (new_stmt_vec_info): Initialize new fields.
      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
      * config/rs6000/ (altivec_vcmpltfp): New pattern.
      (altivec_vcmplefp): Likewise.
      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound


      * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
      * lib/target-supports.exp (check_effective_target_vect_cmp): New.
      * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
      * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c: Likewise.

(See attached file: minloc.txt)(See attached file: minloc-tests.txt)

