Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Kewen.Lin" <linkw at linux dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Bill Schmidt <wschmidt at linux dot ibm dot com>, Segher Boessenkool <segher at kernel dot crashing dot org>, Richard Guenther <rguenther at suse dot de>
- Date: Fri, 8 Mar 2019 11:57:28 +0100
- Subject: Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref
- References: <3fc7aa9b-443e-10eb-ab01-c8043567f2e5@linux.ibm.com>
On Fri, Mar 8, 2019 at 2:40 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> As described in PR88497 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88497),
> when we meet a code pattern like:
>
> V1[0] + V1[1] + ... + V1[k] + V2[0] + ... + V2[k] + ... Vn[k]
> // V1...Vn of VECTOR_TYPE
>
> We can teach reassoc to transform it to:
>
> Vs = (V1 + V2 + ... + Vn)
> Vs[0] + Vs[1] + ... + Vs[k]
>
> It saves addition and bit_field_ref operations and exposes more
> opportunities for downstream passes. I noticed that even on a target
> that doesn't support the vector type, where the vector operations get
> expanded in veclower, it's still better to have this transformation,
> since the generated sequence is more friendly for widening_mul. (With
> one more DCE run after forwprop, the result would be the same.)
>
> Bootstrapped/regtested on powerpc64le-linux-gnu, ok for trunk?
Hmm. You are basically vectorizing the reduction partially here (note
basic-block vectorization doesn't handle reductions yet). So I'm not sure
whether we should eventually just change the op sort order to sort
v1[1] after v2[0], so that v1[0] and v2[0] sort together. That might also
be an intermediate step that makes implementing the vectorization part
"easier"? I also imagine that the "vectorization" would be profitable even
if just two elements of the vectors are summed and the real vectors are
larger.
Note that the code you transform contains no vector operations (just
element extracts) and thus you need to check for target support before
creating the vector add.
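Something along these lines (an untested sketch using the generic optab
query helpers) before building the vector add should do:

  tree vtype = TREE_TYPE (ref_vec);
  optab op = optab_for_tree_code (PLUS_EXPR, vtype, optab_default);
  if (!VECTOR_MODE_P (TYPE_MODE (vtype))
      || op == unknown_optab
      || optab_handler (op, TYPE_MODE (vtype)) == CODE_FOR_nothing)
    return false;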
This is for GCC 10. Also, it doesn't fix PR88497, does it?
Richard.
> Thanks in advance!
>
>
> gcc/ChangeLog
>
> 2019-03-08 Kewen Lin <linkw@gcc.gnu.org>
>
> PR target/88497
> * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of
> GIMPLE_BINARY_RHS check and gimple_visited_p check, call new
> function undistribute_bitref_for_vector.
> (undistribute_bitref_for_vector): New function.
> (cleanup_vinfo_map): Likewise.
> (unsigned_cmp): Likewise.
>
> gcc/testsuite/ChangeLog
>
> 2019-03-08 Kewen Lin <linkw@gcc.gnu.org>
>
> * gcc.dg/tree-ssa/pr88497.c: New test.
>
> ---
> gcc/testsuite/gcc.dg/tree-ssa/pr88497.c | 18 +++
> gcc/tree-ssa-reassoc.c | 274 +++++++++++++++++++++++++++++++-
> 2 files changed, 287 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
> new file mode 100644
> index 0000000..4d9ac67
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> + v2df temp;
> + temp = arg1[0] * arg2[0];
> + accumulator += temp[0] + temp[1];
> + temp = arg1[1] * arg2[1];
> + accumulator += temp[0] + temp[1];
> + temp = arg1[2] * arg2[2];
> + accumulator += temp[0] + temp[1];
> + temp = arg1[3] * arg2[3];
> + accumulator += temp[0] + temp[1];
> + return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index e1c4dfe..fc0e297 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -1772,6 +1772,263 @@ undistribute_ops_list (enum tree_code opcode,
> return changed;
> }
>
> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> +   - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
> +   - ops_indexes: the index into vec ops* for each relevant BIT_FIELD_REF. */
> +struct v_info
> +{
> + auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
> + auto_vec<unsigned, 32> ops_indexes;
> +};
> +
> +typedef struct v_info *v_info_ptr;
> +
> +/* qsort comparator for BIT_FIELD_REF offsets (unsigned HOST_WIDE_INT). */
> +static int
> +unsigned_cmp (const void *p_i, const void *p_j)
> +{
> +  unsigned HOST_WIDE_INT i = *(const unsigned HOST_WIDE_INT *) p_i;
> +  unsigned HOST_WIDE_INT j = *(const unsigned HOST_WIDE_INT *) p_j;
> +  /* Return 0 on equality so the comparator defines a total order. */
> +  return i < j ? -1 : (i > j ? 1 : 0);
> +}
> +
> +/* Cleanup hash map for VECTOR information. */
> +static void
> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
> +{
> + for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
> + it != info_map.end (); ++it)
> + {
> + v_info_ptr info = (*it).second;
> + delete info;
> + (*it).second = NULL;
> + }
> +}
> +
> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
> + V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
> + is transformed to
> + Vs = (V1 + V2 + ... + Vn)
> + Vs[0] + Vs[1] + ... + Vs[k]
> +
> + The basic steps are listed below:
> +
> + 1) Check the addition chain *OPS for summands that come from a
> + BIT_FIELD_REF on a VECTOR type. Record the information into
> + v_info_map for each satisfying summand, keyed by the VECTOR SSA_NAME.
> +
> + 2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs
> + are contiguous and cover the whole VECTOR perfectly without any holes.
> + Obtain one VECTOR list containing the candidates to be transformed.
> +
> + 3) Build the addition statements for all VECTOR candidates and generate
> + BIT_FIELD_REFs accordingly.
> +
> + TODO: The implementation currently requires all candidate VECTORs to
> + have the same VECTOR type; it can be extended to group candidates by
> + VECTOR type in the future if profitable cases are found. */
> +static bool
> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
> + struct loop *loop)
> +{
> + if (ops->length () <= 1 || opcode != PLUS_EXPR)
> + return false;
> +
> + hash_map<tree, v_info_ptr> v_info_map;
> + operand_entry *oe1;
> + unsigned i;
> +
> + /* Find summands defined by a BIT_FIELD_REF on a VECTOR in the addition
> + chain and record their information into the map. */
> + FOR_EACH_VEC_ELT (*ops, i, oe1)
> + {
> + enum tree_code dcode;
> + gimple *oe1def;
> +
> + if (TREE_CODE (oe1->op) != SSA_NAME)
> + continue;
> + oe1def = SSA_NAME_DEF_STMT (oe1->op);
> + if (!is_gimple_assign (oe1def))
> + continue;
> + dcode = gimple_assign_rhs_code (oe1def);
> + if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
> + continue;
> +
> + tree rhs = gimple_op (oe1def, 1);
> + tree op0 = TREE_OPERAND (rhs, 0);
> +
> + if (TREE_CODE (op0) != SSA_NAME
> + || TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE)
> + continue;
> +
> + tree op1 = TREE_OPERAND (rhs, 1);
> + tree op2 = TREE_OPERAND (rhs, 2);
> +
> + tree elem_type = TREE_TYPE (TREE_TYPE (op0));
> + unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> + if (size != TREE_INT_CST_LOW (op1))
> + continue;
> +
> + v_info_ptr *info_ptr = v_info_map.get (op0);
> + if (info_ptr)
> + {
> + v_info_ptr info = *info_ptr;
> + info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> + info->ops_indexes.safe_push (i);
> + }
> + else
> + {
> + v_info_ptr info = new v_info;
> + info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> + info->ops_indexes.safe_push (i);
> + v_info_map.put (op0, info);
> + }
> + }
> +
> + /* We need at least two VECTORs to combine. */
> + if (v_info_map.elements () <= 1)
> + {
> + cleanup_vinfo_map (v_info_map);
> + return false;
> + }
> +
> + /* Use the first VECTOR and its information as the reference.
> + First, validate it, that is:
> + 1) the sorted offsets are adjacent, with no holes;
> + 2) they fill the whole VECTOR perfectly. */
> + hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
> + tree ref_vec = (*it).first;
> + v_info_ptr ref_info = (*it).second;
> + ref_info->offsets.qsort (unsigned_cmp);
> + tree elem_type = TREE_TYPE (TREE_TYPE (ref_vec));
> + unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> + unsigned HOST_WIDE_INT curr;
> + unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
> +
> + FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
> + {
> + /* Contiguity check. */
> + if (curr != (prev + elem_size))
> + {
> + cleanup_vinfo_map (v_info_map);
> + return false;
> + }
> + prev = curr;
> + }
> +
> + /* Check that the refs fill the whole VECTOR. */
> + if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
> + {
> + cleanup_vinfo_map (v_info_map);
> + return false;
> + }
> +
> + auto_vec<tree> vectors (v_info_map.elements ());
> + vectors.quick_push (ref_vec);
> +
> + /* Use the ref_vec to filter others. */
> + for (++it; it != v_info_map.end (); ++it)
> + {
> + tree vec = (*it).first;
> + v_info_ptr info = (*it).second;
> + if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
> + continue;
> + if (ref_info->offsets.length () != info->offsets.length ())
> + continue;
> + bool same_offset = true;
> + info->offsets.qsort (unsigned_cmp);
> + for (unsigned i = 0; i < ref_info->offsets.length (); i++)
> + {
> + if (ref_info->offsets[i] != info->offsets[i])
> + {
> + same_offset = false;
> + break;
> + }
> + }
> + if (!same_offset)
> + continue;
> + vectors.quick_push (vec);
> + }
> +
> + if (vectors.length () < 2)
> + {
> + cleanup_vinfo_map (v_info_map);
> + return false;
> + }
> +
> + tree tr;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "The bit_field_ref vector list for summation: ");
> + FOR_EACH_VEC_ELT (vectors, i, tr)
> + {
> + print_generic_expr (dump_file, tr);
> + fprintf (dump_file, " ");
> + }
> + fprintf (dump_file, "\n");
> + }
> +
> + /* Build the sum for all candidate VECTORs. */
> + unsigned idx;
> + gimple *sum = NULL;
> + v_info_ptr info;
> + tree sum_vec = ref_vec;
> + FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
> + {
> + sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
> + info = *(v_info_map.get (tr));
> + unsigned j;
> + FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
> + {
> + gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> + gimple_set_visited (def, true);
> + (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
> + (*ops)[idx]->rank = 0;
> + }
> + sum_vec = gimple_get_lhs (sum);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Generating addition -> ");
> + print_gimple_stmt (dump_file, sum, 0);
> + }
> + }
> +
> + /* Referring to the offsets of any well-formed vector (here ref_vec),
> + generate the BIT_FIELD_REF statements accordingly. */
> + info = *(v_info_map.get (ref_vec));
> + gcc_assert (sum);
> + FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
> + {
> + tree dst = make_ssa_name (elem_type);
> + gimple *gs
> + = gimple_build_assign (dst, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, elem_type, sum_vec,
> + TYPE_SIZE (elem_type),
> + bitsize_int (info->offsets[i])));
> + insert_stmt_after (gs, sum);
> + update_stmt (gs);
> + gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> + gimple_set_visited (def, true);
> + (*ops)[idx]->op = gimple_assign_lhs (gs);
> + (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Generating bit_field_ref -> ");
> + print_gimple_stmt (dump_file, gs, 0);
> + }
> + }
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "undistributiong bit_field_ref for vector done.\n");
> + }
> +
> + cleanup_vinfo_map (v_info_map);
> +
> + return true;
> +}
> +
> /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
> expression, examine the other OPS to see if any of them are comparisons
> of the same values, which we may be able to combine or eliminate.
> @@ -5880,11 +6137,6 @@ reassociate_bb (basic_block bb)
> tree lhs, rhs1, rhs2;
> enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>
> - /* If this is not a gimple binary expression, there is
> - nothing for us to do with it. */
> - if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> - continue;
> -
> /* If this was part of an already processed statement,
> we don't need to touch it again. */
> if (gimple_visited_p (stmt))
> @@ -5911,6 +6163,11 @@ reassociate_bb (basic_block bb)
> continue;
> }
>
> + /* If this is not a gimple binary expression, there is
> + nothing for us to do with it. */
> + if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> + continue;
> +
> lhs = gimple_assign_lhs (stmt);
> rhs1 = gimple_assign_rhs1 (stmt);
> rhs2 = gimple_assign_rhs2 (stmt);
> @@ -5950,6 +6207,13 @@ reassociate_bb (basic_block bb)
> optimize_ops_list (rhs_code, &ops);
> }
>
> + if (undistribute_bitref_for_vector (rhs_code, &ops,
> + loop_containing_stmt (stmt)))
> + {
> + ops.qsort (sort_by_operand_rank);
> + optimize_ops_list (rhs_code, &ops);
> + }
> +
> if (rhs_code == PLUS_EXPR
> && transform_add_to_multiply (&ops))
> ops.qsort (sort_by_operand_rank);
> --
> 2.7.4
>