This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][gimple-interchange] Add reduction validity check
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 4 Dec 2017 15:13:33 +0000
- Subject: Re: [PATCH][gimple-interchange] Add reduction validity check
- Authentication-results: sourceware.org; auth=none
- References: <alpine.LSU.2.20.1712041404260.12252@zhemvz.fhfr.qr>
On Mon, Dec 4, 2017 at 1:11 PM, Richard Biener <rguenther@suse.de> wrote:
>
> I've noticed we perform FP reduction association without the required
> checks for associative math. I've added
> gcc.dg/tree-ssa/loop-interchange-1b.c to cover this.
>
> I also noticed we happily interchange a loop with a reduction like
>
> sum = a[i] - sum;
>
> where a change in order of elements isn't ok. Unfortunately bwaves
> is exactly a case where single_use != next_def (tried to simply remove
> that case for now), because reassoc didn't have a chance to fix the
> operand order. Thus this patch exports the relevant handling from
> the vectorizer (for stage1 having a separate infrastructure gathering /
> analyzing of reduction/induction infrastructure would be nice...)
> and uses it from interchange. We then don't handle
> gcc.dg/tree-ssa/loop-interchange-4.c anymore (similar vectorizer
> missed-opt is PR65930). I didn't bother to split up the vectorizer
> code further to implement relaxed validity checking but simply XFAILed
> this testcase.
>
> Earlier I simplified allocation stuff in the main loop which is why
> this part is included in this patch.
>
> Bootstrap running on x86_64-unknown-linux-gnu.
>
> I'll see to craft a testcase with the sum = a[i] - sum; mis-handling.
>
> Ok?
>
> Thanks,
> Richard.
>
> 2017-12-04 Richard Biener <rguenther@suse.de>
>
> * tree-vectorizer.h (check_reduction_path): Declare.
> * tree-vect-loop.c (check_reduction_path): New function, split out
> from ...
> (vect_is_simple_reduction): ... here.
> * gimple-loop-interchange.cc: Include tree-vectorizer.h.
> (loop_cand::analyze_iloop_reduction_var): Use single_imm_use.
> Properly check for a supported reduction operation and a
> valid expression if the reduction covers multiple stmts.
> (prepare_perfect_loop_nest): Simpify allocation.
> (pass_linterchange::execute): Likewise.
>
> * gcc.dg/tree-ssa/loop-interchange-1.c: Add fast-math flags.
> * gcc.dg/tree-ssa/loop-interchange-1b.c: New test variant.
> * gcc.dg/tree-ssa/loop-interchange-4.c: XFAIL.
>
>
> Index: gcc/gimple-loop-interchange.cc
> ===================================================================
> --- gcc/gimple-loop-interchange.cc (revision 255375)
> +++ gcc/gimple-loop-interchange.cc (working copy)
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.
> #include "tree-ssa-loop-ivopts.h"
> #include "tree-ssa-dce.h"
> #include "tree-data-ref.h"
> +#include "tree-vectorizer.h"
>
> /* This pass performs loop interchange: for example, the loop nest
>
> @@ -551,23 +552,29 @@ loop_cand::analyze_iloop_reduction_var (
> in a way that reduction operation is seen as black box. In general,
> we can ignore reassociation of reduction operator; we can handle fake
> reductions in which VAR is not even used to compute NEXT. */
> - FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
> - {
> - stmt = USE_STMT (use_p);
> - if (is_gimple_debug (stmt))
> - continue;
> -
> - if (!flow_bb_inside_loop_p (m_loop, gimple_bb (stmt)))
> - return false;
> -
> - if (single_use != NULL)
> - return false;
> + if (! single_imm_use (var, &use_p, &single_use)
> + || ! flow_bb_inside_loop_p (m_loop, gimple_bb (single_use)))
> + return false;
>
> - single_use = stmt;
> - }
> + /* Check the reduction operation. We require a commutative or
> + left-associative operation. For FP math we also need to be allowed
> + to associate operations. */
> + if (! is_gimple_assign (single_use)
> + || ! (commutative_tree_code (gimple_assign_rhs_code (single_use))
> + || (commutative_ternary_tree_code
> + (gimple_assign_rhs_code (single_use))
> + && (use_p->use == gimple_assign_rhs1_ptr (single_use)
> + || use_p->use == gimple_assign_rhs2_ptr (single_use)))
> + || (gimple_assign_rhs_code (single_use) == MINUS_EXPR
> + && use_p->use == gimple_assign_rhs1_ptr (single_use)))
> + || (FLOAT_TYPE_P (TREE_TYPE (var))
> + && ! flag_associative_math))
> + return false;
>
> + /* Handle and verify a series of stmts feeding the reduction op. */
> if (single_use != next_def
> - && !stmt_dominates_stmt_p (single_use, next_def))
> + && !check_reduction_path (UNKNOWN_LOCATION, m_loop, phi, next,
> + gimple_assign_rhs_code (single_use)))
> return false;
>
> /* Only support cases in which INIT is used in inner loop. */
> @@ -1964,7 +1971,7 @@ prepare_perfect_loop_nest (struct loop *
> vec<data_reference_p> *datarefs, vec<ddr_p> *ddrs)
> {
> struct loop *start_loop = NULL, *innermost = loop;
> - struct loop *outermost = superloop_at_depth (loop, 0);
> + struct loop *outermost = loops_for_fn (cfun)->tree_root;
>
> /* Find loop nest from the innermost loop. The outermost is the innermost
> outer*/
> @@ -1985,21 +1992,26 @@ prepare_perfect_loop_nest (struct loop *
> if (!start_loop || !start_loop->inner)
> return false;
>
> + /* Prepare the data reference vector for the loop nest, pruning outer
> + loops we cannot handle. */
> datarefs->create (20);
> - if ((start_loop = prepare_data_references (start_loop, datarefs)) == NULL
> + start_loop = prepare_data_references (start_loop, datarefs);
> + if (!start_loop
> /* Check if there is no data reference. */
> || datarefs->is_empty ()
> /* Check if there are too many of data references. */
> - || (int) datarefs->length () > MAX_DATAREFS
> - /* Compute access strides for all data references. */
> - || ((start_loop = compute_access_strides (start_loop, innermost,
> - *datarefs)) == NULL)
> - /* Check if loop nest should be interchanged. */
> - || !should_interchange_loop_nest (start_loop, innermost, *datarefs))
> - {
> - free_data_refs_with_aux (*datarefs);
> - return false;
> - }
> + || (int) datarefs->length () > MAX_DATAREFS)
> + return false;
> +
> + /* Compute access strides for all data references, pruning outer
> + loops we cannot analyze refs in. */
> + start_loop = compute_access_strides (start_loop, innermost, *datarefs);
> + if (!start_loop)
> + return false;
> +
> + /* Check if any interchange is profitable in the loop nest. */
> + if (!should_interchange_loop_nest (start_loop, innermost, *datarefs))
> + return false;
>
> /* Check if data dependences can be computed for loop nest starting from
> start_loop. */
> @@ -2029,14 +2041,15 @@ prepare_perfect_loop_nest (struct loop *
> return true;
> }
>
> + /* ??? We should be able to adjust DDRs we already computed by
> + truncating distance vectors. */
Note it could be hard enough to adjust DDRs for stripped outer loop.
Dependence can become non-dep, even worse, we may result in wrong
"invalid dep" in case of:
DDR<a,b>:
(<, <, =)
(<, >, =)
because after simply stripping, it becomes:
DDR<a,b>:
(<, =)
(>, =)
While the correct result is "no-dep".
Thanks,
bin