This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PR81165] discount killed stmts when sizing blocks for threading
- From: Jeff Law <law at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>, gcc-patches at gcc dot gnu dot org
- Cc: richard dot guenther at gmail dot com
- Date: Mon, 11 Dec 2017 13:00:21 -0700
- Subject: Re: [PR81165] discount killed stmts when sizing blocks for threading
- Authentication-results: sourceware.org; auth=none
- References: <ortvx2oiat.fsf@lxoliva.fsfla.org>
On 12/07/2017 05:04 AM, Alexandre Oliva wrote:
> We limit the amount of copying for jump threading based on counting
> stmts. This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed. Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
>
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
>
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install?
>
>
> for gcc/ChangeLog
>
> * tree-ssa-threadedge.c (uses_in_bb): New.
> (estimate_threading_killed_stmts): New.
> (estimate_threading_killed_stmts): New overload.
> (record_temporary_equivalences_from_stmts_at_dest): Add path
> parameter; adjust caller. Expand limit when it's hit.
>
> for gcc/testsuite/ChangeLog
>
> * gcc.dg/pr81165.c: New.
> ---
> gcc/testsuite/gcc.dg/pr81165.c | 59 ++++++++++++
> gcc/tree-ssa-threadedge.c | 189 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 245 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/pr81165.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr81165.c b/gcc/testsuite/gcc.dg/pr81165.c
> new file mode 100644
> index 000000000000..8508d893bed6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr81165.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not " \[/%\] " "optimized" } } */
> +
> +/* Testcase submitted for PR81165, with its main function removed as
> + it's turned into a compile test. We want to make sure that all of
> + the divide/remainder computations are removed by tree optimizers.
> +
> + We can figure out that we don't need to compute at runtime even the
> + condition to enter the loop: the initial i==0 would have to be
> + greater than the sum of two small unsigned values: 1U>>t1 is in the
> + range 0..1, whereas the char value is bounded by the range 0..127,
> + being 128 % a positive number (zero would invoke undefined
> + behavior, so we can assume it doesn't happen). (We know it's
> + nonnegative because it's 10 times a number that has no more than
> + the bits for 16, 8 and 1 set.)
> +
> + We don't realize that the loop is useless right away: jump
> + threading helps remove some of the complexity, particularly of the
> + computation within the loop: t1 is compared with 1, but it can
> + never be 1. (We could assume as much, since its being 1 would
> + divide by zero, but we don't.)
> +
> + If we don't enter the conditional block, t1 remains at 2; if we do,
> + it's set to either -1 or 0. If we jump thread at the end of the
> + conditional block, we can figure out the ranges exclude 1 and the
> + jump body is completely optimized out. However, we used to fail to
> + consider the block for jump threading due to the amount of
> + computation in it, without realizing most of it would die in
> + consequence of the threading.
> +
> + We now take the dying code into account when deciding whether or
> + not to try jump threading. That might enable us to optimize the
> + function into { if (x2 != 0 || (x1 & 1) == 0) abort (); }. At the
> + time of this writing, with the patch, we get close, but the test on
> + x2 only gets as far as ((1 >> x2) == 0). Without the patch, some
> + of the loop remains. */
> +
> +short x0 = 15;
> +
> +void func (){
> + volatile int x1 = 1U;
> + volatile char x2 = 0;
> + char t0 = 0;
> + unsigned long t1 = 2LU;
> + int i = 0;
> +
> + if(1>>x2) {
> + t0 = -1;
> + t1 = (1&(short)(x1^8U))-1;
> + }
> +
> + while(i > (int)((1U>>t1)+(char)(128%(10*(25LU&(29%x0)))))) {
> + i += (int)(12L/(1!=(int)t1));
> + }
> +
> + if (t0 != -1) __builtin_abort();
> + if (t1 != 0L) __builtin_abort();
> +}
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 536c4717b725..25ccac2a3ecc 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> +
> +/* Starting from the final control flow stmt in BB, assuming it will
> + be removed, follow uses in to-be-removed stmts back to their defs
> + and count how many defs are to become dead and be removed as
> + well. */
> +
> +static int
> +estimate_threading_killed_stmts (basic_block bb)
> +{
> + int killed_stmts = 0;
> + hash_map<tree, int> ssa_remaining_uses;
> + auto_vec<gimple *, 4> dead_worklist;
> +
> + /* If the block has only two predecessors, threading will turn phi
> + dsts into either src, so count them as dead stmts. */
> + bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
> +
> + if (drop_all_phis)
> + for (gphi_iterator gsi = gsi_start_phis (bb);
> + !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gphi *phi = gsi.phi ();
> + tree dst = gimple_phi_result (phi);
> +
> + /* We don't count virtual PHIs as stmts in
> + record_temporary_equivalences_from_phis. */
> + if (virtual_operand_p (dst))
> + continue;
> +
> + killed_stmts++;
> + }
> +
> + if (gsi_end_p (gsi_last_bb (bb)))
> + return killed_stmts;
> +
> + gimple *stmt = gsi_stmt (gsi_last_bb (bb));
> + if (gimple_code (stmt) != GIMPLE_COND
> + && gimple_code (stmt) != GIMPLE_GOTO
> + && gimple_code (stmt) != GIMPLE_SWITCH)
> + return killed_stmts;
> +
> + dead_worklist.quick_push (stmt);
> + while (!dead_worklist.is_empty ())
> + {
> + stmt = dead_worklist.pop ();
> +
> + ssa_op_iter iter;
> + use_operand_p use_p;
> + FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
> + {
> + tree t = USE_FROM_PTR (use_p);
> + if (SSA_NAME_DEF_STMT (t)
> + && (gimple_code (SSA_NAME_DEF_STMT (t)) == GIMPLE_ASSIGN
> + || (gimple_code (SSA_NAME_DEF_STMT (t)) == GIMPLE_PHI
> + && !virtual_operand_p (t)
> + && !drop_all_phis))
> + && gimple_bb (SSA_NAME_DEF_STMT (t)) == bb)
> + {
> + int *usesp = ssa_remaining_uses.get (t);
> + int uses;
> +
> + if (usesp)
> + uses = *usesp;
> + else
> + uses = uses_in_bb (t, bb);
> +
> + gcc_assert (uses);
> +
> + /* Don't bother recording the expected use count if we
> + won't find any further uses within BB. */
> + if (!usesp && (uses < -1 || uses > 1))
> + {
> + usesp = &ssa_remaining_uses.get_or_insert (t);
> + *usesp = uses;
> + }
> +
> + if (uses < 0)
> + continue;
> +
> + --uses;
> + if (usesp)
> + *usesp = uses;
> +
> + if (!uses)
> + {
> + killed_stmts++;
> + if (usesp)
> + ssa_remaining_uses.remove (t);
> + if (gimple_code (SSA_NAME_DEF_STMT (t)) != GIMPLE_PHI)
> + dead_worklist.safe_push (SSA_NAME_DEF_STMT (t));
> + }
> + }
> + }
> + }
> +
> + if (dump_file)
> + fprintf (dump_file, "threading bb %i kills %i stmts\n",
> + bb->index, killed_stmts);
> +
> + return killed_stmts;
> +}
> +
> +/* Estimate killed statements over all blocks in the PATH, or in E.
> + If PATH is not empty, E must be its last entry. */
> +
> +static int
> +estimate_threading_killed_stmts (edge e, vec<jump_thread_edge *> *path)
> +{
> + gcc_assert (path->length () == 0 || path->last ()->e == e);
> + if (path->length () == 0)
> + return estimate_threading_killed_stmts (e->dest);
> +
> + int total = 0;
> + int i = 0;
> + jump_thread_edge *je;
> + FOR_EACH_VEC_ELT (*path, i, je)
> + switch (je->type)
> + {
> + case EDGE_START_JUMP_THREAD:
> + case EDGE_COPY_SRC_BLOCK:
> + case EDGE_COPY_SRC_JOINER_BLOCK:
> + total += estimate_threading_killed_stmts (je->e->dest);
> + break;
> +
> + default:
> + gcc_unreachable ();
> +
> + case EDGE_FSM_THREAD:
> + case EDGE_NO_COPY_SRC_BLOCK:
> + break;
> + }
> +
> + return total;
> +}
So I'd mark je->e->src rather than je->e->dest. And you only need this
handling for EDGE_COPY_SRC_BLOCK. So I don't think you need a switch,
just a simple je->type == EDGE_COPY_SRC_BLOCK.
While we do make a copy for EDGE_COPY_SRC_JOINER_BLOCK, the conditional
at the end of that block is not dead, so we can't really do anything
special with it.
I think with that fix this should be OK.
jeff