This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFA][PATCH] Optimize PHIs with constant arguments better
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 2 May 2018 13:23:42 +0200
- Subject: Re: [RFA][PATCH] Optimize PHIs with constant arguments better
- References: <f1e10f4b-dd84-a39b-8561-856a9a14ffd7@redhat.com> <a69be5e1-73d5-fa23-82af-f943bbaea16f@redhat.com>
On Wed, May 2, 2018 at 12:44 AM, Jeff Law <law@redhat.com> wrote:
>
> Time to come back to this now that the trunk is open again...
>
>
> -------- Forwarded Message --------
> Subject: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
> Date: Thu, 30 Nov 2017 16:24:33 -0700
> From: Jeff Law <law@redhat.com>
> To: gcc-patches <gcc-patches@gcc.gnu.org>
>
>
>
> This addresses some code quality regressions with some work scheduled
> for gcc-9. However, if anyone is aware of BZs that could be helped with
> this patch, don't hesitate to pass them along. Depending on their
> severity we could always re-evaluate the plan for this patch.
>
> The patch optimizes sequences like this:
>
>
> # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> _30 = (unsigned char) iftmp.4_16;
>
> ie, we have a PHI where all its arguments are known (differing)
> constants. The PHI feeds an operation which can be compile-time
> evaluated for all the PHI's arguments.
>
> This arises most often where the result of the PHI is type converted via
> NOP_EXPR. But it also happens for a boolean comparisons, arithmetic
> and logicals and other type conversions.
>
> We can optimize away the statement by creating a new PHI node holding
> the result of the statement applied to each of the original PHI's
> constant operands. So for the fragment above we'd generate:
>
> # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> # _30 = PHI <0(7), 1(8), 0(9)>
>
> These kind of backward propagations can cascade. Here's an example I
> saw during analysis of test results:
>
> # m_5 = PHI <10(5), 12(6), 14(7)>
> <L13>:
> _2 = (long unsigned int) m_5;
> _3 = _2 * 32;
> goto <bb 12>; [100.00%]
>
> After this patch the two statements have been eliminated in favor of
> generating PHIs with constant arguments:
>
> And after PHI propagation we have:
> # m_5 = PHI <10(5), 12(6), 14(7)>
> # _2 = PHI <10(5), 12(6), 14(7)>
> # _3 = PHI <320(5), 384(6), 448(7)>
> <L13>:
> goto <bb 12>; [100.00%]
>
> DCE will come along and wipe out m_5 and _2 if they are unused.
>
> I initially covered just NOP_EXPR in the proof of concept. But it's
> trivial to extend to cover other unaries, and binary/comparisons where
> one operand was already a constant, so I did that. This applies
> surprisingly often during a bootstrap. An earlier version (which didn't
> handle multiple uses of the result of the PHI) triggered over 6000 times
> during a bootstrap:
>
> NOP_EXPR 5161
> LT_EXPR 658
> GE_EXPR 504
> NE_EXPR 310
> BIT_NOT_EXPR 295
> BIT_AND_EXPR 94
> PLUS_EXPR 77
> NEGATE_EXPR 48
> LSHIFT_EXPR 40
> EQ_EXPR 34
> GT_EXPR 16
> BIT_IOR_EXPR 10
> MINUS_EXPR 8
>
> There's actually other cases we could handle were
>
> x = PHI (a, 0);
> y = a & x;
>
> Turns into
>
> x = PHI (a, 0);
> y = PHI (a, 0);
>
>
> I'm not sure how often these non-constant cases happen -- I haven't
> tried to support this or gain any data on how often this kind of case
> might happen.
>
> FWIW, there are cases were the PHI arguments are constant, but we still
> can't simplify. For example:
>
>
> x = PHI (0, 1, 2)
>
> y = 1234 / x;
>
>
> When we process this we'll try to simplify 1234 / 0 to a constant which
> fails. We have to gracefully remove the partially initialized PHI node
> in that case. This is tested by existing tests in the suite.
>
> You'll also see that certain tests in the suite such as pr61839_3,
> ssa-pre-4.c are twiddled. I've suppressed phi backwards propagation in
> the original test so that it's not compromised. THere's also a variant
> of each test which verifies that the transformation is handled by phi
> back propagation.
>
> Bootstrapped and regression tested on x86, both in isolation and in a
> larger patch series.
>
> I see there's a couple whitespace issues in the patch. I'll fix those.
> I'd like to get comments/review on the meat of the patch though.
As a general comment I'd note what this patch does is sth that
PRE does as well (and quite aggressively so). So I'd say if we
kind-of duplicate this transform we should tame it down (see comments
below).
Given it duplicates functionality I guess that you placed it inside
phiprop was mostly due to pass ordering as you mention jump
threading can happen earlier? phiopt would have been another
natural choice.
>
>
> Jeff
>
>
>
> * tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
> propagate_with_phi.
> (stmt_likely_produces_constant_result): New function.
> (fold_use_into_phi, propagate_with_phi_2): Likewise.
> (pass_phiprop::execute): Corresponding changes. Call
> propagate_with_phi_2.
>
> * gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
> * gcc.dg/tree-ssa/pr61839_1.c: Likewise.
> * gcc.dg/tree-ssa/pr61839_3.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.
>
> * gcc.dg/tree-ssa/pr61743-1a.c: New test.
> * gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
> * gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-4a.c: New test.
>
> diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
> index 494158b..f2ec2b3 100644
> --- a/gcc/tree-ssa-phiprop.c
> +++ b/gcc/tree-ssa-phiprop.c
> @@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
> with aliasing issues as we are moving memory reads. */
>
> static bool
> -propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> - size_t n)
> +propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> + size_t n)
> {
> tree ptr = PHI_RESULT (phi);
> gimple *use_stmt;
> @@ -440,6 +440,207 @@ next:;
> return phi_inserted;
> }
>
> +/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
> + a constant result if PHI_RESULT is itself a constant. Else return
> + FALSE.
> +
> + It is safe for this routine to always return TRUE. If the result
> + is not a constant it will be detected and properly handled later.
> +
> + This is overly conservative. If USE_STMT were to produce an SSA_NAME,
> + then that would still work for our purposes. */
> +
> +static bool
> +stmt_likely_produces_constant_result (gimple *use_stmt, tree phi_result)
> +{
> + enum tree_code code = gimple_assign_rhs_code (use_stmt);
> + switch (TREE_CODE_CLASS (code))
> + {
> + /* With few exceptions, a unary operator applied to a constant
> + will result in a constant. So they are OK without digging
> + deeper into the operator/operands. */
> + case tcc_unary:
> + return true;
> +
> + /* For binary and comparisons we'll generally be able to generate
> + a constant if only one operand is an SSA_NAME. */
> + case tcc_binary:
> + case tcc_comparison:
> + {
> + tree rhs1 = gimple_assign_rhs1 (use_stmt);
> + tree rhs2 = gimple_assign_rhs2 (use_stmt);
> +
> + /* We need to check for RES op CONST and CONST op RES. Consider
> + MINUS_EXPR or MIN/MAX where RES could be the first or the second
> + argument. */
> + if (rhs1 == phi_result
> + && TREE_CODE_CLASS (TREE_CODE (rhs2)) == tcc_constant)
> + return true;
> +
> + if (rhs2 == phi_result
> + && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_constant)
> + return true;
> +
> + /* We do not try to handle two SSA_NAME operands. */
> + return false;
> + }
> +
> + /* There might be some tcc_reference or tcc_exceptional types we
> + could handle with deeper investigation. COND_EXPR comes to mind. */
> + default:
> + return false;
> + }
> +}
> +
> +/* PHI in block BB produces RESULT. PHI_RESULT is used in USE_STMT.
> +
> + All of PHIs arguments are simple constants. See if propagating
> + the PHI arguments into USE_STMT produces a constant. If that is
> + the case for all the PHI arguments, then STMT is replaced with a
> + new PHI and we return TRUE. Else make no changes to the IL and
> + return FALSE.
> +
> + When applied this transformation reduces runtime computations and
> + replaces them with either constant initializations. This also enables
> + jump threading to occur earlier in the optimization pipeline. */
> +
> +static bool
> +fold_use_into_phi (gphi *phi, gimple *use_stmt,
> + basic_block bb, tree phi_result)
> +{
> + /* Now verify the use is an assigment to an SSA_NAME. */
> + if (!is_gimple_assign (use_stmt)
> + || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
> + return false;
> +
> + /* If USE_STMT does not likely produce a constant result, then
> + do not try to fold this use. Note this is overly conservative
> + as we could handle USE_STMT simplifying to an SSA_NAME rather than
> + just constants. */
> + if (!stmt_likely_produces_constant_result (use_stmt, phi_result))
> + return false;
> +
> + /* Build a new PHI node to replace the definition of the USE_STMT's LHS. */
> + gphi *new_phi = create_phi_node (NULL_TREE, bb);
> +
> + /* Now fill in its arguments by applying the operator to each
> + argument of the original PHI. */
So rather than creating the PHI immediately please do sth like
auto_vec<tree, 8> new_args;
new_args.reserve_exact (gimple_phi_num_args (phi));
FOR_EACH_EDGE (...)
push-to-new-args-vec-or bail out
alloc PHI and add args in second sweep over edges.
> + edge e;
> + edge_iterator ei;
> + tree use_result = gimple_assign_lhs (use_stmt);
> + tree type = TREE_TYPE (use_result);
> + enum tree_code code = gimple_assign_rhs_code (use_stmt);
> + FOR_EACH_EDGE (e, ei, bb->preds)
> + {
> + tree old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> + tree new_arg;
> +
> + switch (TREE_CODE_CLASS (code))
> + {
> + case tcc_unary:
> + new_arg = fold_unary_to_constant (code, type, old_arg);
> + break;
> +
> + case tcc_binary:
> + case tcc_comparison:
> + {
> + tree rhs1 = gimple_assign_rhs1 (use_stmt);
> + tree rhs2 = gimple_assign_rhs2 (use_stmt);
> + if (rhs1 == phi_result)
> + new_arg = fold_binary (code, type,
> + old_arg, gimple_assign_rhs2 (use_stmt));
> + else if (rhs2 == phi_result)
> + new_arg = fold_binary (code, type,
> + gimple_assign_rhs1 (use_stmt), old_arg);
> + else
> + gcc_unreachable ();
> + break;
> + }
So I wonder if we don't want to be more aggressive here and not care
about PHI arg constant-ness or other operator state of the operation.
You should be able to do sth like (with the single-use constraint!)
use_operand_p use_p;
gimple *use_stmt;
single_imm_use (gimple_phi_result (phi), &use_p, &use_stmt);
tree orig_arg = USE_FROM_PTR (use_p);
in the loop then do
*(use_p->use) = phi-arg; // do not use SET_USE to keep immediate
uses intact
tree res = gimple_fold_stmt_to_constant (USE_STMT (use_p), NULL);
if (is_gimple_min_invariant (res))
OK, record it;
else
{
*(use_p->use) = orig_arg;
return false;
}
I suspect the cases where we do have the first N edges to optimize to a constant
but a later one will fail the optimization isn't common. As a early
bail-out we probably
want to avoid this transform on loop headers (thus when backedges are involved).
Note that for the case where we have n > 1 edges with constant
propagation result
it might be profitable to create a forwarder for the constant or
non-constant parts
thus end up with
cst_result_1 = PHI< .... >;
goto merge;
orig_phi_2 = PHI< edges with non-cst-result... >;
noncst_result_3 = OP (orig_phi_2);
res_4 = PHI <cst_result_1, noncst_result_3>
merge:
where of course whether we may hoist OP () needs to be checked.
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* We did not get a constant. This can happen (imagine division
> + by zero). We have to remove the PHI from the IL, as the PHI
> + is only partially constructed. */
> + if (!new_arg)
> + {
> + /* Set the PHI_RESULT to a new, throw-away SSA_NAME. This avoids
> + problems unlinking the "uses" of a currently NULL PHI_RESULT. */
> + gimple_phi_set_result (new_phi, make_ssa_name (type));
> +
> + /* Actually remove the PHI from the IL. It is safe to remove the
> + PHI if some of its PHI arguments are still uninitialized. */
> + gimple_stmt_iterator gsi = gsi_for_stmt (new_phi);
> + remove_phi_node (&gsi, true);
> + return false;
> + }
> +
> + source_location locus = gimple_phi_arg_location_from_edge (phi, e);
> + add_phi_arg (new_phi, new_arg, e, locus);
> + }
> +
> + gimple_phi_set_result (new_phi, use_result);
> +
> + /* At this point we have our new PHI installed where we want it.
> + Delete the old PHI and USE_STMT. */
> + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> + gsi_remove (&gsi, true);
> + return true;
> +}
> +
> +/* Look for sequences like this:
> +
> + # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> + _30 = (unsigned char) iftmp.4_16;
> +
> + We can "hoist" the conversion into the PHI and get it for free, generating:
> +
> + # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> + # _30 = PHI <0(7), 1(8), 0(9)>
> +
> + DCE will eliminate the first PHI. In addition to getting the conversion
> + for free, removal of the conversion improves backwards threading. */
> +
> +static bool
> +propagate_with_phi_2 (basic_block bb, gphi *phi)
> +{
> + /* First verify that all the arguments of the PHI are constants.
> +
> + This is overly conservative. Consider:
> +
> + y = PHI (x, -1, 0);
> + z = y & x;
> +
> + We could transform that into:
> +
> + y = PHI (x, -1, 0);
> + z = PHI (x, x, 0);
> +
> + It's not clear how important those cases are and we punt them
> + for now. */
> + use_operand_p arg_p;
> + ssa_op_iter i;
> + FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
> + {
> + tree arg = USE_FROM_PTR (arg_p);
> + if (TREE_CODE_CLASS (TREE_CODE (arg)) != tcc_constant)
> + return false;
> + }
> +
> + /* Not iterate over each immediate use to see if the statement can
> + be implemented as a PHI rather than a real statement. */
> + gimple *use_stmt;
> + imm_use_iterator ui;
> + bool retval = false;
> + tree phi_result = PHI_RESULT (phi);
> + FOR_EACH_IMM_USE_STMT (use_stmt, ui, phi_result)
> + retval |= fold_use_into_phi (phi, use_stmt, bb, phi_result);
So you say DCE will remove the PHI but then you handle multiple
uses where some of them might fail to transform.
Thus should we restrict this to single-uses? That way we also avoid
to end up with too many PHIs and we can remove the original PHI
immediately.
> + return retval;
> +}
> +
> /* Main entry for phiprop pass. */
>
> namespace {
> @@ -492,7 +693,10 @@ pass_phiprop::execute (function *fun)
> single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
> FOR_EACH_VEC_ELT (bbs, i, bb)
> for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> + {
> + did_something |= propagate_with_phi_1 (bb, gsi.phi (), phivn, n);
> + did_something |= propagate_with_phi_2 (bb, gsi.phi ());
> + }
>
> if (did_something)
> gsi_commit_edge_inserts ();
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> index a5c83cf..7771044 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
> +/* { dg-options "-O3 -fno-tree-phiprop -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
>
> #define N 8
> #define M 14
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
> new file mode 100644
> index 0000000..dd64472
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fdump-tree-phiprop -fno-peel-loops" } */
> +
> +#include "pr61743-1.c"
> +
> +/* { dg-final { scan-tree-dump-times "PHI <(\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
> +
> +/* The code with PHI back propagation is simpler and we unswitch more. */
> +/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 10 "cunroll" } } */
> +/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> index 9f8168a..5bf1982 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> @@ -1,6 +1,6 @@
> /* PR tree-optimization/61839. */
> /* { dg-do run } */
> -/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-phiprop -fdump-tree-optimized" } */
> /* { dg-require-effective-target int32plus } */
>
> __attribute__ ((noinline))
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
> new file mode 100644
> index 0000000..99a9016
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/61839. */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-phiprop" } */
> +/* { dg-require-effective-target int32plus } */
> +
> +#include "pr61839_1.c"
> +
> +/* Scan for c = 972195717) >> [0, 1] in function foo. */
> +/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp1" } } */
> +/* Scan for c = 972195717) >> [2, 3] in function bar. */
> +/* { dg-final { scan-tree-dump-times "PHI <(?:243048929|121524464)\\(\[0-9\]+\\), (?:243048929|121524464)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> index 5ceb073..ad908c0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> @@ -1,6 +1,6 @@
> /* PR tree-optimization/61839. */
> /* { dg-do run } */
> -/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-vrp1 -fdump-tree-optimized" } */
>
> __attribute__ ((noinline))
> int foo (int a, unsigned b)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
> new file mode 100644
> index 0000000..3a4d800
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/61839. */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-phiprop -fdump-tree-optimized" } */
> +#include "pr61839_3.c"
> +
> +/* Scan for a PHI with (12, 13) << 8 in function foo. */
> +/* { dg-final { scan-tree-dump-times "PHI <\(3072|3328\)\\(\[0-9\]+\\), \(3072|3328\)\\(\[0-9\
> +]+\\)>" 1 "phiprop"} } */
> +/* { dg-final { scan-tree-dump-times "3072" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> index 5b4fd54..0bc5f73 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-pre-stats" } */
> +/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-pre-stats" } */
> int foo(void)
> {
> int x, c, y;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
> new file mode 100644
> index 0000000..70e8b16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-phiprop" } */
> +#include "ssa-pre-4.c"
> +/* We should eliminate the x+1 computation from this routine, replacing
> + it with a phi of 3, 4 */
> +/* { dg-final { scan-tree-dump-times "PHI <\[34\]\\(\[0-9\]+\\), \[34\]\\(\[0-9\]+\\)>" 1 "phiprop"} } */
>