This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Vinod Kathail <vinodk at xilinx dot com>, Shail Aditya Gupta <shailadi at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
- Date: Fri, 13 Nov 2015 11:13:25 +0100
- Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
- Authentication-results: sourceware.org; auth=none
- References: <37378DC5BCD0EE48BA4B082E0B55DFAA41F3F56C at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc1wXMQ_69b=Q9V=bz5KAiLAp8C9VugpfyOZqt9iiDMzXQ at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F41150 at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc0T+8CeeruKp-S2byhyBXW41ub7esqin+OuAYAy8q1WJA at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F42541 at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc23ZRxFv4S=XKQUX2VgT113GDZVVjOYViSoYy4yP11cdg at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295155D at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <55D4F921 dot 2020708 at redhat dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4297704C at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <5643A732 dot 4040707 at redhat dot com> <CAFiYyc07fxKy=pxo4t81M8WVOx9PLnzmMthkbBm52wub93dErQ at mail dot gmail dot com> <5644C6CC dot 90203 at redhat dot com> <5644DB59 dot 9040809 at redhat dot com> <56450B62 dot 4090404 at redhat dot com>
On Thu, Nov 12, 2015 at 10:57 PM, Jeff Law <law@redhat.com> wrote:
> On 11/12/2015 11:32 AM, Jeff Law wrote:
>>
>> On 11/12/2015 10:05 AM, Jeff Law wrote:
>>>>
>>>> But IIRC you mentioned it should enable vectorization or so? In this
>>>> case
>>>> that's obviously too late.
>>>
>>> The opposite. Path splitting interferes with if-conversion &
>>> vectorization. Path splitting mucks up the CFG enough that
>>> if-conversion won't fire and as a result vectorization is inhibited. It
>>> also creates multi-latch loops, which isn't a great situation either.
>>>
>>> It *may* be the case that dropping it that far down in the pipeline and
>>> making the modifications necessary to handle simple latches may in turn
>>> make the path splitting code play better with if-conversion and
>>> vectorization and avoid creation of multi-latch loops. At least that's
>>> how it looks on paper when I draw out the CFG manipulations.
>>>
>>> I'll do some experiments.
>>
>> It doesn't look too terrible to ravamp the recognition code to work
>> later in the pipeline with simple latches. Sadly that doesn't seem to
>> have fixed the bad interactions with if-conversion.
>>
>> *But* that does open up the possibility of moving the path splitting
>> pass even deeper in the pipeline -- in particular we can move it past
>> the vectorizer. Which is may be a win.
>>
>> So the big question is whether or not we'll still see enough benefits
>> from having it so late in the pipeline. It's still early enough that we
>> get DOM, VRP, reassoc, forwprop, phiopt, etc.
>>
>> Ajit, I'll pass along an updated patch after doing some more testing.
>
> So here's what I'm working with. It runs after the vectorizer now.
>
> Ajit, if you could benchmark this it would be greatly appreciated. I know
> you saw significant improvements on one or more benchmarks in the past.
> It'd be good to know that the updated placement of the pass doesn't
> invalidate the gains you saw.
>
> With the updated pass placement, we don't have to worry about switching the
> pass on/off based on whether or not the vectorizer & if-conversion are
> enabled. So that hackery is gone.
>
> I think I've beefed up the test to identify the diamond patterns we want so
> that it's stricter in what we accept. The call to ignore_bb_p is a part of
> that test so that we're actually looking at the right block in a world where
> we're doing this transformation with simple latches.
>
> I've also put a graphical comment before perform_path_splitting which
> hopefully shows the CFG transformation we're making a bit clearer.
>
> This bootstraps and regression tests cleanly on x86_64-linux-gnu.
>
>
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 34d2356..6613e83 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1474,6 +1474,7 @@ OBJS = \
> tree-ssa-loop.o \
> tree-ssa-math-opts.o \
> tree-ssa-operands.o \
> + tree-ssa-path-split.o \
gimple-ssa-path-split please.
> tree-ssa-phionlycprop.o \
> tree-ssa-phiopt.o \
> tree-ssa-phiprop.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 757ce85..3e946ca 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2403,6 +2403,10 @@ ftree-vrp
> Common Report Var(flag_tree_vrp) Init(0) Optimization
> Perform Value Range Propagation on trees.
>
> +ftree-path-split
fsplit-paths
> +Common Report Var(flag_tree_path_split) Init(0) Optimization
> +Perform Path Splitting on trees for loop backedges.
> +
> funit-at-a-time
> Common Report Var(flag_unit_at_a_time) Init(1)
> Compile whole compilation unit at a time.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 213a9d0..b1e95da 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}.
> -fdump-tree-fre@r{[}-@var{n}@r{]} @gol
> -fdump-tree-vtable-verify @gol
> -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol
> +-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol
> -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
> -fdump-final-insns=@var{file} @gol
> -fcompare-debug@r{[}=@var{opts}@r{]} -fcompare-debug-second @gol
> @@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}.
> -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta
> @gol
> -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
> -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
> --ftree-vectorize -ftree-vrp @gol
> +-ftree-vectorize -ftree-vrp @gol -ftree-path-split @gol
> -funit-at-a-time -funroll-all-loops -funroll-loops @gol
> -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops
> @gol
> -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
> @@ -7169,6 +7170,11 @@ output on to @file{stderr}. If two conflicting dump
> filenames are
> given for the same pass, then the latter option overrides the earlier
> one.
>
> +@item path-split
> +@opindex fdump-tree-path-split
> +Dump each function after path splitting. The file name is made by
> +appending @file{.path-split} to the source file name.
> +
> @item all
> Turn on all options, except @option{raw}, @option{slim}, @option{verbose}
> and @option{lineno}.
> @@ -7811,6 +7817,7 @@ also turns on the following optimization flags:
> -ftree-switch-conversion -ftree-tail-merge @gol
> -ftree-pre @gol
> -ftree-vrp @gol
> +-ftree-path-split @gol
> -fipa-ra}
>
> Please note the warning under @option{-fgcse} about
> @@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2}
> in the future.
>
> @item -ftree-sink
> @opindex ftree-sink
> -Perform forward store motion on trees. This flag is
> +Perform forward store motion on trees. This flag is
> enabled by default at @option{-O} and higher.
>
> @item -ftree-bit-ccp
> @@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher. Null
> pointer check
> elimination is only done if @option{-fdelete-null-pointer-checks} is
> enabled.
>
> +@item -ftree-path-split
> +@opindex ftree-path-split
> +Perform Path Splitting on trees. When the two execution paths of a
> +if-then-else merge at the loop latch node, try to duplicate the
> +merge node into two paths. This is enabled by default at @option{-O2}
> +and above.
> +
I think if we go into the detail of the transform we should mention the
effective result (creating a loop nest with disambiguation figuring out
which is the "better" inner loop).
> @item -fsplit-ivs-in-unroller
> @opindex fsplit-ivs-in-unroller
> Enables expression of values of induction variables in later iterations
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 9a3fbb3..9a0b27c 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -509,6 +509,7 @@ static const struct default_options
> default_options_table[] =
> { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1
> },
> { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
> { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> + { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
Is this transform a good idea for -Os?
>
> /* -O3 optimizations. */
> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c0ab6b9..4c9ef5f 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -274,6 +274,7 @@ along with GCC; see the file COPYING3. If not see
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_simduid_cleanup);
> NEXT_PASS (pass_lower_vector_ssa);
> + NEXT_PASS (pass_path_split);
> NEXT_PASS (pass_cse_reciprocals);
> NEXT_PASS (pass_reassoc);
> NEXT_PASS (pass_strength_reduction);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
> new file mode 100644
> index 0000000..c7e9515
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-path-split-details " } */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +#define RGBMAX 255
> +
> +int
> +test()
> +{
> + int i, Pels;
> + unsigned char sum = 0;
> + unsigned char xr, xg, xb;
> + unsigned char xc, xm, xy, xk;
> + unsigned char *ReadPtr, *EritePtr;
> +
> + ReadPtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100);
> + EritePtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100);
> +
> + for (i = 0; i < 100;i++)
> + {
> + ReadPtr[i] = 100 - i;
> + }
> +
> + for (i = 0; i < 100; i++)
> + {
> + xr = *ReadPtr++;
> + xg = *ReadPtr++;
> + xb = *ReadPtr++;
> +
> + xc = (unsigned char) (RGBMAX - xr);
> + xm = (unsigned char) (RGBMAX - xg);
> + xy = (unsigned char) (RGBMAX - xb);
> +
> + if (xc < xm)
> + {
> + xk = (unsigned char) (xc < xy ? xc : xy);
> + }
> + else
> + {
> + xk = (unsigned char) (xm < xy ? xm : xy);
> + }
> +
> + xc = (unsigned char) (xc - xk);
> + xm = (unsigned char) (xm - xk);
> + xy = (unsigned char) (xy - xk);
> +
> + *EritePtr++ = xc;
> + *EritePtr++ = xm;
> + *EritePtr++ = xy;
> + *EritePtr++ = xk;
> + sum += *EritePtr;
> + }
> + return sum;
> +}
> +
> +int
> +main()
> +{
> + if (test() != 33)
> + abort();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "Duplicating join block" "path-split" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index b429faf..3dba68b 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK , "link JIT code")
> DEFTIMEVAR (TV_LOAD , "load JIT result")
> DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX , "acquiring JIT mutex")
> DEFTIMEVAR (TV_JIT_CLIENT_CODE , "JIT client code")
> +DEFTIMEVAR (TV_TREE_PATH_SPLIT , "tree path split")
> diff --git a/gcc/tracer.c b/gcc/tracer.c
> index 941dc20..c7b5150 100644
> --- a/gcc/tracer.c
> +++ b/gcc/tracer.c
> @@ -51,9 +51,9 @@
> #include "tree-inline.h"
> #include "cfgloop.h"
> #include "fibonacci_heap.h"
> +#include "tracer.h"
>
> static int count_insns (basic_block);
> -static bool ignore_bb_p (const_basic_block);
> static bool better_p (const_edge, const_edge);
> static edge find_best_successor (basic_block);
> static edge find_best_predecessor (basic_block);
> @@ -85,7 +85,7 @@ bb_seen_p (basic_block bb)
> }
>
> /* Return true if we should ignore the basic block for purposes of tracing.
> */
> -static bool
> +bool
> ignore_bb_p (const_basic_block bb)
> {
> if (bb->index < NUM_FIXED_BLOCKS)
> @@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
> return i;
> }
>
> +/* Duplicate block BB2, placing it after BB in the CFG. Return the
> + newly created block. */
> +basic_block
> +transform_duplicate (basic_block bb, basic_block bb2)
> +{
> + edge e;
> + basic_block copy;
> +
> + e = find_edge (bb, bb2);
> +
> + copy = duplicate_block (bb2, e, bb);
> + flush_pending_stmts (e);
> +
> + add_phi_args_after_copy (©, 1, NULL);
> +
> + return (copy);
> +}
> +
> /* Look for basic blocks in frequency order, construct traces and tail
> duplicate
> if profitable. */
>
> @@ -321,17 +339,8 @@ tail_duplicate (void)
> entries or at least rotate the loop. */
> && bb2->loop_father->header != bb2)
> {
> - edge e;
> - basic_block copy;
> -
> - nduplicated += counts [bb2->index];
> -
> - e = find_edge (bb, bb2);
> -
> - copy = duplicate_block (bb2, e, bb);
> - flush_pending_stmts (e);
> -
> - add_phi_args_after_copy (©, 1, NULL);
> + nduplicated += counts [bb2->index];
> + basic_block copy = transform_duplicate (bb, bb2);
>
> /* Reconsider the original copy of block we've duplicated.
> Removing the most common predecessor may make it to be
> diff --git a/gcc/tracer.h b/gcc/tracer.h
> new file mode 100644
> index 0000000..cd1792a
> --- /dev/null
> +++ b/gcc/tracer.h
> @@ -0,0 +1,26 @@
> +/* Header file for Tracer.
> + Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> + for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#ifndef GCC_TRACER_H
> +#define GCC_TRACER_H
> +
> +extern basic_block transform_duplicate (basic_block bb, basic_block bb2);
> +extern bool ignore_bb_p (const_basic_block bb);
> +
> +#endif /* GCC_TRACER_H */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 49e22a9..6963acc 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done
> (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c
> new file mode 100644
> index 0000000..9f61bd4
> --- /dev/null
> +++ b/gcc/tree-ssa-path-split.c
> @@ -0,0 +1,275 @@
> +/* Support routines for Path Splitting.
> + Copyright (C) 2015 Free Software Foundation, Inc.
> + Contributed by Ajit Kumar Agarwal <ajitkum@xilinx.com>.
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3, or (at your option)
> + any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "cfganal.h"
> +#include "cfgloop.h"
> +#include "gimple-iterator.h"
> +#include "tracer.h"
> +
> +/* Given LATCH, the latch block in a loop, see if the shape of the
> + path reaching LATCH is suitable for path splitting. If so, return
> + the block that will be duplicated into its predecessor paths. Else
> + return NULL. */
> +
> +static basic_block
> +find_block_to_duplicate_for_path_splitting (basic_block latch)
> +{
> + /* We should have simple latches at this point. So the latch should
> + have a single successor. This implies the predecessor of the latch
> + likely has the loop exit. And it's that predecessor we're most
> + interested in. To keep things simple, we're going to require that
> + the latch have a single predecessor too. */
> + if (single_succ_p (latch) && single_pred_p (latch))
> + {
> + basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch);
> + gcc_assert (single_pred_edge (latch)->src == bb);
> +
> + /* If BB has been marked as not to be duplicated, then honor that
> + request. */
> + if (ignore_bb_p (bb))
> + return NULL;
> +
> + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb));
> + /* The immediate dominator of the latch must end in a conditional.
> */
> + if (!last || gimple_code (last) != GIMPLE_COND)
> + return NULL;
> +
> + /* We're hoping that BB is a join point for an IF-THEN-ELSE diamond
> + region. Verify that it is.
> +
> + First, verify that BB has two predecessors (each arm of the
> + IF-THEN-ELSE) and two successors (the latch and exit). */
> + if (EDGE_COUNT (bb->preds) == 2 && EDGE_COUNT (bb->succs) == 2)
> + {
> + /* Now verify that BB's immediate dominator ends in a
> + conditional as well. */
> + basic_block bb_idom = get_immediate_dominator (CDI_DOMINATORS,
> bb);
> + gimple *last = gsi_stmt (gsi_last_nondebug_bb (bb_idom));
> + if (!last || gimple_code (last) != GIMPLE_COND)
> + return NULL;
> +
> + /* And that BB's immediate dominator's successors are the
> + the predecessors of BB. */
> + if (!find_edge (bb_idom, EDGE_PRED (bb, 0)->src)
> + || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src))
> + return NULL;
> +
> + /* So at this point we have a simple diamond for an IF-THEN-ELSE
> + construct starting at BB_IDOM, with a join point at BB. BB
> + pass control outside the loop or to the loop latch.
> +
> + We're going to want to create two duplicates of BB, one for
> + each successor of BB_IDOM. */
> + return bb;
> + }
> + }
> + return NULL;
> +}
> +
> +/* Return TRUE if BB is a reasonable block to duplicate by examining
> + its size, false otherwise. BB will always be a loop latch block.
> +
> + Should this use the same tests as we do for jump threading? */
> +
> +static bool
> +is_feasible_trace (basic_block bb)
> +{
> + int num_stmt = 0;
> + gimple_stmt_iterator gsi;
> +
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gimple *stmt = gsi_stmt (gsi);
> + if (!is_gimple_debug (stmt))
> + num_stmt++;
> + }
> +
> + /* We may want to limit how many statements we copy. */
> + if (num_stmt > 1)
> + return true;
> +
> + return false;
> +}
> +
> +/* If the immediate dominator of the latch of the loop is
> + block with conditional branch, then the loop latch is
> + duplicated to its predecessors path preserving the SSA
> + semantics.
> +
> + CFG before transformation.
> +
> + 2
> + |
> + |
> + +---->3
> + | / \
> + | / \
> + | 4 5
> + | \ /
> + | \ /
> + | 6
> + | / \
> + | / \
> + | 8 7
> + | | |
> + ---+ E
> +
> +
> +
> + Block 8 is the latch. We're going to make copies of block 6 (9 & 10)
> + and wire things up so they look like this:
> +
> + 2
> + |
> + |
> + +---->3
> + | / \
> + | / \
> + | 4 5
> + | | |
> + | | |
> + | 9 10
> + | |\ /|
> + | | \ / |
> + | | 7 |
> + | | | |
> + | | E |
> + | | |
> + | \ /
> + | \ /
> + +-----8
> +
> +
> + Blocks 9 and 10 will get merged into blocks 4 & 5 respectively which
> + enables CSE, DCE and other optimizations to occur on a larger block
> + of code. */
> +
> +static bool
> +perform_path_splitting ()
> +{
> + bool changed = false;
> + loop_p loop;
> +
> + loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
> + initialize_original_copy_tables ();
> + calculate_dominance_info (CDI_DOMINATORS);
> +
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> + {
> + /* See if there is a block that we can duplicate to split the
> + path to the loop latch. */
> + basic_block bb = find_block_to_duplicate_for_path_splitting
> (loop->latch);
> +
> + /* BB is the merge point for an IF-THEN-ELSE we want to transform.
> +
> + Essentially we want to create two duplicates of BB and append
> + a duplicate to the THEN and ELSE clauses. This will split the
> + path leading to the latch. BB will be unreachable and removed. */
> + if (bb && is_feasible_trace (bb))
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> + "Duplicating join block %d into predecessor paths\n",
> + bb->index);
> + basic_block pred0 = EDGE_PRED (bb, 0)->src;
> + basic_block pred1 = EDGE_PRED (bb, 1)->src;
> + transform_duplicate (pred0, bb);
> + transform_duplicate (pred1, bb);
> + changed = true;
> + }
> + }
> +
> + loop_optimizer_finalize ();
> + free_original_copy_tables ();
> + return changed;
> +}
> +
> +/* Main entry point for path splitting. Returns TODO_cleanup_cfg if any
> + paths where split, otherwise return zero. */
> +
> +static unsigned int
> +execute_path_split (void)
> +{
> + /* If we don't have at least 2 real blocks and backedges in the
> + CFG, then there's no point in trying to perform path splitting. */
> + if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
> + || !mark_dfs_back_edges ())
> + return 0;
> +
> + bool changed = perform_path_splitting();
> + if (changed)
> + {
> + free_dominance_info (CDI_DOMINATORS);
> + /* If we changed the CFG schedule loops for fixup by cleanup_cfg. */
> + if (current_loops)
> + loops_state_set (LOOPS_NEED_FIXUP);
> + }
> +
> + return changed ? TODO_cleanup_cfg : 0;
> +}
> +
> +static bool
> +gate_path_split (void)
> +{
> + return flag_tree_path_split;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_path_split =
> +{
> + GIMPLE_PASS, /* type */
> + "path-split", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_TREE_PATH_SPLIT, /* tv_id */
> + PROP_ssa, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_path_split : public gimple_opt_pass
> +{
> + public:
> + pass_path_split (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_path_split, ctxt)
> + {}
> + /* opt_pass methods: */
> + opt_pass * clone () { return new pass_path_split (m_ctxt); }
> + virtual bool gate (function *) { return gate_path_split (); }
> + virtual unsigned int execute (function *) { return execute_path_split
> (); }
> +
> +}; // class pass_path_split
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_path_split (gcc::context *ctxt)
> +{
> + return new pass_path_split (ctxt);
> +}
>