This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


All:

Please find the updated patch with suggestion and feedback incorporated.

Thanks Jeff and Richard for the review comments.

Following changes were done based on the feedback on RFC comments. and the review for the previous patch.

1. Both tracer and path splitting pass are separate passes so  that two instances of the pass will run in the end, one doing path splitting
 and one doing  tracing, at different times in the optimization pipeline.
2. Transform code is shared for tracer and path splitting pass. The common code in extracted in a given function transform_duplicate
And place the function in tracer.c and the path splitting pass uses the transform code.
3. Analysis for the basic block population and traversing the basic block using the Fibonacci heap is commonly used. This cannot be
Factored out into new function as the tracer pass does more analysis based on the profile and the different heuristics is used in tracer
And path splitting pass.
4. The include headers is minimal and presence of what is required for the path splitting pass.
5. The earlier patch does the SSA updating  with replace function to preserve the SSA representation required to move the loop latch node same as join
Block to its predecessors and the loop latch node is just forward block. Such replace function are not required as suggested by the Jeff. Such replace
Function goes away with this patch and the transformed code is factored into a given function which is shared between tracer and path splitting pass.   

Bootstrapping with i386 and Microblaze target works fine. No regression is seen in Deja GNU tests for Microblaze. There
are lesser failures. Mibench/EEMBC benchmarks were run for Microblaze target and the gain of
9.3% is seen in rgbcmy_lite the EEMBC benchmarks.

SPEC 2000 benchmarks were run with i386 target and the following performance number is achieved.

INT benchmarks with path splitting(ratio) Vs INT benchmarks without path splitting(ratio) = 3661.225091 vs 3621.520572
FP benchmarks with path splitting(ratio) Vs FP benchmarks without path splitting(ratio )  =  4339.986209 vs 4339.775527

Maximum gains achieved with 252.eon INT benchmarks = 9.03%.

ChangeLog:
2015-08-15  Ajit Agarwal  <ajitkum@xilinx.com>

        * gcc/Makefile.in: Add the build of the new file
        tree-ssa-path-split.c
        * gcc/common.opt (ftree-path-split): Add the new flag.
        * gcc/opts.c (OPT_ftree_path_split) : Add an entry for
        Path splitting pass with optimization flag greater and
        equal to O2.
        * gcc/passes.def (path_split): add new path splitting pass.
        * gcc/timevar.def (TV_TREE_PATH_SPLIT): New.
        * gcc/tree-pass.h (make_pass_path_split): New declaration.
       * gcc/tree-ssa-path-split.c: New.
        * gcc/tracer.c (transform_duplicate): New.
        * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
        * gcc/testsuite/gcc.dg/path-split-1.c: New.
        * gcc/doc/invoke.texi
        (ftree-path-split): Document.
        (fdump-tree-path_split): Document.

Signed-off-by:Ajit Agarwal ajitkum@xilinx.com.

Thanks & Regards
Ajit

-----Original Message-----
From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, July 29, 2015 10:13 AM
To: Richard Biener; Jeff Law
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation



-----Original Message-----
From: Richard Biener [mailto:richard.guenther@gmail.com]
Sent: Thursday, July 16, 2015 4:30 PM
To: Ajit Kumar Agarwal
Cc: law@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

On Tue, Jul 7, 2015 at 3:22 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>
>
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, July 07, 2015 2:21 PM
> To: Ajit Kumar Agarwal
> Cc: law@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
> Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>>
>>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Tuesday, June 30, 2015 4:42 PM
>> To: Ajit Kumar Agarwal
>> Cc: law@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
>> Vidhumouli Hunsigida; Nagaraju Mekala
>> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass 
>> on tree ssa representation
>>
>> On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
>>> All:
>>>
>>> The below patch added a new path Splitting optimization pass on SSA 
>>> representation. The Path Splitting optimization Pass moves the join 
>>> block of if-then-else same as loop latch to its predecessors and get merged with the predecessors Preserving the SSA representation.
>>>
>>> The patch is tested for Microblaze and i386 target. The 
>>> EEMBC/Mibench benchmarks is run with the Microblaze target And the 
>>> performance gain of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU tests is run for Mircroblaze Target and no regression is seen for Microblaze target and the new testcase attached are passed.
>>>
>>> For i386 bootstrapping goes through fine and the Spec cpu2000 
>>> benchmarks is run with this patch. Following observation were seen with spec cpu2000 benchmarks.
>>>
>>> Ratio of path splitting change vs Ratio of not having path splitting change is 3653.353 vs 3652.14 for INT benchmarks.
>>> Ratio of path splitting change vs Ratio of not having path splitting change is  4353.812 vs 4345.351 for FP benchmarks.
>>>
>>> Based on comments from RFC patch following changes were done.
>>>
>>> 1. Added a new pass for path splitting changes.
>>> 2. Placed the new path  Splitting Optimization pass before the copy propagation pass.
>>> 3. The join block same as the Loop latch is wired into its 
>>> predecessors so that the CFG Cleanup pass will merge the blocks Wired together.
>>> 4. Copy propagation routines added for path splitting changes is not 
>>> needed as suggested by Jeff. They are removed in the patch as The copy propagation in the copied join blocks will be done by the existing copy propagation pass and the update ssa pass.
>>> 5. Only the propagation of phi results of the join block with the 
>>> phi argument is done which will not be done by the existing update_ssa Or copy propagation pass on tree ssa representation.
>>> 6. Added 2 tests.
>>>     a) compilation check  tests.
>>>    b) execution tests.
>>> 7. Refactoring of the code for the feasibility check and finding the join block same as loop latch node.
>>>
>>>     [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation.
>>>
>>>     Added a new pass on path splitting on tree SSA representation. The path
>>>     splitting optimization does the CFG transformation of join block of the
>>>     if-then-else same as the loop latch node is moved and merged with the
>>>     predecessor blocks after preserving the SSA representation.
>>>
>>>     ChangeLog:
>>>     2015-06-30  Ajit Agarwal  <ajitkum@xilinx.com>
>>>
>>>         * gcc/Makefile.in: Add the build of the new file
>>>         tree-ssa-path-split.c
>>>         * gcc/common.opt: Add the new flag ftree-path-split.
>>>         * gcc/opts.c: Add an entry for Path splitting pass
>>>         with optimization flag greater and equal to O2.
>>>         * gcc/passes.def: Enable and add new pass path splitting.
>>>         * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
>>>         * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
>>>         * gcc/tree-ssa-path-split.c: New file for path splitting pass.
>>>         * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
>>>         * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.
>>
>>>>I'm not 100% sure I understand the transform but what I see from the testcases it tail-duplicates from a conditional up to a loop latch block (not sure if it >>includes it and thus ends up creating a loop nest or not).
>>
>>>>An observation I have is that the pass should at least share the transform stage to some extent with the existing tracer pass (tracer.c) which essentially does >>the same but not restricted to loops in any way.
>>
>> The following piece of code from tracer.c can be shared with the existing path splitting pass.
>>
>> {
>>              e = find_edge (bb, bb2);
>>
>>               copy = duplicate_block (bb2, e, bb);
>>               flush_pending_stmts (e);
>>
>>               add_phi_args_after_copy (&copy, 1, NULL); }
>>
>> Sharing the above code of the transform stage of tracer.c with the path splitting pass has the following limitation.
>>
>> 1. The duplicated loop latch node is wired to its predecessors and 
>> the existing phi node in the loop latch node with the Phi arguments 
>> from its corresponding predecessors is moved to the duplicated loop latch node that is wired into its predecessors. Due To this, the duplicated loop latch nodes wired into its predecessors will not be merged with the original predecessors by CFG cleanup phase .
>>
>>>> So I wonder if your pass could be simply another heuristic to compute paths to trace in the existing tracer pass.
>>
>> Sorry, I am not very clear when you say the above.  I am trying to 
>> figure out whether you expect the existing pass of tracer.c should be modified Or the path splitting pass should coexist.
>
>>>Yes, I was wondering whether tracer.c could be simply modified.  Both transforms are doing something very similar.
>>>Yes, your pass would simply compute extra traces based on the new heuristic.
>
> I have observed the following with the tracer pass optimization.
>
> Tracer Pass:
>
> 1. The tracer pass is FDO optimizations  and is not enabled by default at O2. This optimization is enabled with -fprofile-use.
> 2. The -ftracer flag is used to enable the optimization explicitly in the absence of FDO optimization.
> 3. The tracer pass optimizations is placed at only place before the 
> dominator_pass. Moving the tracer pass before copy_prop Pass gives the following error. " Error : pass tracer does not support cloning".

This is an error from the pass manager, you added a second tracer pass instead of moving it.

> 4. The code for tracer pass is totally based on the FDO related information and the logic is based on the profile data.

>>Yes.

> Path Spliiting pass:
>
> 1. Having the path splitting as a separate pass is enabled by default at  >= O2.

>>Well, it's debatable on whether you want to enable it at -O2.  I seriously doubt that.

> 2. No FDO information is required in the path splitting pass.
> 3. The Path Splitting pass can be placed anywhere well before any 
> optimizations pass. I have placed the path splitting pass before Copy_prop and it works. Also placing before the dominator also works fine.

>>Same for tracer - see above.

> 4. The code for path splitting as a separate pass is purely based on non profile and Non FDO data.
> 5. Placing the path splitting pass as a separate pass  can be placed 
> anywhere in the optimizations. The optimizations that got benefitted with the Path splitting pass are PRE , CCP, DCE and can be placed well before the optimizations.
> 6. At the first phase of path splitting pass I am duplicating the loop 
> latch node to its predecessor and make it SSA and loop structure 
> Preserved. Making the path splitting as separate pass I would like to extend this pass to the multiple latch node with a forwarding block and the multiple latch nodes are edged towards the forwarder block making it one Loop latch edge.
>
> With the above observation, Do you think the existing tracer pass 
> should be modified and the path splitting pass should be incorporated On top of the existing tracer pass.

>>I still think both passes do a fundamentally similar transform and thus should be able to share data structures ('path' representation), parts of analysis (can >>this path be duplicated?) and the transform stage.  Yes, it might be that two instances of the pass will run in the end, one doing path splitting and one doing >>tracing, at different times in the optimization pipeline.

>>We have another similar transform with similar needs on data structures and analysis / transform.  Jump threading.

Thanks. I am going to make the following change and send for review.

 Both path splitting and traces should be a separate pass and the common code between them  should Be abstracted out and the path splitting pass and the tracer pass should use this abstracted common piece of code.

In the next phase of the path splitting transformation, I would like to make the basic block that dominates the IF node and has the successor of the given block is IF-node then the given block is duplicated to THEN and ELSE path similar to head duplication thus making the scope of path splitting to enable more of PRE,DCE and CCP optimizations.

Please let me know if there is any concern.

Thanks & Regards
Ajit

>>It would be nice to have a common machinery here, for example to assess cost of doing a trace.

Richard.

> Kindly give your feedback.
>
> Thanks & Regards
> Ajit
>
> Richard.
>
>> Thanks & Regards
>> Ajit
>>
>> Thanks,
>> Richard.
>>
>>>     Signed-off-by:Ajit Agarwal ajitkum@xilinx.com.
>>>
>>> gcc/Makefile.in                              |   1 +
>>>  gcc/common.opt                               |   4 +
>>>  gcc/opts.c                                   |   1 +
>>>  gcc/passes.def                               |   1 +
>>>  gcc/testsuite/gcc.dg/path-split-1.c          |  65 ++++
>>>  gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  62 ++++
>>>  gcc/timevar.def                              |   1 +
>>>  gcc/tree-pass.h                              |   1 +
>>>  gcc/tree-ssa-path-split.c                    | 462 +++++++++++++++++++++++++++
>>>  9 files changed, 598 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>>>  create mode 100644 gcc/tree-ssa-path-split.c
>>>
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in index
>>> 5f9261f..35ac363
>>> 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -1476,6 +1476,7 @@ OBJS = \
>>>         tree-vect-slp.o \
>>>         tree-vectorizer.o \
>>>         tree-vrp.o \
>>> +        tree-ssa-path-split.o \
>>>         tree.o \
>>>         valtrack.o \
>>>         value-prof.o \
>>> diff --git a/gcc/common.opt b/gcc/common.opt index e104269..c63b100
>>> 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -2328,6 +2328,10 @@ ftree-vrp
>>>  Common Report Var(flag_tree_vrp) Init(0) Optimization  Perform 
>>> Value Range Propagation on trees
>>>
>>> +ftree-path-split
>>> +Common Report Var(flag_tree_path_split) Init(0) Optimization 
>>> +Perform Path Splitting
>>> +
>>>  funit-at-a-time
>>>  Common Report Var(flag_unit_at_a_time) Init(1) Optimization Compile 
>>> whole compilation unit at a time diff --git a/gcc/opts.c 
>>> b/gcc/opts.c index 8a16116..31947ff 100644
>>> --- a/gcc/opts.c
>>> +++ b/gcc/opts.c
>>> @@ -508,6 +508,7 @@ static const struct default_options default_options_table[] =
>>>      { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
>>> +    { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
>>>
>>>      /* -O3 optimizations.  */
>>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL,
>>> 1 }, diff --git a/gcc/passes.def b/gcc/passes.def index 
>>> c0ddee4..43618eb
>>> 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
>>>        NEXT_PASS (pass_ccp);
>>>        /* After CCP we rewrite no longer addressed locals into SSA
>>>          form if possible.  */
>>> +      NEXT_PASS (pass_path_split);
>>>        NEXT_PASS (pass_copy_prop);
>>>        NEXT_PASS (pass_complete_unrolli);
>>>        NEXT_PASS (pass_phiprop);
>>> diff --git a/gcc/testsuite/gcc.dg/path-split-1.c
>>> b/gcc/testsuite/gcc.dg/path-split-1.c
>>> new file mode 100644
>>> index 0000000..075dc87
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/path-split-1.c
>>> @@ -0,0 +1,65 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 " } */
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +
>>> +#define RGBMAX 255
>>> +
>>> +int
>>> +test()
>>> +{
>>> +  int i, Pels;
>>> +  unsigned char sum = 0;
>>> +  unsigned char xr, xg, xb;
>>> +  unsigned char xc, xm, xy, xk;
>>> +  unsigned char *ReadPtr, *EritePtr;
>>> +
>>> +  ReadPtr = ( unsigned char *) malloc (sizeof (unsigned char) * 
>>> + 100); EritePtr = ( unsigned char *) malloc (sizeof (unsigned char)
>>> + * 100);
>>> +
>>> +  for (i = 0; i < 100;i++)
>>> +     {
>>> +       ReadPtr[i] = 100 - i;
>>> +     }
>>> +
>>> +  for (i = 0; i < 100; i++)
>>> +     {
>>> +       xr = *ReadPtr++;
>>> +       xg = *ReadPtr++;
>>> +       xb = *ReadPtr++;
>>> +
>>> +       xc = (unsigned char) (RGBMAX - xr);
>>> +       xm = (unsigned char) (RGBMAX - xg);
>>> +       xy = (unsigned char) (RGBMAX - xb);
>>> +
>>> +       if (xc < xm)
>>> +         {
>>> +           xk = (unsigned char) (xc < xy ? xc : xy);
>>> +         }
>>> +       else
>>> +        {
>>> +          xk = (unsigned char) (xm < xy ? xm : xy);
>>> +        }
>>> +
>>> +       xc = (unsigned char) (xc - xk);
>>> +       xm = (unsigned char) (xm - xk);
>>> +       xy = (unsigned char) (xy - xk);
>>> +
>>> +       *EritePtr++ = xc;
>>> +       *EritePtr++ = xm;
>>> +       *EritePtr++ = xy;
>>> +       *EritePtr++ = xk;
>>> +       sum += *EritePtr;
>>> +    }
>>> +  return sum;
>>> +}
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  if (test() != 33)
>>> +    abort();
>>> +
>>> +  return 0;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>>> new file mode 100644
>>> index 0000000..19f277c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>>> @@ -0,0 +1,62 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-path_split" } */
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +
>>> +#define RGBMAX 255
>>> +
>>> +int
>>> +test()
>>> +{
>>> +  int i, Pels;
>>> +  unsigned char sum = 0;
>>> +  unsigned char xr, xg, xb;
>>> +  unsigned char xc, xm, xy, xk;
>>> +  unsigned char *ReadPtr, *EritePtr;
>>> +
>>> +  ReadPtr = (unsigned char *) malloc (sizeof (unsigned char) * 
>>> + 100); EritePtr = ( unsigned char *) malloc (sizeof (unsigned char)
>>> + * 100);
>>> +
>>> +  for (i = 0; i < 100;i++)
>>> +     {
>>> +       ReadPtr[i] = 100 - i;
>>> +     }
>>> +
>>> +  for (i = 0; i < 100; i++)
>>> +     {
>>> +       xr = *ReadPtr++;
>>> +       xg = *ReadPtr++;
>>> +       xb = *ReadPtr++;
>>> +
>>> +       xc = ( unsigned char) (RGBMAX - xr);
>>> +       xm = ( unsigned char) (RGBMAX - xg);
>>> +       xy = ( unsigned char) (RGBMAX - xb);
>>> +
>>> +       if (xc < xm)
>>> +         {
>>> +           xk = ( unsigned char) (xc < xy ? xc : xy);
>>> +         }
>>> +       else
>>> +         {
>>> +           xk = ( unsigned char) (xm < xy ? xm : xy);
>>> +         }
>>> +
>>> +       xc = (unsigned char) (xc - xk);
>>> +       xm = (unsigned char) (xm - xk);
>>> +       xy = (unsigned char) (xy - xk);
>>> +
>>> +       *EritePtr++ = xc;
>>> +       *EritePtr++ = xm;
>>> +       *EritePtr++ = xy;
>>> +       *EritePtr++ = xk;
>>> +       sum += *EritePtr;
>>> +    }
>>> +  return sum;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "xc_[0-9][0-9]* -> { xc_[0-9][0-9]* }"
>>> +"path_split"} } */
>>> +/* { dg-final { scan-tree-dump "xm_[0-9][0-9]* -> { xm_[0-9][0-9]* }"
>>> +"path_split"} } */
>>> +/* { dg-final { scan-tree-dump "xy_[0-9][0-9]* -> { xy_[0-9][0-9]* }"
>>> +"path_split"} } */
>>> +/* { dg-final { scan-tree-dump "Merging blocks" "path_split"} } */
>>> +/* { dg-final { cleanup-tree-dump "path_split" } } */
>>> diff --git a/gcc/timevar.def b/gcc/timevar.def index 
>>> 711bbed..6217a8e
>>> 100644
>>> --- a/gcc/timevar.def
>>> +++ b/gcc/timevar.def
>>> @@ -288,3 +288,4 @@ DEFTIMEVAR (TV_JIT_REPLAY        , "replay of JIT client activity")
>>>  DEFTIMEVAR (TV_ASSEMBLE             , "assemble JIT code")
>>>  DEFTIMEVAR (TV_LINK                 , "link JIT code")
>>>  DEFTIMEVAR (TV_LOAD                 , "load JIT result")
>>> +DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path_split")
>>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 
>>> 398ab83..e00639e
>>> 100644
>>> --- a/gcc/tree-pass.h
>>> +++ b/gcc/tree-pass.h
>>> @@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_iv_optimize 
>>> (gcc::context *ctxt);  extern gimple_opt_pass 
>>> *make_pass_tree_loop_done (gcc::context *ctxt);  extern 
>>> gimple_opt_pass *make_pass_ch (gcc::context *ctxt);  extern 
>>> gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
>>> +extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context 
>>> *ctxt);  extern gimple_opt_pass *make_pass_build_ssa (gcc::context 
>>> *ctxt);  extern gimple_opt_pass *make_pass_build_alias (gcc::context 
>>> *ctxt); diff --git a/gcc/tree-ssa-path-split.c 
>>> b/gcc/tree-ssa-path-split.c new file mode 100644 index
>>> 0000000..3da7791
>>> --- /dev/null
>>> +++ b/gcc/tree-ssa-path-split.c
>>> @@ -0,0 +1,462 @@
>>> +/* Support routines for Path Splitting.
>>> +   Copyright (C) 2015 Free Software Foundation, Inc.
>>> +   Contributed by Ajit Kumar Agarwal <ajitkum@xilinx.com>.
>>> +
>>> + This file is part of GCC.
>>> +
>>> + GCC is free software; you can redistribute it and/or modify it 
>>> + under the terms of the GNU General Public License as published by 
>>> + the Free Software Foundation; either version 3, or (at your
>>> + option) any later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT 
>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY 
>>> +or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>>> +License for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License 
>>> +along with GCC; see the file COPYING3.  If not see 
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "flags.h"
>>> +#include "tree.h"
>>> +#include "stor-layout.h"
>>> +#include "calls.h"
>>> +#include "predict.h"
>>> +#include "vec.h"
>>> +#include "hashtab.h"
>>> +#include "hash-set.h"
>>> +#include "machmode.h"
>>> +#include "hard-reg-set.h"
>>> +#include "input.h"
>>> +#include "function.h"
>>> +#include "dominance.h"
>>> +#include "cfg.h"
>>> +#include "cfganal.h"
>>> +#include "basic-block.h"
>>> +#include "tree-ssa-alias.h"
>>> +#include "internal-fn.h"
>>> +#include "gimple-fold.h"
>>> +#include "tree-eh.h"
>>> +#include "gimple-expr.h"
>>> +#include "is-a.h"
>>> +#include "gimple.h"
>>> +#include "gimple-iterator.h"
>>> +#include "gimple-walk.h"
>>> +#include "gimple-ssa.h"
>>> +#include "tree-cfg.h"
>>> +#include "tree-phinodes.h"
>>> +#include "ssa-iterators.h"
>>> +#include "stringpool.h"
>>> +#include "tree-ssanames.h"
>>> +#include "tree-ssa-loop-manip.h"
>>> +#include "tree-ssa-loop-niter.h"
>>> +#include "tree-ssa-loop.h"
>>> +#include "tree-into-ssa.h"
>>> +#include "tree-ssa.h"
>>> +#include "tree-pass.h"
>>> +#include "tree-dump.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "diagnostic-core.h"
>>> +#include "intl.h"
>>> +#include "cfgloop.h"
>>> +#include "tree-scalar-evolution.h"
>>> +#include "tree-ssa-propagate.h"
>>> +#include "tree-chrec.h"
>>> +#include "tree-ssa-threadupdate.h"
>>> +#include "expr.h"
>>> +#include "insn-codes.h"
>>> +#include "optabs.h"
>>> +#include "tree-ssa-threadedge.h"
>>> +#include "wide-int.h"
>>> +
>>> +/* Replace_uses_phi function propagates the phi results with the
>>> +   first phi argument into each of the copied join blocks wired into
>>> +   its predecessors. This function is called from the replace_uses_phi
>>> +   to replace the uses of first phi arguments with the second
>>> +   phi arguments in the next copy of join block.  */
>>> +
>>> +static void
>>> +replace_use_phi_operand1_with_operand2 (basic_block b,
>>> +                                        tree use1,
>>> +                                        tree use2) {
>>> +  use_operand_p use;
>>> +  ssa_op_iter iter;
>>> +  gimple_stmt_iterator gsi;
>>> +
>>> +  for (gsi = gsi_start_bb (b); !gsi_end_p (gsi);)
>>> +     {
>>> +       gimple stmt = gsi_stmt (gsi);
>>> +       FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>> +       {
>>> +         tree tuse = USE_FROM_PTR (use);
>>> +          if (use1 == tuse || use1 == NULL_TREE)
>>> +            {
>>> +              propagate_value (use, use2);
>>> +              update_stmt(stmt);
>>> +            }
>>> +        }
>>> +       gsi_next(&gsi);
>>> +     }
>>> +}
>>> +
>>> +/* This function propagates the phi result into the use points with
>>> +   the phi arguments. The join block is copied and wired into the
>>> +   predecessors. Since the use points of the phi results will be same
>>> +   in the each of the copy join blocks in the  predecessors, it
>>> +   propagates the phi arguments in the copy of the join blocks wired
>>> +   into its predecessor.  */
>>> +
>>> +static
>>> +void replace_uses_phi (basic_block b, basic_block temp_bb) {
>>> +  gimple_seq phis = phi_nodes (b);
>>> +  gimple phi = gimple_seq_first_stmt (phis);
>>> +  tree def = gimple_phi_result (phi), use = gimple_phi_arg_def 
>>> +(phi,0);
>>> +  tree use2 = gimple_phi_arg_def (phi,1);
>>> +
>>> +  if (virtual_operand_p (def))
>>> +    {
>>> +      imm_use_iterator iter;
>>> +      use_operand_p use_p;
>>> +      gimple stmt;
>>> +
>>> +      FOR_EACH_IMM_USE_STMT (stmt, iter, def)
>>> +        FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
>>> +          SET_USE (use_p, use);
>>> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
>>> +        SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use) = 1;
>>> +    }
>>> +   else
>>> +     replace_uses_by (def, use);
>>> +   replace_use_phi_operand1_with_operand2 (temp_bb, use, use2); }
>>> +
>>> +/* Returns true if the block bb has label or call statements.
>>> +   Otherwise return false.  */
>>> +
>>> +static bool
>>> +is_block_has_label_call (basic_block bb) {
>>> +  gimple_stmt_iterator gsi;
>>> +
>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> +     {
>>> +       gimple stmt = gsi_stmt(gsi);
>>> +       if (dyn_cast <glabel *> (stmt))
>>> +         {
>>> +           return true;
>>> +         }
>>> +       if (is_gimple_call (stmt))
>>> +         return true;
>>> +     }
>>> +  return false;
>>> +}
>>> +
>>> +/* This function performs the feasibility tests for path splitting
>>> +   to perform. Return false if the feasibility for path splitting
>>> +   is not done and returns true if the feasbility for path splitting
>>> +   is done. Following feasibility tests are performed.
>>> +
>>> +   1. Return false if the join block has call gimple statements.
>>> +   2. Return false if the join block has rhs casting for assign
>>> +      gimple statements.
>>> +   3. If the number of phis is greater than 1 or the phi node in
>>> +      the join block has virtual operand return false.
>>> +   4. Return false if the number of sequential statements is
>>> +      greater than 2.
>>> +   5. If the predecessors blocks has labels and call statements
>>> +      return false.
>>> +   6. If the phi result in the phi node of the join block is not
>>> +      used inside the same join block return false.
>>> +   7. Otherwise returns true.  */
>>> +
>>> +static bool
>>> +is_feasible_path_splitting (basic_block join_node, basic_block pred1,
>>> +                           basic_block pred2) {
>>> +  int num_stmt = 0, num_phis = 0;
>>> +  gimple_stmt_iterator psi, gsi;
>>> +
>>> +  for (gsi = gsi_start_bb (join_node); !gsi_end_p (gsi); gsi_next (&gsi))
>>> +     {
>>> +       gimple stmt = gsi_stmt(gsi);
>>> +
>>> +       if (gimple_assign_cast_p (stmt))
>>> +         return false;
>>> +
>>> +       if (is_gimple_call (stmt))
>>> +         return false;
>>> +
>>> +       if (!is_gimple_debug(stmt))
>>> +         {
>>> +           num_stmt++;
>>> +         }
>>> +     }
>>> +
>>> +   if (pred1 && pred2 && (num_stmt > 2))
>>> +     {
>>> +       bool found_virtual_result = false;
>>> +
>>> +       for (psi = gsi_start_phis (join_node); !gsi_end_p (psi); )
>>> +          {
>>> +            use_operand_p use_p;
>>> +            imm_use_iterator iter;
>>> +            gimple stmt = gsi_stmt(psi);
>>> +
>>> +            if (!virtual_operand_p (gimple_phi_result (stmt)))
>>> +              num_phis++;
>>> +            else
>>> +              found_virtual_result = true;
>>> +
>>> +            FOR_EACH_IMM_USE_FAST (use_p, iter, gimple_phi_result (stmt))
>>> +            {
>>> +              gimple use_stmt = USE_STMT (use_p);
>>> +
>>> +              if (gimple_bb (use_stmt) != join_node)
>>> +                return false;
>>> +            }
>>> +
>>> +            gsi_next(&psi);
>>> +         }
>>> +
>>> +       if ((num_phis >1) || found_virtual_result)
>>> +          return false;
>>> +
>>> +       if(is_block_has_label_call(pred1) || is_block_has_label_call(pred2))
>>> +         return false;
>>> +
>>> +       return true;
>>> +    }
>>> +  return false;
>>> +}
>>> +
>>> +/* Update the statements in the basic block with the basic
>>> +   basic block.  */
>>> +
>>> +static void
>>> +update_stmt_bb(basic_block b)
>>> +{
>>> +  gimple_stmt_iterator gsi;
>>> +  for(gsi = gsi_start_bb(b); !gsi_end_p(gsi); gsi_next(&gsi))
>>> +   {
>>> +     gimple stmt = gsi_stmt(gsi);
>>> +     gimple_set_bb(stmt,b);
>>> +   }
>>> +}
>>> +
>>> +/* This function gets the join blocks same as the source
>>> +   node of the loop latch nodes and the predecessors of
>>> +   the join block is updated in the pred1 and pred2 passed
>>> +   as the reference arguments into the function. Return
>>> +   the join block.  */
>>> +
>>> +static basic_block
>>> +get_join_blk_same_as_loop_latch (basic_block bb,
>>> +                                 basic_block &pred1,
>>> +                                 basic_block &pred2) {
>>> +  vec<basic_block> bbs;
>>> +  basic_block bb1;
>>> +  unsigned int i;
>>> +  edge_iterator ei;
>>> +  edge e1;
>>> +  bool found = false ,found1;
>>> +  bbs = get_all_dominated_blocks (CDI_DOMINATORS,
>>> +                                  bb );
>>> +  FOR_EACH_VEC_ELT (bbs, i, bb1)
>>> +  {
>>> +    found1 = false;
>>> +    FOR_EACH_EDGE (e1, ei, bb->succs)
>>> +    {
>>> +      if ( bb1 == e1->dest)
>>> +        {
>>> +          found = true;
>>> +          found1 = true;
>>> +        }
>>> +    }
>>> +    if (!found1 && found)
>>> +      {
>>> +        found = false;
>>> +        FOR_EACH_EDGE (e1, ei, bb1->succs)
>>> +        {
>>> +          if (e1->flags & (EDGE_DFS_BACK))
>>> +            found = true;
>>> +        }
>>> +
>>> +        if (found && EDGE_COUNT(bb1->preds) == 2)
>>> +          {
>>> +            unsigned int k = 0;
>>> +            FOR_EACH_EDGE (e1, ei, bb1->preds)
>>> +            {
>>> +              if ((e1->flags & (EDGE_DFS_BACK)))
>>> +                continue;
>>> +
>>> +              if ( k == 1)
>>> +                {
>>> +                  if (single_succ_p(e1->src) &&
>>> +                      single_succ_edge (e1->src)->flags & EDGE_FALLTHRU)
>>> +                    {
>>> +                      pred2 = e1->src;
>>> +                    }
>>> +                }
>>> +                else
>>> +                  {
>>> +                    if (single_succ_p(e1->src) &&
>>> +                        single_succ_edge (e1->src)->flags & EDGE_FALLTHRU)
>>> +                      {
>>> +                        pred1 = e1->src;
>>> +                      }
>>> +                  }
>>> +                k++;
>>> +            }
>>> +            bbs.release();
>>> +            return bb1;
>>> +          }
>>> +       }
>>> +   }
>>> +   bbs.release();
>>> +   return NULL;
>>> +}
>>> +
>>> +/* This is the core function to perform path splitting. The join
>>> +   same as the source of the loop latch node is identified along
>>> +   with their predecessors. Based on the feasibility tests for
>>> +   path splitting the path splitting is performed by wiring the
>>> +   copy of join blocks into the predecessors and propagating the phi
>>> +   result with the corresponding phi arguments into each of the copy
>>> +   of join blocks wired with the original predecessors of the join
>>> +   block.
>>> +
>>> +   The  tree-cfg-cleanup will merge the blocks in the predecessors
>>> +   path and the update-ssa will update the ssa representation after
>>> +   the path splitting is performed.  */
>>> +
>>> +static void
>>> +perform_path_splitting (basic_block bb) {
>>> +  basic_block pred1 = NULL, pred2 = NULL, join_block = NULL;
>>> +
>>> +  join_block = get_join_blk_same_as_loop_latch (bb, pred1, pred2);
>>> +
>>> +  if (join_block  &&
>>> +      is_feasible_path_splitting (join_block, pred1, pred2))
>>> +    {
>>> +      basic_block new_bb1 = NULL, new_bb2 = NULL;
>>> +      gimple_stmt_iterator last;
>>> +      basic_block temp_bb = NULL;
>>> +      edge_iterator ei;
>>> +      edge e1;
>>> +
>>> +      temp_bb = duplicate_block (join_block, NULL, NULL);
>>> +
>>> +      FOR_EACH_EDGE (e1, ei, pred1->succs)
>>> +        new_bb1 = split_edge (e1);
>>> +
>>> +      FOR_EACH_EDGE (e1, ei, pred2->succs)
>>> +        new_bb2 = split_edge (e1);
>>> +
>>> +      last = gsi_start_bb (new_bb1);
>>> +      gsi_insert_seq_after (&last, bb_seq (join_block), GSI_NEW_STMT);
>>> +      last = gsi_start_bb (new_bb2);
>>> +      gsi_insert_seq_after (&last, bb_seq (temp_bb), GSI_NEW_STMT);
>>> +      update_stmt_bb (new_bb1);
>>> +      update_stmt_bb (new_bb2);
>>> +
>>> +      replace_uses_phi (join_block, new_bb2);
>>> +
>>> +      set_bb_seq (join_block, NULL);
>>> +      set_bb_seq(temp_bb,NULL);
>>> +      delete_basic_block (temp_bb);
>>> +      return;
>>> +    }
>>> +}
>>> +
>>> +static unsigned int
>>> +execute_path_split (void)
>>> +{
>>> +  basic_block bb;
>>> +
>>> +  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); 
>>> + initialize_original_copy_tables();
>>> +
>>> +  calculate_dominance_info (CDI_DOMINATORS); 
>>> + calculate_dominance_info (CDI_POST_DOMINATORS);
>>> +
>>> +  mark_dfs_back_edges ();
>>> +
>>> +  FOR_EACH_BB_FN (bb, cfun)
>>> +  {
>>> +    gimple last;
>>> +
>>> +    /* We only care about blocks ending in a COND_EXPR. */
>>> +
>>> +    last = gsi_stmt (gsi_last_bb (bb));
>>> +
>>> +    /* We're basically looking for a switch or any kind of conditional with
>>> +       integral or pointer type arguments.  Note the type of the second
>>> +       argument will be the same as the first argument, so no need to
>>> +       check it explicitly.  */
>>> +    if ((last && (gimple_code (last) == GIMPLE_COND
>>> +            && TREE_CODE (gimple_cond_lhs (last)) == SSA_NAME
>>> +            && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (last)))
>>> +            || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (last))))
>>> +            && (TREE_CODE (gimple_cond_rhs (last)) == SSA_NAME
>>> +            || is_gimple_min_invariant (gimple_cond_rhs (last))))))
>>> +      {
>>> +
>>> +         if (gimple_code(last) == GIMPLE_COND)
>>> +           {
>>> +              perform_path_splitting (bb);
>>> +           }
>>> +      }
>>> +   }
>>> +
>>> +   loop_optimizer_finalize ();
>>> +   free_original_copy_tables ();
>>> +   free_dominance_info (CDI_DOMINATORS);
>>> +   free_dominance_info (CDI_POST_DOMINATORS);
>>> +   return 0;
>>> +}
>>> +
>>> +namespace {
>>> +
>>> +const pass_data pass_data_path_split = {
>>> +   GIMPLE_PASS, /* type */
>>> +   "path_split", /* name */
>>> +    OPTGROUP_NONE, /* optinfo_flags */
>>> +    TV_TREE_PATH_SPLIT, /* tv_id */
>>> +    PROP_ssa, /* properties_required */
>>> +    0, /* properties_provided */
>>> +    0, /* properties_destroyed */
>>> +    0, /* todo_flags_start */
>>> +    ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */ 
>>> +};
>>> +
>>> +class pass_path_split : public gimple_opt_pass {
>>> +   public:
>>> +    pass_path_split (gcc::context *ctxt)
>>> +      : gimple_opt_pass (pass_data_path_split, ctxt)
>>> +    {}
>>> +
>>> +   /* opt_pass methods: */
>>> +   opt_pass * clone () { return new pass_path_split (m_ctxt); }
>>> +   virtual bool gate (function *) { return flag_tree_path_split != 0; }
>>> +   virtual unsigned int execute (function *) { return 
>>> + execute_path_split (); }
>>> +
>>> +}; // class pass_path_split
>>> +
>>> +} // anon namespace
>>> +
>>> +gimple_opt_pass *
>>> +make_pass_path_split (gcc::context *ctxt) {
>>> +  return new pass_path_split (ctxt); }
>>> --
>>> 1.8.2.1
>>>
>>> Thanks & Regards
>>> Ajit

Attachment: path-splitting.patch
Description: path-splitting.patch


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]