[PATCH] local-factoring on TREE-SSA

Tue Jul 20 23:31:00 GMT 2004

jasy@inf.u-szeged.hu writes:

> Hello,
> 
> as Gabor Loki mentioned at the GCC Summit, we are trying to implement our
> local factoring algorithm on TREE-SSA. (Earlier we sent our
> algoritm, which works on RTL:
> http://gcc.gnu.org/ml/gcc-patches/2004-03/msg01907.html)
> 
> The attached patch contains a new TREE-SSA optimization phase, which
> carries out statement hoisting (but currently it deals only with the
> simpliest cases.) In the future we plan to add lots of improvements (e.g.
> code sinking), but first we are interested in your opinions
> about the current form of the algorithm.
> 
> Let's see the current results:
> arm-elf: 0.143%
> i686-linux: 0.085%
> ppc-elf: 0.120%
> 
> (Measured on the CSiBE benchmark (v1.x.x). The figures represent code size
> reduction compared to the mainline.)
> 
> The patch was regtested on {arm,ppc}-elf and i686-linux with no new failures.
> 
> 
> So, what do you think? Is this patch OK for the mainline?
> 
> Regards,
>   Judit Jasz
> 
> 
> 
> 2004-07-16 Judit Jasz <jasy@inf.u-szeged.hu>
> 
>  * tree-ssa-lfact.c: New file.
>  * Makefile.in: (tree-ssa-lfact.o) Add.
>  * common.opt: (ftree-lfact) New flag.
>  * flags.h: (flag_tree_lfact): Declare.
>  * opts.c: (decode_options): Set.
>  * tree-optimize.c: (init_tree_optimization_passes): Add pass_tree_lfact.
> * tree-pass.h: (pass_tree_lfact): Declare.
> 
> Index: flags.h
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/flags.h,v
> retrieving revision 1.144
> diff -u -r1.144 flags.h
> --- flags.h	29 Jun 2004 01:53:02 -0000	1.144
> +++ flags.h	16 Jul 2004 09:42:47 -0000
> @@ -379,14 +379,14 @@
>  extern int flag_schedule_speculative_load_dangerous;
>  
>  /* The following flags have an effect during scheduling after register
> -   allocation:   
> +   allocation:
>  
>     sched_stalled_insns means that insns can be moved prematurely from the queue
>     of stalled insns into the ready list.
>  
> -   sched_stalled_insns_dep controls how many recently scheduled cycles will 
> +   sched_stalled_insns_dep controls how many recently scheduled cycles will
>     be examined for a dependency on a stalled insn that is candidate for
> -   premature removal from the queue of stalled insns into the ready list (has 
> +   premature removal from the queue of stalled insns into the ready list (has
>     an effect only if the flag 'sched_stalled_insns' is set).  */

 These seem to be random white space changes.  There are lots of these in
this file.  Perhaps you could just file a separate patch and make this one
shorter.

> +/* Nonzero means enable all factoring algorithms.  */
> +extern int flag_tree_lfact;
> +

 I don't think this is needed anymore since this flag is defined in a
generated file.

> Index: tree-ssa-lfact.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/tree-ssa-lfact.c,v
> diff -u -N tree-ssa-lfact.c
> --- /dev/null	2004-06-04 12:53:11.000000000 +0200
> +++ tree-ssa-lfact.c	2004-07-16 11:51:41.000000000 +0200
> @@ -0,0 +1,887 @@
> +/* Local factoring (code hoisting/sinking) on SSA trees.
> +   Copyright (C) 2004 Free Software Foundation, Inc.
> +   Contributed by Judit Jasz <jasy@inf.u-szeged.hu>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 2, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING.  If not, write to
> +the Free Software Foundation, 59 Temple Place - Suite 330,
> +Boston, MA 02111-1307, USA.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "basic-block.h"
> +#include "resource.h"
> +#include "ggc.h"
> +#include "flags.h"
> +#include "tree.h"
> +#include "tree-flow.h"
> +#include "tree-inline.h"
> +#include "tree-pass.h"
> +#include "tree-ssa-operands.h"
> +
> +/* This file provides the statement hoisting on TREE SSA.  The steps 
> +   of the algoritm:
> +     1. We collect the basic blocks, which have same parents, and 
> +        the parents haven't got any child in addition.  The basic 
> +	blocks in the same set are brothers, and these blocks are the 
> +	components of the same 'brother' set, and chained by the 
> +	next_brother and prev_brother fields of bb_decorator.
> +     2. Apart from that cases where a potantial hoisting is not 
> +        economic, we analyze the different 'brother' sets.
> +          2.a.) First we collect the potential hoistable statements 
> +	        from the investigated set.
> +	  2.b.) We try to hoist the potential hoistable statements.
> +     3. We are repeating the algoritm from step 2., while we can hoist
> +        any statement.  */
> +

 I think a picture of showing what is and isn't a brother set may help.
How about a description like:

 The algorithm is:
  1) Collect basic blocks that have the same parents into brother sets.

     e.g. block0
          if (...) { block1 } else { block2 }
     Block's 1 and 2 are in a brother set.
  2) Find potential hoistable statements.
  3) Try to hoist the potential hoistable statements, ensuring that hoisting
     the statement is a valuable transformation.
  4) Do step 3 for each potentially hoistable statement in a brother statement.
  5) Do steps 2-4 for each brother set.

> +static int gain;
> +static tree global_tree;
> +
> +extern bool is_factoring (void);
> +
> +typedef struct bb_decorator_def
> +{
> +  struct bb_decorator_def *next_brother;        /* pointer to the next basic
> +                                                   block in the 'brother' set 
> +                                                 */
> +  struct bb_decorator_def *prev_brother;
> +  basic_block curr;             /* the decorated basic block */
> +  struct bb_decorator_def *prev_decorator, *next_decorator;
> +  block_stmt_iterator moveable_node;
> +  int last_depth;               /* the depth of the last investigated stmt.  */
> +} *bb_decorator;
> +
> +/* Local factoring - hoisting  */
> +static void collect_full_pred_brother (bb_decorator);
> +static int compare_pred_blocks (basic_block, basic_block);
> +static int tree_hoist (bb_decorator);
> +static tree tree_find_first_from_begin (bb_decorator decorator, int);
> +static int tree_find_insn_from_begin (tree, bb_decorator, int);
> +static int simple_stmt (tree);
> +static int stmt_hoist (bb_decorator);
> +static int tree_moveable_from_begin (basic_block, tree);
> +
> +/* Common functions  */
> +static int commutable_stmts (tree, tree);
> +static void get_stmts_of_bb (bb_decorator);
> +static int compare_vars (tree, tree);
> +static int compare_stmts (tree, tree);
> +static int num_of_succ_blocks (basic_block);
> +static int num_of_pred_blocks (basic_block);
> +static bb_decorator init_factoring (bb_decorator);
> +static void free_bb_decorator_list (bb_decorator);
> +static void collect_brother (bb_decorator,
> +                             int (*f) (basic_block, basic_block));
> +static void cost_analyzer (bb_decorator, int (*f) (basic_block));
> +static void cost_analyzer_1 (bb_decorator, int (*f) (basic_block));
> +static void delete_brothers (bb_decorator);
> +void tree_ssa_local_factoring (void);
> +static void replace_immediate_uses (tree, def_optype, tree);
> +static int find_stmt_in_set (htab_t *, tree);
> +static int is_assignment_stmt (tree);
> +static void replace_def_vars (bb_decorator, tree);
> +
> +/* Dumps  */
> +static void dump_brothers (FILE *, bb_decorator);
> +
> +
> +/* Global variables  */
> +static htab_t stmt_table1;
> +static htab_t *table1_p;
> +static htab_t stmt_table2;
> +static htab_t *table2_p;
> +
> +static int
> +is_assignment_stmt (tree stmt)
> +{
> +  if (TREE_CODE (stmt) == MODIFY_EXPR)
> +    return 1;
> +  return 0;
> +}
> +

 Is this actually easier to use than TREE_CODE (stmt) == MODIFY_EXPR
directly?

> +/* Auxiliary functions for hashing statements  */
> +static hashval_t
> +stmt_hash (const void *item)
> +{
> +  tree stmt = (tree) item;
> +  if (!is_assignment_stmt (stmt))
> +    return (iterative_hash_expr (stmt, 0));
> +  return (iterative_hash_expr (TREE_OPERAND (stmt, 2), 0));
> +}
> +
> +static int
> +stmt_equal (const void *p1, const void *p2)
> +{
> +  tree t1 = (tree) p1;
> +  tree t2 = (tree) p2;
> +
> +  enum tree_code code = TREE_CODE (t1);
> +

 CODE isn't used again, why not propagate it into the if stmt.

> +  if (TREE_CODE (t2) != code || TREE_TYPE (t1) != TREE_TYPE (t2))
> +    return 0;
> +
> +  if (simple_cst_equal (t1, t2) != 1)
> +    return 0;
> +  if (stmt_hash (p1) != stmt_hash (p2))
> +    abort ();
> +
> +  return 1;
> +}
> +
> +static int
> +find_stmt_in_set (htab_t * table, tree stmt)
                            ^^ extra space
> +{
> +  if (is_assignment_stmt (stmt))
> +    {
> +      if (htab_find (*table, TREE_OPERAND (stmt, 1)))
> +        return 1;
> +    }
> +  return 0;
> +}
> +
> +/* Collection of the statments, which are included by all basic blocks
> +   chained from decorator.  The DECORATOR is the first element of the
> +   'brother' set, and we can travel the whole set by the help of the 
> +   'next_brother' field of bb_decorator.  In the result *table1_p 
> +   contains the common statements of the investigated 'brother' set.  */
> +static void
> +get_stmts_of_bb (bb_decorator decorator)
> +{
> +  basic_block bb = decorator->curr;
> +  bb_decorator temp;
> +  block_stmt_iterator si;
> +  void **slot;
> +
> +  htab_empty (stmt_table1);
> +  htab_empty (stmt_table2);
> +
> +  /* Collection of the statements of the first block.  */
> +  for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> +    {
> +      tree stmt = bsi_stmt (si);
> +      if (is_assignment_stmt (stmt))
> +        {
> +          if (!htab_find (stmt_table1, TREE_OPERAND (stmt, 1)))
> +            {
> +              slot =
> +                htab_find_slot (stmt_table1, TREE_OPERAND (stmt, 1), INSERT);
> +              *slot = TREE_OPERAND (stmt, 1);
> +            }
> +        }
> +    }
> +  table1_p = &stmt_table1;
> +  table2_p = &stmt_table2;
> +
> +  /* Collection of the common statements of the next block and till now
> +     investigated blocks.  */
> +  for (temp = decorator->next_brother; temp; temp = temp->next_brother)
> +    {
> +      basic_block bb_temp = temp->curr;
> +      htab_t *temp_table;
> +      for (si = bsi_start (bb_temp); !bsi_end_p (si); bsi_next (&si))
> +        {
> +          tree stmt = bsi_stmt (si);
> +          if (is_assignment_stmt (stmt))
> +            {
> +              /* if we find a stmt, which is included by the table pointed by
> +                 table1_p, we insert it into the other table.  */
> +              if (htab_find (*table1_p, TREE_OPERAND (stmt, 1)))
> +                {
> +                  slot =
> +                    htab_find_slot (*table2_p, TREE_OPERAND (stmt, 1),
> +                                    INSERT);
> +                  *slot = TREE_OPERAND (stmt, 1);
> +                }
> +            }
> +        }
> +
> +      htab_empty (*table1_p);
> +      temp_table = table1_p;
> +      table1_p = table2_p;
> +      table2_p = temp_table;
> +    }
> +}
> +
> +/* The number of the succ basic blocks of bb.  */
> +static int
> +num_of_succ_blocks (basic_block bb)
> +{
> +  int num = 0;
> +  edge e;
> +  for (e = bb->succ; e; e = e->succ_next)
> +    num++;
> +  return num;
> +}
> +
> +/* The number of the pred basic blocks of bb. */
> +static int
> +num_of_pred_blocks (basic_block bb)
> +{
> +  int num = 0;
> +  edge e;
> +  for (e = bb->pred; e; e = e->pred_next)
> +    num++;
> +  return num;
> +}
> +
> +/* Find the basic block 'what' between the pred blocks of 'where'.  */
> +static int
> +find_bb_in_pred_blocks (basic_block what, basic_block where)
> +{
> +  edge e;
> +  for (e = where->pred; e; e = e->pred_next)
> +    if (what == e->src)
> +      return 1;
> +  return 0;
> +}
> +
> +/* Compare the set of pred blocks of bb1 and bb2. If it isn't different
> +   return 1.  */
> +static int
> +compare_pred_blocks (basic_block bb1, basic_block bb2)
> +{
> +  edge e;
> +  if (num_of_pred_blocks (bb1) != num_of_pred_blocks (bb2))
> +    return 0;
> +  for (e = bb1->pred; e; e = e->pred_next)
> +    if (!find_bb_in_pred_blocks (e->src, bb2))
> +      return 0;

 Do you need to do this this way?  Can't you just go the predecessors of
bb1 and bb2 at the same time?  This looks to be 2n + n^2 for the true
case and at least n + m for the false case.

> +  return 1;
> +}
> +
> +/* Initialization of the chain of bb_decorator! For each bb there is a bb_decorator in the chain.  */

 This line seems a bit long. Many of the functions after this point also have
comments use more than 80 characters per line.

> +static bb_decorator
> +init_factoring (bb_decorator decorator)
> +{
> +  basic_block bb;
> +  int out_of_mem = 0;
> +
> +  bb_decorator last = NULL;
> +  FOR_EACH_BB (bb)
> +  {
> +    bb_decorator temp;
> +
> +    temp =
> +      (struct bb_decorator_def *) xmalloc (sizeof (struct bb_decorator_def));

 You can use calloc instead of malloc + memset.

> +    if (!temp)
> +      {
> +        out_of_mem = 1;
> +        break;
> +      }
> +    memset (temp, 0, sizeof (*temp));

-- 
Thanks,
Jim

http://www.student.cs.uwaterloo.ca/~ja2morri/
http://phython.blogspot.com
http://open.nit.ca/wiki/?page=jim