[PATCH] local-factoring on TREE-SSA
James A. Morrison
ja2morri@csclub.uwaterloo.ca
Tue Jul 20 23:31:00 GMT 2004
jasy@inf.u-szeged.hu writes:
> Hello,
>
> as Gabor Loki mentioned at the GCC Summit, we are trying to implement our
> local factoring algorithm on TREE-SSA. (Earlier we sent our
> algoritm, which works on RTL:
> http://gcc.gnu.org/ml/gcc-patches/2004-03/msg01907.html)
>
> The attached patch contains a new TREE-SSA optimization phase, which
> carries out statement hoisting (but currently it deals only with the
> simpliest cases.) In the future we plan to add lots of improvements (e.g.
> code sinking), but first we are interested in your opinions
> about the current form of the algorithm.
>
> Let's see the current results:
> arm-elf: 0.143%
> i686-linux: 0.085%
> ppc-elf: 0.120%
>
> (Measured on the CSiBE benchmark (v1.x.x). The figures represent code size
> reduction compared to the mainline.)
>
> The patch was regtested on {arm,ppc}-elf and i686-linux with no new failures.
>
>
> So, what do you think? Is this patch OK for the mainline?
>
> Regards,
> Judit Jasz
>
>
>
> 2004-07-16 Judit Jasz <jasy@inf.u-szeged.hu>
>
> * tree-ssa-lfact.c: New file.
> * Makefile.in: (tree-ssa-lfact.o) Add.
> * common.opt: (ftree-lfact) New flag.
> * flags.h: (flag_tree_lfact): Declare.
> * opts.c: (decode_options): Set.
> * tree-optimize.c: (init_tree_optimization_passes): Add pass_tree_lfact.
> * tree-pass.h: (pass_tree_lfact): Declare.
>
> Index: flags.h
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/flags.h,v
> retrieving revision 1.144
> diff -u -r1.144 flags.h
> --- flags.h 29 Jun 2004 01:53:02 -0000 1.144
> +++ flags.h 16 Jul 2004 09:42:47 -0000
> @@ -379,14 +379,14 @@
> extern int flag_schedule_speculative_load_dangerous;
>
> /* The following flags have an effect during scheduling after register
> - allocation:
> + allocation:
>
> sched_stalled_insns means that insns can be moved prematurely from the queue
> of stalled insns into the ready list.
>
> - sched_stalled_insns_dep controls how many recently scheduled cycles will
> + sched_stalled_insns_dep controls how many recently scheduled cycles will
> be examined for a dependency on a stalled insn that is candidate for
> - premature removal from the queue of stalled insns into the ready list (has
> + premature removal from the queue of stalled insns into the ready list (has
> an effect only if the flag 'sched_stalled_insns' is set). */
These seem to be random white space changes. There are lots of these in
this file. Perhaps you could just file a separate patch and make this one
shorter.
> +/* Nonzero means enable all factoring algorithms. */
> +extern int flag_tree_lfact;
> +
I don't think this is needed anymore since this flag is defined in a
generated file.
> Index: tree-ssa-lfact.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/tree-ssa-lfact.c,v
> diff -u -N tree-ssa-lfact.c
> --- /dev/null 2004-06-04 12:53:11.000000000 +0200
> +++ tree-ssa-lfact.c 2004-07-16 11:51:41.000000000 +0200
> @@ -0,0 +1,887 @@
> +/* Local factoring (code hoisting/sinking) on SSA trees.
> + Copyright (C) 2004 Free Software Foundation, Inc.
> + Contributed by Judit Jasz <jasy@inf.u-szeged.hu>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 2, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING. If not, write to
> +the Free Software Foundation, 59 Temple Place - Suite 330,
> +Boston, MA 02111-1307, USA. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "basic-block.h"
> +#include "resource.h"
> +#include "ggc.h"
> +#include "flags.h"
> +#include "tree.h"
> +#include "tree-flow.h"
> +#include "tree-inline.h"
> +#include "tree-pass.h"
> +#include "tree-ssa-operands.h"
> +
> +/* This file provides the statement hoisting on TREE SSA. The steps
> + of the algoritm:
> + 1. We collect the basic blocks, which have same parents, and
> + the parents haven't got any child in addition. The basic
> + blocks in the same set are brothers, and these blocks are the
> + components of the same 'brother' set, and chained by the
> + next_brother and prev_brother fields of bb_decorator.
> + 2. Apart from that cases where a potantial hoisting is not
> + economic, we analyze the different 'brother' sets.
> + 2.a.) First we collect the potential hoistable statements
> + from the investigated set.
> + 2.b.) We try to hoist the potential hoistable statements.
> + 3. We are repeating the algoritm from step 2., while we can hoist
> + any statement. */
> +
I think a picture of showing what is and isn't a brother set may help.
How about a description like:
The algorithm is:
1) Collect basic blocks that have the same parents into brother sets.
e.g. block0
if (...) { block1 } else { block2 }
Block's 1 and 2 are in a brother set.
2) Find potential hoistable statements.
3) Try to hoist the potential hoistable statements, ensuring that hoisting
the statement is a valuable transformation.
4) Do step 3 for each potentially hoistable statement in a brother statement.
5) Do steps 2-4 for each brother set.
> +static int gain;
> +static tree global_tree;
> +
> +extern bool is_factoring (void);
> +
> +typedef struct bb_decorator_def
> +{
> + struct bb_decorator_def *next_brother; /* pointer to the next basic
> + block in the 'brother' set
> + */
> + struct bb_decorator_def *prev_brother;
> + basic_block curr; /* the decorated basic block */
> + struct bb_decorator_def *prev_decorator, *next_decorator;
> + block_stmt_iterator moveable_node;
> + int last_depth; /* the depth of the last investigated stmt. */
> +} *bb_decorator;
> +
> +/* Local factoring - hoisting */
> +static void collect_full_pred_brother (bb_decorator);
> +static int compare_pred_blocks (basic_block, basic_block);
> +static int tree_hoist (bb_decorator);
> +static tree tree_find_first_from_begin (bb_decorator decorator, int);
> +static int tree_find_insn_from_begin (tree, bb_decorator, int);
> +static int simple_stmt (tree);
> +static int stmt_hoist (bb_decorator);
> +static int tree_moveable_from_begin (basic_block, tree);
> +
> +/* Common functions */
> +static int commutable_stmts (tree, tree);
> +static void get_stmts_of_bb (bb_decorator);
> +static int compare_vars (tree, tree);
> +static int compare_stmts (tree, tree);
> +static int num_of_succ_blocks (basic_block);
> +static int num_of_pred_blocks (basic_block);
> +static bb_decorator init_factoring (bb_decorator);
> +static void free_bb_decorator_list (bb_decorator);
> +static void collect_brother (bb_decorator,
> + int (*f) (basic_block, basic_block));
> +static void cost_analyzer (bb_decorator, int (*f) (basic_block));
> +static void cost_analyzer_1 (bb_decorator, int (*f) (basic_block));
> +static void delete_brothers (bb_decorator);
> +void tree_ssa_local_factoring (void);
> +static void replace_immediate_uses (tree, def_optype, tree);
> +static int find_stmt_in_set (htab_t *, tree);
> +static int is_assignment_stmt (tree);
> +static void replace_def_vars (bb_decorator, tree);
> +
> +/* Dumps */
> +static void dump_brothers (FILE *, bb_decorator);
> +
> +
> +/* Global variables */
> +static htab_t stmt_table1;
> +static htab_t *table1_p;
> +static htab_t stmt_table2;
> +static htab_t *table2_p;
> +
> +static int
> +is_assignment_stmt (tree stmt)
> +{
> + if (TREE_CODE (stmt) == MODIFY_EXPR)
> + return 1;
> + return 0;
> +}
> +
Is this actually easier to use than TREE_CODE (stmt) == MODIFY_EXPR
directly?
> +/* Auxiliary functions for hashing statements */
> +static hashval_t
> +stmt_hash (const void *item)
> +{
> + tree stmt = (tree) item;
> + if (!is_assignment_stmt (stmt))
> + return (iterative_hash_expr (stmt, 0));
> + return (iterative_hash_expr (TREE_OPERAND (stmt, 2), 0));
> +}
> +
> +static int
> +stmt_equal (const void *p1, const void *p2)
> +{
> + tree t1 = (tree) p1;
> + tree t2 = (tree) p2;
> +
> + enum tree_code code = TREE_CODE (t1);
> +
CODE isn't used again, why not propagate it into the if stmt.
> + if (TREE_CODE (t2) != code || TREE_TYPE (t1) != TREE_TYPE (t2))
> + return 0;
> +
> + if (simple_cst_equal (t1, t2) != 1)
> + return 0;
> + if (stmt_hash (p1) != stmt_hash (p2))
> + abort ();
> +
> + return 1;
> +}
> +
> +static int
> +find_stmt_in_set (htab_t * table, tree stmt)
^^ extra space
> +{
> + if (is_assignment_stmt (stmt))
> + {
> + if (htab_find (*table, TREE_OPERAND (stmt, 1)))
> + return 1;
> + }
> + return 0;
> +}
> +
> +/* Collection of the statments, which are included by all basic blocks
> + chained from decorator. The DECORATOR is the first element of the
> + 'brother' set, and we can travel the whole set by the help of the
> + 'next_brother' field of bb_decorator. In the result *table1_p
> + contains the common statements of the investigated 'brother' set. */
> +static void
> +get_stmts_of_bb (bb_decorator decorator)
> +{
> + basic_block bb = decorator->curr;
> + bb_decorator temp;
> + block_stmt_iterator si;
> + void **slot;
> +
> + htab_empty (stmt_table1);
> + htab_empty (stmt_table2);
> +
> + /* Collection of the statements of the first block. */
> + for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> + {
> + tree stmt = bsi_stmt (si);
> + if (is_assignment_stmt (stmt))
> + {
> + if (!htab_find (stmt_table1, TREE_OPERAND (stmt, 1)))
> + {
> + slot =
> + htab_find_slot (stmt_table1, TREE_OPERAND (stmt, 1), INSERT);
> + *slot = TREE_OPERAND (stmt, 1);
> + }
> + }
> + }
> + table1_p = &stmt_table1;
> + table2_p = &stmt_table2;
> +
> + /* Collection of the common statements of the next block and till now
> + investigated blocks. */
> + for (temp = decorator->next_brother; temp; temp = temp->next_brother)
> + {
> + basic_block bb_temp = temp->curr;
> + htab_t *temp_table;
> + for (si = bsi_start (bb_temp); !bsi_end_p (si); bsi_next (&si))
> + {
> + tree stmt = bsi_stmt (si);
> + if (is_assignment_stmt (stmt))
> + {
> + /* if we find a stmt, which is included by the table pointed by
> + table1_p, we insert it into the other table. */
> + if (htab_find (*table1_p, TREE_OPERAND (stmt, 1)))
> + {
> + slot =
> + htab_find_slot (*table2_p, TREE_OPERAND (stmt, 1),
> + INSERT);
> + *slot = TREE_OPERAND (stmt, 1);
> + }
> + }
> + }
> +
> + htab_empty (*table1_p);
> + temp_table = table1_p;
> + table1_p = table2_p;
> + table2_p = temp_table;
> + }
> +}
> +
> +/* The number of the succ basic blocks of bb. */
> +static int
> +num_of_succ_blocks (basic_block bb)
> +{
> + int num = 0;
> + edge e;
> + for (e = bb->succ; e; e = e->succ_next)
> + num++;
> + return num;
> +}
> +
> +/* The number of the pred basic blocks of bb. */
> +static int
> +num_of_pred_blocks (basic_block bb)
> +{
> + int num = 0;
> + edge e;
> + for (e = bb->pred; e; e = e->pred_next)
> + num++;
> + return num;
> +}
> +
> +/* Find the basic block 'what' between the pred blocks of 'where'. */
> +static int
> +find_bb_in_pred_blocks (basic_block what, basic_block where)
> +{
> + edge e;
> + for (e = where->pred; e; e = e->pred_next)
> + if (what == e->src)
> + return 1;
> + return 0;
> +}
> +
> +/* Compare the set of pred blocks of bb1 and bb2. If it isn't different
> + return 1. */
> +static int
> +compare_pred_blocks (basic_block bb1, basic_block bb2)
> +{
> + edge e;
> + if (num_of_pred_blocks (bb1) != num_of_pred_blocks (bb2))
> + return 0;
> + for (e = bb1->pred; e; e = e->pred_next)
> + if (!find_bb_in_pred_blocks (e->src, bb2))
> + return 0;
Do you need to do this this way? Can't you just go the predecessors of
bb1 and bb2 at the same time? This looks to be 2n + n^2 for the true
case and at least n + m for the false case.
> + return 1;
> +}
> +
> +/* Initialization of the chain of bb_decorator! For each bb there is a bb_decorator in the chain. */
This line seems a bit long. Many of the functions after this point also have
comments use more than 80 characters per line.
> +static bb_decorator
> +init_factoring (bb_decorator decorator)
> +{
> + basic_block bb;
> + int out_of_mem = 0;
> +
> + bb_decorator last = NULL;
> + FOR_EACH_BB (bb)
> + {
> + bb_decorator temp;
> +
> + temp =
> + (struct bb_decorator_def *) xmalloc (sizeof (struct bb_decorator_def));
You can use calloc instead of malloc + memset.
> + if (!temp)
> + {
> + out_of_mem = 1;
> + break;
> + }
> + memset (temp, 0, sizeof (*temp));
--
Thanks,
Jim
http://www.student.cs.uwaterloo.ca/~ja2morri/
http://phython.blogspot.com
http://open.nit.ca/wiki/?page=jim
More information about the Gcc-patches
mailing list