This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] analysis of global statics and removal associatedvdefs and vuses


The patch looks OK in general, but it touches parts of the
compiler that I do not maintain.  A global or cgraph maintainer
should look at that.

Regarding the compile time degradation, I think we need to change
the way we query the global static bitmaps.  See below for my
proposal.  If that doesn't fix it, more detailed profiling will
be necessary.

The analysis itself does not seem expensive.  The call graph
analysis times are almost identical.  Also, the optimization of
call-clobbered variables triggers very rarely, if at all (a 0.2%
reduction in virtual operands for cc1-i-files and a 0% reduction
for PR8361), so I don't expect register pressure to be a problem
for now.

>     * tree-ssa-operands.c (get_call_expr_operands): Change. Added parm
>     to add_call_clobber_ops and add_call_read_ops.
>     (add_call_clobber_ops, add_call_read_ops): Change.  Added code to
>     reduce the number of vdefs and vuses inserted based on analysis of
>     global variables across calls.
>     * tree-dfa.c (find_referenced_vars): Change.  Needed to reset
>     static var maps before each function is compiled.
>
No need to say 'Change', just describe what the change is.

>     * cgraphunit.c:
>     (static_vars_to_consider_by_tree,static_vars_to_consider_by_uid,
>     static_vars_info,functions_to_static_vars_info,module_statics_escape,
>     all_module_statics,searchc_env,dfs_info): New fields to support
>     analysis of static global static variables.
>
static global variables.

>     (print_order
>     convert_UIDs_in_bitmap, cgraph_reset_static_var_maps
>     get_static_vars_info, get_local_static_vars_info
>     get_statics_not_read_for_function_call
>     get_statics_not_written_for_function_call
>     get_statics_read_in_function, get_statics_written_in_function
>     searchc, cgraph_reduced_inorder, has_proper_scope_for_analysis
>     check_rhs_var, check_lhs_var, get_asm_expr_operands
>     process_call_for_static_vars, scan_for_static_refs
>     cgraph_characterize_statics_local
>     cgraph_get_static_name_by_uid, clear_static_vars_maps
>     cgraph_propagate_bits, cgraph_characterize_statics):
>     New. Functions to support analysis of static global variables.

>     (cgraph_mark_local_functions): Renamed to:
>     (cgraph_mark_local_and_external_functions)
>
The custom is to say it the other way around

      (cgraph_mark_local_and_external_function): Renamed from
      cgraph_mark_local_functions.

>     (cgraph_expand_all_functions): Changed to call
>     cgraph_mark_local_and_external_functions.
>
No.  The change removed the call to
cgraph_mark_functions_to_output.

>     (cgraph_optimize): Changed. Added driver to analyze
>     static variables.
>
Better to be more specific

	(cgraph_optimize): Call cgraph_characterize_statics.
	Move call to cgraph_mark_functions_to_output from ...
	(cgraph_expand_all_functions): ... here.

>     * cgraph.h (struct cgraph_local_info, GTY): Added statics_read,
>     statics_written, local, calls_read_all, calls_write_all,
>     for_functions_valid.
>     (struct cgraph_node): Added next_cycle.
>     * cgraph.c (dump_cgraph_node): Added print routines for new fields.
> 
You are missing a ChangeLog entry for changes in Makefile.in.

> *** gcc/Makefile.in	26 Aug 2004 17:10:46 -0000	1.1361
> --- gcc/Makefile.in	26 Aug 2004 21:18:44 -0000
> *************** INTEGRATE_H = integrate.h varray.h
> *** 697,702 ****
> --- 697,703 ----
>   LOOP_H = loop.h varray.h bitmap.h
>   CFGLAYOUT_H = cfglayout.h $(BASIC_BLOCK_H)
>   CFGLOOP_H = cfgloop.h $(BASIC_BLOCK_H) $(RTL_H)
> + CGRAPH_H = cgraph.h bitmap.h tree.h $(HASHTAB_H)
>   DF_H = df.h bitmap.h sbitmap.h $(BASIC_BLOCK_H)
>   DDG_H = ddg.h sbitmap.h $(DF_H)
>   GCC_H = gcc.h version.h
>
These seem fine and almost obvious, but I'd like a Makefile
maintainer to go over them.  This is out of my area.

> + 
> +   /* Local statics that are read and written by this function and
> +      indexed by DECL_UID.  */
> +   bitmap statics_read;
> +   bitmap statics_written;
> + 
> +   /* Same as above except indexed by var_ann (VAR_DECL)->uid.  */
> +   bitmap statics_read_for_function;
> +   bitmap statics_written_for_function;
> + 
>
These names are a bit confusing.  Perhaps a better naming scheme
would be something like:

statics_read_by_decl_uid
statics_written_by_decl_uid

statics_read_by_ann_uid
statics_written_by_ann_uid

Also, add a FIXME note about future changes that will allow us to
go back to just two bitmaps.  If we built the call graph with
functions already in SSA form (or at least low GIMPLE), we would
not need this trickery.


> + struct static_vars_info GTY(()) 
> + {
> +   bitmap statics_read;
> +   bitmap statics_written;
> +   bitmap statics_not_read;
> +   bitmap statics_not_written;
> +   bitmap statics_not_read_for_function;
> +   bitmap statics_not_written_for_function;
> + 
Similarly to the other bitmaps.  The '_by_decl_uid' and
'_by_ann_uid' naming convention seems less confusing.

Now, why do we have similarly looking bitmaps in two places?  I'm
not terribly familiar with the cgraph node.  The versions in
cgraph_local_info appear to be write-only.

> +   n = splay_tree_min (functions_to_static_vars_info);
> +   while(n) 
>
while (n)

> + bitmap 
> + get_statics_read_in_function (tree fn)
>
Not used anywhere.

> + bitmap 
> + get_statics_written_in_function (tree fn)
>
Likewise.

> + 
> + /* This is an implementation of tarjan's strongly connected region
                                    Tarjan's

[ OT: Don't we have several incarnations of this in GCC?  oh, well. ]


> +    finder as reprinted in AHU75 pages 192-193.  The env parameter is
                                                        ENV

AHU75 will be meaningless to most people.  Could you include a
full citation?  A standard algorithm book should be fine and I'm
sure there's hundreds of online references.


> +    because it is recursive and there are no nested functions here.  
> +    This function should only be called from cgraph_reduced_inorder. */
>
Blank line after comment.  Two spaces after final period.

> + /* Topsort the call graph by caller relation.  Put the result in order.
                                                                     ORDER
> + 
> +    The Reduce flag is true if you want the cycles reduced to single
           REDUCE

> +    nodes.  Only consider nodes that have the output bit set. */
>
Blank line after comment.  Two spaces after final period.


> + /* FIXME this needs to be enhanced.  If we are compiling a single
> +    module this returns true if the variable is a module level static,
> +    but if we are doing whole program compilation, this could return
> +    true if TREE_PUBLIC is true. */
> + 
> + static inline
> + int has_proper_scope_for_analysis (tree t)
>
Predicates return 'bool'.  An FE or IMA maintainer should comment
on this FIXME note.

> + 
> + /* Check to see if t is an assignement to a static var we are
                       T

> +    interrested in analyzing.  fn is passed in to get access to its bit
                                  FN

> + /* This is a scaled down version of get_asm_expr_operands from
> +    tree_ssa_operands.c.  The version there runs much later and assumes
> +    that aliasing information is already available. Here we are just
> +    trying to find if the set of inputs and outputs contain references
> +    or address of operations to local static variables.  fn is the
> +    function being analyzed and stmt is the actual asm statement.
> + 
> +    It is possible that someone who knew the code better could combine
> +    these two functions, but the magic in the other version is too
> +    subtle for me, fkz */
>
No.  The other function assumes that we are already inside the
tree optimizers, referenced variables have been discovered, we
are in low GIMPLE, etc.  What will happen is that when we have
the whole callgraph in low GIMPLE we won't need this traversal to
discover referenced variables.

> +   
> +   /* this call was moved here from cgraph_expand_all_functions so that
> +      cgraph_characterize_statics could use the output flag of the cgraph
> +      node. */
> +   
Capitalize 'this'.  Two spaces after final period.


> +       /* Get info for module level statics.  There is a bit set for
> + 	 each static if the call being processed does not read or
> + 	 write that variable.  */
> + 
> +       bitmap not_read_b = callee 
> + 	? get_statics_not_read_for_function_call (callee) : NULL; 
> +       bitmap not_written_b = callee 
> + 	? get_statics_not_written_for_function_call (callee) : NULL; 
> + 
> + 
>
Hmm, these two calls inside the operand scanner are bound to be
expensive.  You are looking up a splay tree inside a pretty hot
code path.

This may explain the rather significant slow downs I described
before on x86 and x86_64.  I think we should change this.

How about the following?

1- We create a new structure to hold not_read_b and not_written_b
   for every CALL_EXPR and add a pointer to this structure in
   stmt_ann_d.

2- In find_referenced_vars, we fill in that structure for every
   CALL_EXPR that we find.

3- In add_call_clobber_ops and add_call_read_ops we just query
   these bitmaps.



Diego.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]