cgraph code documentation
Jan Hubicka
jh@suse.cz
Thu Sep 11 15:25:00 GMT 2003
> I see, I should also add introductonary comment into cgraphunit.c as the
> code concentrates quite a lot of complexity about how and when things
> are decided. I will do that tomorrow.
OK, it ended up to be longer than the comment should be so I am not
quite sure where it should go, but I tried to write both the status of
current code and what I would like to do in next month or two. Hope it
will be usefull for something.
The module cgraphunit can be used to drive compilation process of front-ends
that do use tree-based function-at-a-time compilation. The decisions on what
to compile and when are made here as well as some intra-procedural optimization
and memory management.
The API:
>From back-end programmer perspective important is the call-graph data structure:
The call-graph is data structure designed for intra-procedural optimization
but it is also used in non-unit-at-a-time compilation to allow easier code
sharing.
The call-graph consist of nodes and edges represented via linked lists.
Each function (external or not) corresponds to the unique node (in
contrast to tree DECL nodes where we can have multiple nodes for each
function).
??? Built-in functions are not inserted into the data structure that is not
good idea as we can't inline them then. Fix that. (requires revising of
test-suite as some test-cases does define builtin body but does not expect
it to be inlined)
The mapping from declarations to call-graph nodes is done using hash table
based on DECL_ASSEMBLER_NAME, so it is essential for assembler name to
not change once the declaration is inserted into the call-graph.
The call-graph nodes are created lazily using cgraph_node function when
called for unknown declaration.
??? Fix handling on asm("new_name") GCC feature. On the Hammer branch
there is set_decl_assembler_name function allowing the decl to be renamed,
but it would be probably better to disallow the later renaming. Glibc
currently depend on it in some special cases that can be dealt with
but are ugly.
When built, there is edge from node A and B when A may call B directly
(ie not via reference). It is possible that the reference will be later
optimized out. The call-graph is built conservatively in order to make
conservative data flow analysis possible.
In addition each node can be accessed in different way - from unknown
code using external linkage, via indirect call or such. Such nodes to
have NEEDED field set and data flow is supposed to mark them as accessible
from indirect calls, asm statements and as entry point of compilation unit.
??? It would be nice to add -fwhole-program flag that will mark only main
as needed so we get local optimization on global symbols. The global
symbols not marked as needed should be turned to local symbols that seems
to be problem (shall we rewrite DECLs or add new predicates on whether
symbol should be output as static even thought it is public in language
point of view)
The intra-procedural optimization is organized in three passes - an local
pass analyzing each function, global pass propagating the information
and compilation pass using information gathered for the optimization.
Such a organization is essential to allow function nodes to be stored
to external store in between parsing and local analysis and compilation.
Data maintained in call-graph are divided accordingly. Sub-structure LOCAL
is used for local data (for instance fact whether function is inlinable),
GLOBAL for global data (the fact whether function is accessed via reference,
for instance) and RTL for data propagated at expansion time (such as stack
alignment required by function stack frame).
The varpool data structure:
Varpool is used to maintain variables in similar manner as call-graph
is used for functions. Most of the API is symmetric replacing cgraph
function prefix by cgraph_varpool
Front-end is supposed to use following API:
- cgraph_finalize_function
This function is called once front-end has parsed whole body of function
and it is certain that the function body nor the declaration will change.
There is one exception needed for implementing GCC extern inline function.
While parsing the following test-case:
extern int foo (void)
{ body1(); }
int foo (void)
{ body2(); }
The cgraph_finalize_function is called twice for the same declaration
and it will update data structures accordingly.
- cgraph_varpool_finalize_variable
This function has same behaviour as the above but is used for static
variables.
- cgraph_finalize_compilation_unit
This function is called once compilation unit is finalized and it will
no longer change.
In the unit-at-a-time the call-graph construction and local function
analysis takes place here. Bodies of unreachable functions are released
to conserve memory usage.
??? The compilation unit in this point of view should be compilation
unit as defined by the language - for instance C frontend allows multiple
compilation units to be parsed at once and it should call function each
time parsing is done so we save memory.
- cgraph_optimize
In this unit-at-a-time compilation the intra procedural analysis takes
place here. In particular the static functions whose address is never
taken are marked as local. Backend can then use this information to
modify calling conventions, do better inlining or similar optimizations.
- cgraph_assemble_pending_functions
- cgraph_varpool_assemble_pending_variables
In non-unit-at-a-time mode these functions can be used to force compilation
of functions or variables that are known to be needed at given stage
of compilation
- cgraph_mark_needed_node
- cgraph_varpool_mark_needed_node
When function or variable is referenced by some hidden way (for instance
via assembly code and marked by attribute "used"), the call-graph data structure
must be updated accordingly by this function.
??? This is also used by objc frontend to emit functions later referenced by
type information data structures currently emit behind the cgraph code back.
This should be cleaned up
??? C++ frontend has notion of COMDAT functions/variables that need to be output
each time. We should have tree flag for that.
- analyze_function callback
This function is responsible for lowering tree nodes not understood by
generic code into understandable ones or alternatively marking
callgraph and varpool nodes referenced by the as needed.
??? On the tree-ssa genericizing should take place here and we will avoid
need for these hooks (replacing them by genericizing hook)
- expand_function callback
This function is used to expand function and pass it into RTL back-end.
Front-end should not make any assumptions about when this function can be
called. In particular cgraph_assemble_pending_functions,
cgraph_varpool_assemble_pending_variables, cgraph_finalize_function,
cgraph_varpool_finalize_function, cgraph_optimize can cause arbitrarily
previously finalized functions to be expanded.
??? In C++ front end this causes problems with garbage collection and
expansion recursing on itself. We need to add flag on whether the
expansion can take place
Some implementation details
Each function or variable can enter several stages:
- unfinalized
function whose node has been created but it has not been finalized yet.
(it's body or initializer has not been parsed yet or is not available
because it is external function/variable)
- finalized
function has body and will not change in future so it is ready to be
output.
- needed
function has address taken or is public or is marked via "used" flag.
??? perhaps entry_point is better name but does not work very well for
variables.
- reachable
function was added into cgraph_nodes_queue. When finalized function is
makred needed or needed function finalized, it becomes reachable
automatically.
We implement two compilation modes.
- unit-at-a-time: In this mode analyzing of all functions is deferred
to cgraph_finalize_compilation_unit and expansion into cgraph_optimize.
In cgraph_finalize_compilation_unit the reachable functions are analyzed.
During analysis the call-graph edges from reachable functions are constructed
and their destinations are marked as reachable. References to functions and
variables are discovered too and variables found to be needed output to
the assembly file. Via mark_referenced call in assemble_variable functions
referenced by static variables are noticed too.
??? We should only mark variables as needed and analyze them and defer
actual expansion to the very end testing whether the symbol really has been
referenced so we don't emit variables whose references has been optimized
out.
??? What to do with -fkeep-inline-functions?
The intra-procedural information is produced and it's existence indicated
by global_info_ready. Once this flag is set it is impossible to change
function from !reachable to reachable and thus assemble_variable no longer
call mark_referenced.
??? We probably should add sanity check into mark_referenced to verify this
Finally the call-graph is topologically sorted and all reachable functions
that has not been completely inlined or are not external are output.
??? It is possible that reference to function or variable is optimized out.
We can not deal with this nicely because topological order is not suitable
for it. For tree-ssa we may consider another pass doing optimization and
re-discovering reachable functions.
- non-unit-at-a-time
All functions are variables are output as early as possible to conserve
memory consumption. This may or may not result in less memory used but
it is still needed for some legacy code that rely on particular ordering
of things output from the compiler.
Varpool data structures are not used and variables are output directly.
??? This should be fixed in future.
Functions are output early using call of cgraph_assemble_pending_function
from cgraph_finalize_function. The decision on whether function is needed
is made more conservative so uninlininable static functions are needed too.
During the call-graph construction the edge destinations are not marked as
reachable and it is completely relied upn assemble_variable to mark them.
??? There is nasty side case with external functions with body. These
functions can be marked as reachable and processed by
cgraph_assemble_pending_function. The body is not expanded because it is
external. Later function may stop to be external, but it won't be
re-considered as it has been marked as reachable and is not re-queued.
cgraph_finalize_functio thus contain hack for re-queueing but perhaps we
can add another flag or avoid functions from being queued.
More information about the Gcc-patches
mailing list