cgraph code documentation

Thu Sep 11 15:25:00 GMT 2003

> I see, I should also add introductonary comment into cgraphunit.c as the
> code concentrates quite a lot of complexity about how and when things
> are decided.  I will do that tomorrow.
OK, it ended up to be longer than the comment should be so I am not
quite sure where it should go, but I tried to write both the status of
current code and what I would like to do in next month or two.  Hope it
will be usefull for something.

The module cgraphunit can be used to drive compilation process of front-ends
that do use tree-based function-at-a-time compilation.  The decisions on what
to compile and when are made here as well as some intra-procedural optimization
and memory management.

The API:

>From back-end programmer perspective important is the call-graph data structure:

    The call-graph is data structure designed for intra-procedural optimization
    but it is also used in non-unit-at-a-time compilation to allow easier code
    sharing.

    The call-graph consist of nodes and edges represented via linked lists.
    Each function (external or not) corresponds to the unique node (in
    contrast to tree DECL nodes where we can have multiple nodes for each
    function).

    ??? Built-in functions are not inserted into the data structure that is not
    good idea as we can't inline them then.  Fix that.  (requires revising of
    test-suite as some test-cases does define builtin body but does not expect
    it to be inlined)

    The mapping from declarations to call-graph nodes is done using hash table
    based on DECL_ASSEMBLER_NAME, so it is essential for assembler name to
    not change once the declaration is inserted into the call-graph.
    The call-graph nodes are created lazily using cgraph_node function when
    called for unknown declaration.

    ??? Fix handling on asm("new_name") GCC feature.  On the Hammer branch
    there is set_decl_assembler_name function allowing the decl to be renamed,
    but it would be probably better to disallow the later renaming.  Glibc
    currently depend on it in some special cases that can be dealt with
    but are ugly.

    When built, there is edge from node A and B when A may call B directly
    (ie not via reference).  It is possible that the reference will be later
    optimized out.  The call-graph is built conservatively in order to make
    conservative data flow analysis possible.

    In addition each node can be accessed in different way - from unknown
    code using external linkage, via indirect call or such.  Such nodes to
    have NEEDED field set and data flow is supposed to mark them as accessible
    from indirect calls, asm statements and as entry point of compilation unit.

    ??? It would be nice to add -fwhole-program flag that will mark only main
    as needed so we get local optimization on global symbols.  The global
    symbols not marked as needed should be turned to local symbols that seems
    to be problem (shall we rewrite DECLs or add new predicates on whether
    symbol should be output as static even thought it is public in language
    point of view)

    The intra-procedural optimization is organized in three passes - an local
    pass analyzing each function, global pass propagating the information
    and compilation pass using information gathered for the optimization.
    Such a organization is essential to allow function nodes to be stored
    to external store in between parsing and local analysis and compilation.

    Data maintained in call-graph are divided accordingly.  Sub-structure LOCAL
    is used for local data (for instance fact whether function is inlinable),
    GLOBAL for global data (the fact whether function is accessed via reference,
    for instance) and RTL for data propagated at expansion time (such as stack
    alignment required by function stack frame).

The varpool data structure:

    Varpool is used to maintain variables in similar manner as call-graph
    is used for functions.  Most of the API is symmetric replacing cgraph
    function prefix by cgraph_varpool

Front-end is supposed to use following API:

  - cgraph_finalize_function

    This function is called once front-end has parsed whole body of function
    and it is certain that the function body nor the declaration will change.

    There is one exception needed for implementing GCC extern inline function.
    While parsing the following test-case:
    extern int foo (void)
      { body1(); }
    int foo (void)
      { body2(); }
    The cgraph_finalize_function is called twice for the same declaration
    and it will update data structures accordingly.

  - cgraph_varpool_finalize_variable

    This function has same behaviour as the above but is used for static
    variables.

  - cgraph_finalize_compilation_unit

    This function is called once compilation unit is finalized and it will
    no longer change.

    In the unit-at-a-time the call-graph construction and local function
    analysis takes place here.  Bodies of unreachable functions are released
    to conserve memory usage.

    ???  The compilation unit in this point of view should be compilation
    unit as defined by the language - for instance C frontend allows multiple
    compilation units to be parsed at once and it should call function each
    time parsing is done so we save memory.

  - cgraph_optimize

    In this unit-at-a-time compilation the intra procedural analysis takes
    place here.  In particular the static functions whose address is never
    taken are marked as local.  Backend can then use this information to
    modify calling conventions, do better inlining or similar optimizations.

  - cgraph_assemble_pending_functions
  - cgraph_varpool_assemble_pending_variables

    In non-unit-at-a-time mode these functions can be used to force compilation
    of functions or variables that are known to be needed at given stage
    of compilation

  - cgraph_mark_needed_node
  - cgraph_varpool_mark_needed_node

    When function or variable is referenced by some hidden way (for instance
    via assembly code and marked by attribute "used"), the call-graph data structure
    must be updated accordingly by this function.

    ??? This is also used by objc frontend to emit functions later referenced by
    type information data structures currently emit behind the cgraph code back.
    This should be cleaned up

    ??? C++ frontend has notion of COMDAT functions/variables that need to be output
    each time.  We should have tree flag for that.

  - analyze_function callback

    This function is responsible for lowering tree nodes not understood by
    generic code into understandable ones or alternatively marking
    callgraph and varpool nodes referenced by the as needed.

    ??? On the tree-ssa genericizing should take place here and we will avoid
    need for these hooks (replacing them by genericizing hook)

  - expand_function callback

    This function is used to expand function and pass it into RTL back-end.
    Front-end should not make any assumptions about when this function can be
    called.  In particular cgraph_assemble_pending_functions,
    cgraph_varpool_assemble_pending_variables, cgraph_finalize_function,
    cgraph_varpool_finalize_function, cgraph_optimize can cause arbitrarily
    previously finalized functions to be expanded.

    ??? In C++ front end this causes problems with garbage collection and
    expansion recursing on itself.  We need to add flag on whether the
    expansion can take place

Some implementation details

    Each function or variable can enter several stages:
  - unfinalized
    function whose node has been created but it has not been finalized yet.
    (it's body or initializer has not been parsed yet or is not available
     because it is external function/variable)
  - finalized
    function has body and will not change in future so it is ready to be
    output.
  - needed
    function has address taken or is public or is marked via "used" flag.
    ??? perhaps entry_point is better name but does not work very well for
    variables.
  - reachable
    function was added into cgraph_nodes_queue.  When finalized function is
    makred needed or needed function finalized, it becomes reachable
    automatically.

    We implement two compilation modes.

  - unit-at-a-time:  In this mode analyzing of all functions is deferred
    to cgraph_finalize_compilation_unit and expansion into cgraph_optimize.

    In cgraph_finalize_compilation_unit the reachable functions are analyzed.
    During analysis the call-graph edges from reachable functions are constructed
    and their destinations are marked as reachable.  References to functions and
    variables are discovered too and variables found to be needed output to
    the assembly file.  Via mark_referenced call in assemble_variable functions
    referenced by static variables are noticed too.

    ??? We should only mark variables as needed and analyze them and defer
    actual expansion to the very end testing whether the symbol really has been
    referenced so we don't emit variables whose references has been optimized
    out.
    ??? What to do with -fkeep-inline-functions?

    The intra-procedural information is produced and it's existence indicated
    by global_info_ready.  Once this flag is set it is impossible to change
    function from !reachable to reachable and thus assemble_variable no longer
    call mark_referenced.

    ??? We probably should add sanity check into mark_referenced to verify this

    Finally the call-graph is topologically sorted and all reachable functions
    that has not been completely inlined or are not external are output.

    ??? It is possible that reference to function or variable is optimized out.
    We can not deal with this nicely because topological order is not suitable
    for it.  For tree-ssa we may consider another pass doing optimization and
    re-discovering reachable functions.

  - non-unit-at-a-time

    All functions are variables are output as early as possible to conserve
    memory consumption.  This may or may not result in less memory used but
    it is still needed for some legacy code that rely on particular ordering
    of things output from the compiler.

    Varpool data structures are not used and variables are output directly.
    ??? This should be fixed in future.

    Functions are output early using call of cgraph_assemble_pending_function
    from cgraph_finalize_function.  The decision on whether function is needed
    is made more conservative so uninlininable static functions are needed too.
    During the call-graph construction the edge destinations are not marked as
    reachable and it is completely relied upn assemble_variable to mark them.

    ??? There is nasty side case with external functions with body.  These
    functions can be marked as reachable and processed by
    cgraph_assemble_pending_function.  The body is not expanded because it is
    external.  Later function may stop to be external, but it won't be
    re-considered as it has been marked as reachable and is not re-queued.
    cgraph_finalize_functio thus contain hack for re-queueing but perhaps we
    can add another flag or avoid functions from being queued.