This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Fix -fwhole-program on LTO


On Tue, 6 Oct 2009, Jan Hubicka wrote:

> Hi,
> this is the first part of the changes needed to get cgraph/varpool right in
> LTO and WHOPR.  The patch also makes -fwhole-program effective when it is
> passed to lto1 (so the patch I sent earlier today is needed).
> 
> The patch went through a significant snowballing effect, but I don't think
> I can decompose it into incremental parts.  There are several things we get
> wrong, and fixing one reveals more problems.
> 
> Following are the main changes:
> 
>   1) The needed flag was wrong WRT -fwhole-program.
>      The node->needed flag marks functions that need to be output to the
>      final program for a reason other than direct calls from other functions
>      output to the final program.  The reasons can be external visibility,
>      the fact that the address was taken, the used attribute, or many other
>      side cases.  Once a node becomes needed it always stays so, because the
>      reason it became needed is not recorded.
> 
>      This does not work well with -fwhole-program: the switch is ignored at
>      compile time, when functions are marked needed based on visibility.
>      So even if -fwhole-program is on at link time, everything is already
>      needed and there is nothing to do.
> 
>      The patch solves the problem by making decide_is_function_needed behave
>      as if in whole-program mode when LTO/WHOPR is active.  This decision is
>      later revisited by the new whole-program pass, scheduled as the first
>      IPA pass run after LTO read-in.
> 
>      I am working in the direction of removing the need for this flag and
>      replacing it with the detailed reason why a node is needed.
>      address_taken is there to mark nodes with addresses taken, and I have
>      now added new predicates: cgraph_only_called_directly_p is used by IPA
>      passes that originally tested the "needed" flag to decide whether they
>      can assume that all calls to a function are seen;
>      cgraph_can_remove_if_no_direct_calls_p is used by the inliner and dead
>      function removal to work out whether a function can be removed.
> 
>      The two predicates differ for COMDAT; these functions can be removed if
>      they are not needed for other reasons, but cannot be considered to be
>      always only called directly, because if they stay they might be called
>      from elsewhere.
> 
>   2) externally_visible was wrong WRT whole-program.
>      This is a similar problem to the needed flag.  We decide on external
>      visibility at compile time and re-run the visibility pass at link time.
>      However, at that time everything is externally visible (and must be so,
>      to keep the compile-time small IPA passes from making overly aggressive
>      assumptions about externally visible things), so whole program has no
>      effect.
> 
>      I've moved all decisions on external visibility to the visibility
>      pass, and it is now run twice, once as an early whole-program pass.
>      The first time we compute external visibility as needed for compile
>      time; later we revisit it and bring stuff local at link time.
> 
>   3) Inline clones were messed up.
>      In WHOPR we read back the inline clones and confuse them with real
>      functions, doing random stuff like marking them as address taken or
>      as needed.  I've added a couple of asserts for this and modified
>      varpool to not re-analyze initializers when reading back.
> 
> There are several issues I am aware of that are still wrong.  I've added FIXMEs
> for those and intend to look into them incrementally.  They primarily affect
> WHOPR: here we pass the wrong callgraph to the ltrans stage and we mess up the
> info about what is needed.  As a result lto/lto.c re-decides what is needed,
> but it cannot do so correctly.  Also, the optimization queue is restarted from
> the wrong point, so we re-run IPA passes in ltrans and mess up summaries.
> 
> Also, we are not making any attempt to properly store varpool; we simply
> dump the nodes and rebuild it.  This won't work for WHOPR.
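
To make the COMDAT distinction from item 1 of the quoted description concrete, here is a minimal sketch of how the two predicates could differ.  It is a toy model, not the code from the patch: struct toy_node, its fields, and both toy_* functions are hypothetical stand-ins for the real cgraph node and for cgraph_only_called_directly_p / cgraph_can_remove_if_no_direct_calls_p, and the exact conditions are an assumption based only on the prose above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a callgraph node.  The real
   cgraph node in GCC carries far more state than this.  */
struct toy_node
{
  bool needed;             /* must be output for a reason other than direct calls */
  bool address_taken;      /* address escapes, so indirect calls are possible */
  bool externally_visible; /* may be referenced from outside what we see */
  bool comdat;             /* COMDAT: if kept, it may be called from elsewhere */
};

/* May an IPA pass assume that it sees every call to NODE?
   Assumed rule: not for anything needed, address-taken, externally
   visible, or COMDAT.  */
bool
toy_only_called_directly_p (const struct toy_node *node)
{
  return !node->needed
         && !node->address_taken
         && !node->externally_visible
         && !node->comdat;
}

/* May NODE be deleted once its last direct call is gone?
   Assumed rule: same as above, except that a COMDAT function may be
   dropped even though it is externally visible.  */
bool
toy_can_remove_if_no_direct_calls_p (const struct toy_node *node)
{
  return !node->needed
         && !node->address_taken
         && (node->comdat || !node->externally_visible);
}
```

For a COMDAT function with no other reason to be needed, the first predicate is false and the second is true in this model, which is the asymmetry the description points out.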

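The two runs of the visibility pass from item 2 of the quoted description can be sketched in the same spirit.  Again this is only an illustrative model under stated assumptions: struct toy_symbol and toy_compute_visibility are hypothetical names, and the only rule taken from outside the mail is the documented -fwhole-program behaviour that main stays public (the externally_visible attribute is left out to keep the sketch short).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical symbol record, just enough to show the two runs.  */
struct toy_symbol
{
  const char *name;
  bool public_decl;        /* declared with external linkage in the source */
  bool externally_visible; /* result of the visibility pass */
};

/* Hypothetical visibility pass.  With WHOLE_PROGRAM false (compile
   time) every public symbol stays externally visible, so the small
   IPA passes make no unsafe assumptions.  With WHOLE_PROGRAM true
   (link time) the decision is revisited and everything except main
   is brought local.  */
void
toy_compute_visibility (struct toy_symbol *syms, int n, bool whole_program)
{
  for (int i = 0; i < n; i++)
    {
      if (!syms[i].public_decl)
        syms[i].externally_visible = false;
      else if (whole_program)
        syms[i].externally_visible = strcmp (syms[i].name, "main") == 0;
      else
        syms[i].externally_visible = true;
    }
}
```

Calling the function once with whole_program false and once more with it true mirrors the compile-time run followed by the link-time run; the second call is the one that actually makes symbols local.
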
Btw, compared to before your patch we now have (compile & link status
is the same):

CINT2006
400.perlbench      --      0.242            RE
403.gcc            --      0.008            RE
445.gobmk          --      0.373            RE
471.omnetpp        --      0.243            RE
473.astar          --      9.39             VE

CFP2006
433.milc           --      0.029            RE  (formerly VE)
436.cactusADM      --      0.514            RE
482.sphinx3        --      3.95             RE

thus, miscompiles.  The RE entries are runs that exit with a non-zero exit
code (usually segfaults).

Richard.

