This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Fix -fwhole-program on LTO
- From: Richard Guenther <rguenther at suse dot de>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org, dnovillo at google dot com
- Date: Wed, 7 Oct 2009 13:58:17 +0200 (CEST)
- Subject: Re: Fix -fwhole-program on LTO
- References: <20091006200322.GB19212@kam.mff.cuni.cz>
On Tue, 6 Oct 2009, Jan Hubicka wrote:
> Hi,
> this is the first part of the changes needed to get cgraph/varpool right
> in LTO and WHOPR. The patch also makes -fwhole-program effective when it
> is passed to lto1 (so the patch I sent earlier today is needed).
>
> The patch went through a significant snowballing effect, but I don't think
> I can decompose it into incremental parts. There are several things we get
> wrong, and fixing one reveals more problems.
>
> Following are the main changes:
>
> 1) The needed flag was wrong WRT -fwhole-program.
> The node->needed flag marks functions that need to be output to the final
> program for a reason other than direct calls from other functions output
> to the final program. The reasons can be external visibility, the fact
> that the address was taken, the used attribute, or many other side cases.
> Once a node becomes needed it stays needed, because the reason it became
> needed is not recorded.
>
> This does not work well with -fwhole-program: the switch is ignored at
> compile time, when functions are marked needed based on visibility. So
> even if -fwhole-program is on at link time, everything is already needed
> and there is nothing to do.
>
> The patch solves the problem by making decide_is_function_needed behave
> as if in whole-program mode when LTO/WHOPR is active. This decision is
> later revisited by the new whole-program pass, scheduled as the first IPA
> pass run after LTO read-in.
>
> I am working in the direction of removing the need for this flag and
> replacing it with a detailed reason why a node is needed. address_taken
> is there to mark nodes with addresses taken, and I have now added two new
> predicates. cgraph_only_called_directly_p is used by IPA passes that
> originally tested the "needed" flag to decide whether they can assume that
> all calls to a function are seen. cgraph_can_remove_if_no_direct_calls_p
> is used by the inliner and dead function removal to work out whether a
> function can be removed.
>
> The two predicates differ for COMDAT: these functions can be removed if
> they are not needed for other reasons, but they cannot be considered to
> be only called directly, because if they stay they might be called from
> elsewhere.
>
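To illustrate the distinction between the two predicates described above, here
is a minimal sketch in C. It is not the code from the patch: struct
node_sketch, its fields (in particular force_output) and the function names
are hypothetical stand-ins, and the real cgraph node carries far more state.
The sketch only shows why a COMDAT function can be removable and yet not
"only called directly".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a call graph node.  GCC's real
   struct cgraph_node carries far more state than this.  */
struct node_sketch
{
  bool externally_visible; /* may be referenced from outside what we see */
  bool address_taken;      /* address escapes, so indirect calls can exist */
  bool force_output;       /* hypothetical: e.g. attribute "used" */
  bool comdat;             /* COMDAT: other units may carry or call a copy */
};

/* Can the body be dropped once no direct calls to it remain?  */
static bool
can_remove_if_no_direct_calls (const struct node_sketch *n)
{
  if (n->address_taken || n->force_output)
    return false;
  return n->comdat || !n->externally_visible;
}

/* Can an IPA pass assume that it sees every call to the function?  */
static bool
only_called_directly (const struct node_sketch *n)
{
  return !n->address_taken && !n->force_output
         && !n->externally_visible && !n->comdat;
}

/* Small self-check showing where the two answers diverge.  */
static void
predicates_self_check (void)
{
  struct node_sketch local_fn = { false, false, false, false };
  struct node_sketch comdat_fn = { true, false, false, true };

  /* A purely local function: removable, and all calls are visible.  */
  assert (can_remove_if_no_direct_calls (&local_fn));
  assert (only_called_directly (&local_fn));

  /* COMDAT: removable when unused, but callers may exist elsewhere.  */
  assert (can_remove_if_no_direct_calls (&comdat_fn));
  assert (!only_called_directly (&comdat_fn));
}
```

Calling predicates_self_check() from a test driver exercises both cases. Under
these assumptions, a pass that changes a function's signature would consult
the second predicate, while dead function removal would consult the first.
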
> 2) externally_visible was wrong WRT whole-program.
> Similar problem to the needed flag. We decide on external visibility at
> compile time and re-run the visibility pass at link time. However, at that
> time everything is externally visible (and must be so, to keep the small
> compile-time IPA passes from making overly aggressive assumptions about
> externally visible things), so whole-program has no effect.
>
> I've moved all decisions on external visibility to the visibility
> pass, and it is now run twice, once as an early whole-program pass.
> The first time we compute external visibility as needed for compile time;
> later we revisit it and bring stuff local at link time.
>
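The two-phase visibility decision described above can be sketched as follows.
This is an illustration under stated assumptions, not the pass from the patch:
struct symbol_sketch, enum phase_sketch and externally_visible_p are
hypothetical names, and the sketch models only the LTO case, where the
-fwhole-program decision is deferred to link time. The exemption of main and
of symbols carrying attribute externally_visible follows the documented
behaviour of -fwhole-program.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical symbol description, not GCC's data structure.  */
struct symbol_sketch
{
  const char *name;
  bool is_public;               /* declared with external linkage */
  bool attr_externally_visible; /* has attribute externally_visible */
};

enum phase_sketch { COMPILE_TIME, LINK_TIME };

/* First run (compile time): every public symbol stays externally visible,
   so compile-time IPA passes make no unsafe assumptions.  Second run (link
   time): with whole_program set, public symbols are brought local unless
   they are explicitly exempt.  */
static bool
externally_visible_p (const struct symbol_sketch *s,
                      enum phase_sketch phase, bool whole_program)
{
  if (!s->is_public)
    return false;
  if (phase == COMPILE_TIME || !whole_program)
    return true;
  return s->attr_externally_visible || strcmp (s->name, "main") == 0;
}

/* Small self-check of the two runs on one ordinary public function.  */
static void
visibility_self_check (void)
{
  struct symbol_sketch helper = { "helper", true, false };

  assert (externally_visible_p (&helper, COMPILE_TIME, true));
  assert (!externally_visible_p (&helper, LINK_TIME, true));
  assert (externally_visible_p (&helper, LINK_TIME, false));
}
```

The point of the split is that the same symbol gets two different answers:
conservative at compile time, aggressive only once the whole program is seen.
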
> 3) Inline clones were messed up.
> In WHOPR we read back the inline clones and confuse them with real
> functions, doing random stuff like marking them as address taken or
> as needed. I've added a couple of asserts for this and modified
> varpool to not re-analyze initializers when reading back.
>
> There are several issues I am aware of that are still wrong. I've added
> FIXMEs for those and intend to look into them incrementally. They primarily
> affect WHOPR: here we pass the wrong callgraph to the ltrans stage and we
> mess up the info about what is needed. As a result lto/lto.c is re-deciding
> what is needed, but it cannot do so correctly. Also, the optimization queue
> is restarted from the wrong point, so we re-run IPA passes in ltrans and
> mess up summaries.
>
> Also, we are not making any attempt to properly store varpool; we simply
> dump the nodes and rebuild it. This won't work for WHOPR.
Btw, compared to before your patch we now have (compile & link status
is the same):
CINT2006
  400.perlbench    --  0.242  RE
  403.gcc          --  0.008  RE
  445.gobmk        --  0.373  RE
  471.omnetpp      --  0.243  RE
  473.astar        --  9.39   VE
CFP2006
  433.milc         --  0.029  RE (formerly VE)
  436.cactusADM    --  0.514  RE
  482.sphinx3      --  3.95   RE
thus, miscompiles. RE means the run exits with a non-zero exit code
(usually a segfault); VE means the output failed validation.
Richard.