Update LTO plugin interface

Cary Coutant ccoutant@google.com
Wed Dec 1 23:06:00 GMT 2010


>> That is what "Discard all previous inputs" in stage 2 linking is for.
>
> But what does that mean?  Are you saying that the linker interface to
> the plugin should change to work that way?  If we do that, then we
> should change other aspects of the plugin interface as well.  It could
> probably become quite a bit simpler.
>
> The only reason we would ever need to do a complete relink is if the LTO
> plugin can introduce arbitrary new symbol references.  Is that ever
> possible?  If it is, we need to rethink the whole approach.  If the LTO
> plugin can introduce arbitrary new symbol references, that means that
> LTO plugin can cause arbitrary objects to be pulled in from archives.
> And that means that if we only run the plugin once, we are losing
> possible optimizations, because the plugin will never see those new objects.
>
> My suspicion is that the LTO plugin can only introduce a small bounded
> set of new symbol references, namely those which we assume can be
> satisfied from -lc or -lgcc.  Is that true?

Exactly. The plugin API was designed for this model -- if you want to
start the link all over again, you may as well stick with the collect2
approach and enhance it to deal with archives of IR files.

The plugin API, as implemented in gold (not sure about GNU ld), does
maintain the original order of input files as far as symbol binding is
concerned. When IR files are claimed, the plugin provides the list of
symbols defined and referenced, and the linker builds the symbol table
as if those files were linked in at that particular spot in the
command line. When the compiler provides real definitions of those
symbols later, the real definitions simply replace the "placeholders"
that were left in the linker's symbol table. The only aspect of link
order that isn't maintained is the physical order of the sections in
memory.
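
To make the placeholder idea concrete, here is a toy model in Python.
It is only a sketch: the class, method, and file names are made up for
illustration and are not gold's data structures or the plugin API, and
it ignores weak symbols and duplicate-definition errors. It assumes the
first definition seen in command-line order wins.

```python
PLACEHOLDER = "placeholder"
REAL = "real"


class ToySymbolTable:
    """Maps a symbol name to (defining file, kind). Hypothetical model."""

    def __init__(self):
        self.defs = {}

    def add_file(self, filename, defined, claimed_by_plugin=False):
        # A claimed IR file contributes placeholders at its position
        # in the command line; an ordinary object contributes real defs.
        kind = PLACEHOLDER if claimed_by_plugin else REAL
        for name in defined:
            # The first definition in command-line order wins.
            self.defs.setdefault(name, (filename, kind))

    def replace_placeholders(self, ir_file, generated_object, defined):
        # The compiler's real definitions replace only the placeholders
        # that the corresponding IR file left behind.
        for name in defined:
            if self.defs.get(name) == (ir_file, PLACEHOLDER):
                self.defs[name] = (generated_object, REAL)


table = ToySymbolTable()
table.add_file("a.o", ["main"])
table.add_file("b_ir.o", ["foo"], claimed_by_plugin=True)
table.add_file("c.o", ["foo", "bar"])  # later "foo" does not win
table.replace_placeholders("b_ir.o", "ltrans0.o", ["foo"])
```

In this model "foo" stays bound at the IR file's position and ends up
pointing at the generated object, while "bar" still comes from c.o.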

As Ian noted, if the compiler introduces new references that weren't
there before, the new references must be from a limited set of
libcalls that the backend can introduce, and those should all be
resolved with an extra pass through -lc or -lgcc. That's not exactly
pretty, but I don't see how it destroys the notion of link order --
the only way those new symbols could have been resolved differently is
if a user library interposed definitions for the libcall, and those
certainly can't be what the compiler intended to bind to. In PR 12248,
I think it's questionable to claim that the compiler-introduced call
to __udivdi3 should not resolve to the version in libgcc. Sure, I
understand it's useful for library developers while debugging and
testing, but an ordinary user certainly can't count on his own
definition of that routine to get called -- the compiler might
generate the division inline, or call a different specialized version.
All of these routines are outside the user's namespace, and we should
be able to optimize without regard for what the user's libraries might
contain.
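
The extra pass can be pictured with a small Python sketch. It is an
assumption-laden illustration, not linker code: the function name and
the library contents are hypothetical, and the point is only that new
references are looked up in the runtime libraries and nowhere else.

```python
def rescan_for_libcalls(new_undefined, already_defined, runtime_libs):
    """Resolve compiler-introduced references from runtime libraries only.

    runtime_libs is an ordered list of (library name, exported symbols).
    """
    resolved = {}
    for name in sorted(new_undefined):
        if name in already_defined:
            continue  # nothing new to bind
        for lib, exports in runtime_libs:
            if name in exports:
                resolved[name] = lib
                break
    return resolved


# Hypothetical contents, for illustration only.
toy_runtime_libs = [
    ("libc", {"memcpy", "memset"}),
    ("libgcc", {"__udivdi3", "__divdi3"}),
]
toy_resolved = rescan_for_libcalls(
    {"__udivdi3", "memcpy"}, {"main"}, toy_runtime_libs
)
```

User libraries are never consulted in this pass, which is the behavior
argued for above.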

An improvement could be for the claim file handler to determine what
libcalls might be introduced and add them to the list of referenced
symbols so that the linker can bring in the definitions in the
original pass through the input files -- any that end up not being
referenced can be garbage collected. Alternatively, we could do a
whole-archive link of the library that contains the libcalls, again
discarding unreferenced routines via garbage collection. Neither of
these requires a change to the API.
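
A rough Python sketch of the first idea follows. Everything in it is
hypothetical: the libcall list, the function names, and the set-based
model of garbage collection are stand-ins, not the claim file handler
or gold's GC.

```python
# Assumed list of libcalls the backend might introduce (illustrative).
POSSIBLE_LIBCALLS = {"__udivdi3", "__divdi3", "memcpy"}


def claimed_references(ir_referenced):
    # Report the IR file's own references plus every libcall the
    # backend might later introduce, so the linker pulls in their
    # definitions during the original pass over the inputs.
    return set(ir_referenced) | POSSIBLE_LIBCALLS


def garbage_collect(pulled_in, finally_referenced):
    # Keep only definitions the generated code really references.
    return {name for name in pulled_in if name in finally_referenced}


pulled = claimed_references({"printf"})
# Suppose the generated code ends up calling only these two.
kept = garbage_collect(pulled, {"printf", "__udivdi3"})
```

The speculative references cost nothing in the final output, because
the unused ones are dropped again by garbage collection.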

-cary


