This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [whopr] Design/implementation alternatives for the driver and WPA


Diego Novillo <dnovillo@google.com> writes:

I have a feeling that the comments I wrote within Google about the
linker interface were lost.  I am going to try to recreate them here.


> The linker, upon start, examines a configuration file at a known
> location relative to its own location. If this file exists, it
> extracts the location of linker plugins (shared libraries) and loads
> those.  A fixed set of function interfaces needs to be implemented in
> the plugin, these functions are described below. One of many possible
> plugins is a plugin that controls LTO.
>
> Another way to locate a plugin would be via command-line.  This would
> make it easier for two different compilers (and therefore two
> different plugins) to use the same linker.

I think the plugin should always be specified on the command line, and
the linker should never search for it.  The plugin is inherently a
property of the compiler, not the linker.  We already expect that the
linker will always be invoked via the gcc driver program.  It is
trivial for the driver program to pass an option specifying the plugin
or the plugin directory.


> The linker performs regular symbol resolution. For each object file it
> touches, it calls a specific function in the plugin (int
> ldplugin_claim_file(const char *fname, size_t offset)). This
> function returns 1 if it intends to claim a file (e.g. it contains
> IR), and 0 if it doesn't.   The offset is used in the case of an
> archive file. This way the plugin doesn't need to understand archives.

There should be an interface to pass a pointer to the contents of the
file rather than the filename.  Otherwise each file has to be opened
twice, which is pointless.
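
A minimal sketch of what a buffer-based entry point could look like.
The name ldplugin_claim_buffer and the magic bytes are hypothetical;
nothing in the proposal defines them.  The point is only that the
linker hands over bytes it has already read or mapped, so archive
members need no special handling in the plugin.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical magic bytes marking a file as containing IR.  */
static const unsigned char IR_MAGIC[4] = { 'L', 'T', 'O', '1' };

/* Hypothetical buffer-based variant of ldplugin_claim_file.  The linker
   has already read or mapped the file (or located the archive member)
   and passes the contents directly.  Returns 1 to claim the file and
   0 otherwise, matching the convention in the quoted proposal.  */
int
ldplugin_claim_buffer (const unsigned char *data, size_t size)
{
  if (data == NULL || size < sizeof IR_MAGIC)
    return 0;
  return memcmp (data, IR_MAGIC, sizeof IR_MAGIC) == 0;
}
```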


> The linker also creates a list of all externally referenced symbols
> and passes these to the plugin via the function
> ldplugin_add_external_symbol(const char *mangled_name).
>
> '''TODO''': Would it be better to pass an abstract object to
> ldplugin_add_external_symbol? What should we pass to it if there are
> two symbols in IL files with the same name?  One strong and one weak
> for example.

"Externally referenced" is a bad term.  I think what is meant here is
"referenced by some part of the program which the plugin did not
claim".

There needs to be a way to specify the symbol version.

The interface should not require a separate function call for each
symbol.  This is inefficient.  Some executables have hundreds of
thousands of symbols.  There should be a way to pass a list of
symbols.
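
As an illustration only, a batch interface carrying the symbol version
might look like the following.  The struct and function names are
invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical record for one symbol referenced by a part of the
   program which the plugin did not claim.  */
struct ld_symbol_ref
{
  const char *name;     /* mangled name */
  const char *version;  /* symbol version, or NULL if unversioned */
};

/* Hypothetical batch replacement for ldplugin_add_external_symbol:
   one call for the whole list rather than one call per symbol.
   Returns the number of entries accepted; entries without a name
   are skipped.  A real plugin would record each accepted entry.  */
size_t
ldplugin_add_referenced_symbols (const struct ld_symbol_ref *syms,
                                 size_t count)
{
  size_t accepted = 0;
  for (size_t i = 0; i < count; i++)
    if (syms[i].name != NULL)
      accepted++;
  return accepted;
}
```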

More seriously, this interface is much too simple.  In the general
case, for each input file, we need to specify the exact disposition of
each symbol.  If we don't provide a way for the linker to communicate
that to the plugin, then the plugin is forced to do symbol resolution
itself.  That is what we want to get away from.

My assumption is that the symbol table in an LTO object is fully
correct: it correctly reports weak symbols, section groups, etc.
Given that, the linker should be determining the symbol resolution.
For each defined symbol in the symbol table, the linker should say
whether that symbol should be included in the link.  For each
undefined symbol, the linker should say where the definition of that
symbol may be found--it could be in an LTO file or a non-LTO file.
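
To make that concrete, here is a rough sketch of the per-symbol
information the linker could hand to the plugin for one input file.
All names are hypothetical, and the set of resolutions is certainly
incomplete.

```c
#include <stddef.h>

/* Hypothetical resolutions the linker reports for one symbol of one
   claimed input file.  */
enum ld_resolution
{
  LDR_UNKNOWN = 0,
  LDR_KEEP,            /* defined here; include this definition */
  LDR_DISCARD,         /* defined here; another definition was chosen */
  LDR_RESOLVED_IR,     /* undefined here; definition is in an LTO file */
  LDR_RESOLVED_OBJECT  /* undefined here; definition is in a non-LTO file */
};

struct ld_symbol_resolution
{
  const char *name;
  enum ld_resolution resolution;
};

/* Count the definitions the plugin must emit for this file.  The
   plugin does no symbol resolution itself; it only reads the
   linker's decisions.  */
size_t
count_kept_definitions (const struct ld_symbol_resolution *syms,
                        size_t count)
{
  size_t kept = 0;
  for (size_t i = 0; i < count; i++)
    if (syms[i].resolution == LDR_KEEP)
      kept++;
  return kept;
}
```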


> At this point, the linker calls the main entry point to the plugin
> (ldplugin_main(int argc, char *argv[])), passing its own arguments.
> It's the plugin's responsibility to extract its related {{{-Wx,...}}}
> values.

This does not make sense.  The linker options are complex and varied.
We do not want to require the plugin to understand how to parse them.
We need to define a different approach for sending options to the
plugin.
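
One possible shape, purely as a sketch: the linker forwards only the
strings explicitly addressed to the plugin, one at a time, and the
plugin never sees the linker's own command line.  The function name,
the config struct, and the option spellings below are all made up for
illustration.

```c
#include <string.h>

/* Hypothetical plugin-side settings.  */
struct plugin_config
{
  int opt_level;
  int verbose;
};

/* Hypothetical hook called once per option the linker was asked to
   forward.  Returns 0 if the option was understood, -1 otherwise.  */
int
ldplugin_option (struct plugin_config *cfg, const char *opt)
{
  static const char prefix[] = "opt-level=";
  const size_t plen = sizeof prefix - 1;

  if (strcmp (opt, "verbose") == 0)
    {
      cfg->verbose = 1;
      return 0;
    }
  /* Accept exactly one digit in the range 0-3 after the prefix.  */
  if (strncmp (opt, prefix, plen) == 0
      && opt[plen] >= '0' && opt[plen] <= '3'
      && opt[plen + 1] == '\0')
    {
      cfg->opt_level = opt[plen] - '0';
      return 0;
    }
  return -1;
}
```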


> '''TODO''': How do we handle symbols defined in more than one file?
> Should ldplugin_add_external_symbol take an abstract pointer/index into
> the linker symbol table?

Yes, this is required.


> '''TODO''': What is passed to ldplugin_claim_file if the file is in a
> .a file?

We should pass a buffer, not a file name.


> '''TODO''': Are we assuming that the files with IL contain a
> normal symbol table? Should we make it possible for the plugin to call
> back into the linker to add symbols? This should make it possible to
> support a "full custom" file format for the IL files.

If the LTO files do not contain a normal symbol table, then the plugin
will have to provide one for the linker.  The symbol table provided by
the plugin will have to include symbol names and versions, weak
vs. strong, defined vs. common vs. undefined, symbol visibility,
symbol type, section group information.
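
As a sketch of the record such a plugin-provided symbol table might
use, with one field per item listed above.  The type and field names
are hypothetical, and the enumerations are simplified relative to what
an object format such as ELF actually distinguishes.

```c
#include <stddef.h>

enum ld_sym_binding { LDS_STRONG, LDS_WEAK };
enum ld_sym_def { LDS_DEFINED, LDS_COMMON, LDS_UNDEFINED };
enum ld_sym_visibility { LDS_DEFAULT, LDS_PROTECTED, LDS_INTERNAL, LDS_HIDDEN };
enum ld_sym_type { LDS_NOTYPE, LDS_OBJECT, LDS_FUNC };

/* Hypothetical symbol table entry supplied by the plugin.  */
struct ld_plugin_symbol
{
  const char *name;
  const char *version;        /* NULL if unversioned */
  enum ld_sym_binding binding;
  enum ld_sym_def def;
  enum ld_sym_visibility visibility;
  enum ld_sym_type type;
  const char *section_group;  /* group signature, NULL if not in a group */
};

/* Count the entries the linker must resolve from some other file.  */
size_t
count_undefined (const struct ld_plugin_symbol *syms, size_t count)
{
  size_t n = 0;
  for (size_t i = 0; i < count; i++)
    if (syms[i].def == LDS_UNDEFINED)
      n++;
  return n;
}
```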


> == Final Link - ld ==
> After all real object files have been generated, these files, along
> with the rest of the originally passed real object files, need to be
> passed to the linker. There are a few ways to do this:
>
>  * Call a plugin / linker interface which allows to explicitly add
>  files to the linker's internal data structures. '''TODO''': Unclear
>  about the consequences for linker file/code generation.
>
>  * Restart the linker with a new command line, where all original real
>  objects and the objects are being passed in. There are subtle
>  problems possible in terms of symbol resolution. Well - these
>  problems are always there, unless a 1x1 mapping from pre- to post-IPA
>  object files exists.
>
>  * WPA could call the linker, it has all proper command line options,
>  the plugin could do it, but only with difficulties, as WPA decides on
>  the actual number and names of the final real .o files. The plugin
>  could just pick up any object files it finds in the tmp directories,
>  but this may introduce problems - in case of actual problems or
>  debugging.
>
>  * What about adding individual symbols via an API call? The linker
>  will still be running during WPA. The plugin can collect the symbols
>  and pass them back to the linker. With this it shouldn't be necessary
>  to restart the linker. Final strategy to be determined.

Of these options, my preferences would be the first one or the last
one.  The linker must already do all the symbol resolution before
invoking the plugin.  We shouldn't go through that again.  Instead,
the plugin should pass the resulting object files back to the linker,
in one form or another.  The linker should not need to do any further
symbol resolution at that point.  The linker should have told the
plugin precisely which symbols it needs defined, and the plugin
should define precisely those symbols.
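
A rough sketch of the first option, in which the plugin hands the
generated object files back through a callback supplied by the linker.
The callback type and both function names are assumptions made for
this example, and stub_add_file merely stands in for the linker side.

```c
#include <stddef.h>

/* Hypothetical callback provided by the linker: add one object file
   to the link.  Returns 0 on success.  */
typedef int (*ld_add_input_file_fn) (const char *path);

/* Plugin side: pass every object produced after WPA back to the
   linker.  Stops at the first failure and returns -1.  */
int
plugin_return_objects (ld_add_input_file_fn add_file,
                       const char *const *paths, size_t count)
{
  for (size_t i = 0; i < count; i++)
    if (add_file (paths[i]) != 0)
      return -1;
  return 0;
}

/* Stand-in for the linker side, for illustration only.  It just
   counts the files it was given.  */
static size_t stub_files_added;

static int
stub_add_file (const char *path)
{
  if (path == NULL)
    return -1;
  stub_files_added++;
  return 0;
}
```

Since the linker has already resolved every symbol, the files added
this way should need no second round of resolution.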

Ian

