This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [whopr] Design/implementation alternatives for the driver and WPA
Chris Lattner <clattner@apple.com> writes:
>> * The return value of lto_module_get_symbol_attributes is not
>> defined.
>
> Ah, sorry about that. Most of the details are actually in the public
> header. The result of this function is a 'lto_symbol_attributes'
> bitmask. This should be more useful and revealing:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
From an ELF perspective, this doesn't seem to have a way to indicate a
common symbol, and it doesn't provide the symbol's type. It also
doesn't have a way to indicate section groups.
(How do section groups work in Mach-O? Example is a C++ template
function with a static constant array which winds up in the .rodata
section. Section groups permit discarding the array when we discard
the function code.)
>> * Interfaces like lto_module_get_symbol_name and
>> lto_codegen_add_must_preserve_symbol are inefficient when dealing
>> with large symbol tables.
>
> The intended model is for the linker to query the LTO plugin for its
> symbol list and build up its own linker-specific hash table. This way
> you don't need to force the linker to use the plugin's data structure
> or the plugin to use the linker data structure. We converged on this
> approach after trying it the other way.
>
> Does this make sense, do you have a better idea?
In gcc's LTO approach, I think the linker will already have access to
the symbol table anyhow. But my actual point here is that requiring a
function call for every symbol is inefficient. These functions should
take an array and a count. There can be hundreds of thousands of
entries in a symbol table, and the interface should scale accordingly.
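To make the array-and-count suggestion concrete, here is a minimal
sketch of what a batched query could look like. Everything in it
(sketch_symbol, sketch_module, sketch_get_symbols) is a hypothetical
name invented for illustration; none of it is part of lto.h or of any
gcc interface.

```c
#include <stddef.h>

/* Hypothetical symbol record; not taken from any real LTO header. */
typedef struct {
    const char *name;      /* symbol name, owned by the plugin */
    unsigned attributes;   /* opaque attribute bitmask */
} sketch_symbol;

/* Hypothetical module handle: a flat symbol table owned by the plugin. */
typedef struct {
    const sketch_symbol *symbols;
    size_t count;
} sketch_module;

/* Copy up to `capacity` symbols, starting at index `first`, into the
   caller's array `out`.  Returns the number of entries written, which
   is 0 once `first` is at or past the end of the table.  One call
   replaces `capacity` per-symbol calls. */
size_t sketch_get_symbols(const sketch_module *mod, size_t first,
                          sketch_symbol *out, size_t capacity)
{
    size_t n = 0;

    if (first >= mod->count)
        return 0;
    while (n < capacity && first + n < mod->count) {
        out[n] = mod->symbols[first + n];
        n++;
    }
    return n;
}
```

The linker would call this in a loop with a reasonably large buffer
and insert each batch into its own hash table, so the number of calls
across the plugin boundary grows with the number of batches rather
than with the number of symbols.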
>> The LLVM
>> interface does not do that.
>
> Yes it does, the linker fully handles symbol resolution in our model.
>
>> Suppose the linker is invoked on a
>> sequence of object files, some with LTO information, some
>> without, all interspersed. Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups. The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.
>
>> The result is that the plugin is required to do
>> symbol resolution itself. This 1) loses one of the benefits of having
>> the linker around; 2) will yield incorrect results when some non-LTO
>> object is linked in between LTO objects but redefines some earlier
>> weak symbol.
>
> In the LLVM LTO model, the plugin only needs to know about its .o
> files, and the linker uses this information to reason about symbol
> merging etc. The Mac OS X linker can even do dead code stripping
> across Mach-O .o files and LLVM .bc files.
To be clear, when I said object file here, I meant any input file.
You may have understood that.
In ELF you have to think about symbol overriding. Let's say you link
a.o b.o c.o. a.o has a reference to symbol S. b.o has a strong
definition. c.o has a weak definition. a.o and c.o have LTO
information, b.o does not. ELF requires that a.o call the symbol from
b.o, not the symbol from c.o. I don't see how to make that work with
the LLVM interface.
This is not a particularly likely example, of course. People rely on
this sort of symbol overriding quite a bit, but it's unlikely that a.o
and c.o would have LTO information while b.o would not. However,
given that we are designing an interface, I think we should design it
so that correctness is possible.
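The a.o b.o c.o case above can be sketched as a small resolver. This
is only an illustration of the rule that a strong definition beats a
weak one wherever it appears on the command line; the types and the
sk_resolve function are hypothetical, and a real linker would also
have to handle common symbols, section groups, archives, and
duplicate strong definitions (which this sketch silently resolves to
the first one instead of reporting an error).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical binding kinds, simplified from what ELF distinguishes. */
typedef enum { SK_UNDEFINED, SK_WEAK, SK_STRONG } sk_binding;

/* One symbol table entry from one input file (illustrative only). */
typedef struct {
    const char *file;    /* input file the entry came from */
    const char *name;    /* symbol name */
    sk_binding binding;  /* reference, weak definition, or strong one */
    int has_lto;         /* nonzero if the file carries LTO information */
} sk_symbol;

/* Scan entries in command-line order and return the index of the
   definition that wins for `name`, or -1 if nothing defines it.
   The first definition seen is kept unless a later strong definition
   replaces a weak one.  Whether a file has LTO information plays no
   part in the decision. */
int sk_resolve(const sk_symbol *syms, size_t count, const char *name)
{
    int winner = -1;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) != 0)
            continue;
        if (syms[i].binding == SK_UNDEFINED)
            continue;
        if (winner < 0
            || (syms[i].binding == SK_STRONG
                && syms[winner].binding == SK_WEAK))
            winner = (int)i;
    }
    return winner;
}
```

Fed the three entries for S from a.o (reference), b.o (strong), and
c.o (weak), the function picks the entry from b.o. The point of the
sketch is that the decision needs the symbols of every input file in
one place, including the ones without LTO information.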
> Further other pieces of the toolchain (nm, ar, etc) also use the same
> interface so that they can return useful information about LLVM LTO
> files.
Useful, but as I understand it gcc's LTO files will have that
information anyhow.
> This is our second major revision of the LTO interfaces, and the
> interface continues to slowly evolve. I think it would be great to
> work with you guys to extend the design to support GCC's needs.
Agreed.
Ian