This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [C++] GCC tree linkage types
- From: Chris Lattner <sabre at nondot dot org>
- To: Ian Lance Taylor <ian at wasabisystems dot com>
- Cc: Matt Austern <austern at apple dot com>, <gcc at gcc dot gnu dot org>, Gabriel Dos Reis <gdr at integrable-solutions dot net>, Richard Henderson <rth at redhat dot com>
- Date: Fri, 7 Nov 2003 13:41:12 -0600 (CST)
- Subject: Re: [C++] GCC tree linkage types
On 7 Nov 2003, Ian Lance Taylor wrote:
> Chris Lattner <sabre@nondot.org> writes:
> > From your description, it sounds like common symbols and weak symbols have
> > the same behavior, except that common symbols expand as necessary. Do
> > common symbols merge their initializers or something else that I'm
> > missing? If not, how are common symbols different from weak symbols where
> > the linker prefers to keep the largest of the symbols when it links?
>
> I should say that when I wrote the above, I was thinking of the usual
> definition of weak symbols, which is based on the ELF standard. I
> think that you are using the term weak in a different way, meaning
> something more like what I would call linkonce (which I think is not
> what you would call linkonce). So I'll try to answer the question of
> whether common symbols are different from linkonce symbols.
Ah, ok. Again, I apologize for the confusion. :(
> One simple difference between common symbols and linkonce symbols is
> that common symbols have only a size, while linkonce symbols have an
> address. But that is an implementation detail--linkonce symbols can
> in principle have sizes, and if the initializer value is zero the
> address may not be important.
yup
> So the question then is: is there a difference between common symbols
> and linkonce symbols initialized to zero, provided we always choose
> the largest linkonce symbol when doing a link?
yup
> First I'll introduce another wrinkle of common symbols, which is that
> it is OK for the same symbol name to appear as both a common symbol
> and as a defined symbol. In such a case the common symbol is treated
> as an undefined reference to the defined symbol. This is used in
> FORTRAN to provide values for a common block.
yup
> So now the question is: is there a difference between common
> symbols and linkonce symbols initialized to zero, provided we always
> choose the largest linkonce symbol when doing a link, and provided we
> permit a normal definition to override and replace any linkonce
> symbol?
yup :)
> I think the answer to the question may be that there is no difference.
> There would certainly be some trickiness in implementation, particularly
> when linking against a dynamic library. But in principle that should be
> solvable.
Ok, great. That is what I thought. You have correctly described the LLVM
conception of 'weak' linkage above. Again, I'm sorry for the terminology.
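
The merge rule agreed on above can be sketched as a toy symbol resolver. This is only an illustration of the three rules (first definition is provisional, the largest mergeable definition wins, a strong definition overrides any mergeable one). The names `Linkage`, `Definition`, and `resolve` are hypothetical and belong to neither GCC, GNU ld, nor LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical linkage kinds for this sketch only.
enum class Linkage { Strong, Mergeable };

struct Definition {
    std::string name;
    Linkage linkage;
    std::size_t size;  // size of the symbol in bytes
};

// Resolve definitions collected from several object files.
// A strong definition replaces any mergeable one; among mergeable
// definitions of the same name, the largest one is kept.
std::map<std::string, Definition> resolve(const std::vector<Definition>& defs) {
    std::map<std::string, Definition> table;
    for (const Definition& d : defs) {
        auto it = table.find(d.name);
        if (it == table.end()) {
            table.emplace(d.name, d);  // first sighting of this name
            continue;
        }
        Definition& cur = it->second;
        if (cur.linkage == Linkage::Strong) {
            // Two strong definitions would be a link error in a real linker;
            // the sketch only asserts.
            assert(d.linkage != Linkage::Strong && "duplicate strong definition");
            continue;  // the strong definition stays
        }
        if (d.linkage == Linkage::Strong || d.size > cur.size) {
            cur = d;  // strong overrides, or a larger mergeable copy wins
        }
    }
    return table;
}
```

Given mergeable definitions of one name with sizes 4, 16, and 8, the resolver keeps the 16-byte one. Given a strong definition of any size, it keeps the strong one regardless of the order in which the definitions arrive.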
> But the uses of common symbols and linkonce symbols are quite
> different. In practice, a linkonce symbol always has a non-zero
> initializer. By definition, a common symbol always has a zero
> initializer. The linker normally simply uses the first definition of
> a linkonce symbol, and discards subsequent ones; is there a benefit to
> choosing the largest one?
A linkonce symbol doesn't always have a non-zero initializer. For
example, in C++:
  inline int foo(int X) {
    static int linkoncevar = 0;
    return linkoncevar += X;
  }
linkoncevar should have common or linkonce linkage, because it's in an
inline function which may be included into multiple translation units, but
the initializer could have an arbitrary value.
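
To make the "arbitrary value" case concrete, here is a hypothetical variant of the function above (the name `bar` and the value 42 are made up for illustration). Its static local has a non-zero initializer, so it could not be represented as a zero-filled common symbol.

```cpp
// Hypothetical variant of foo with a non-zero initializer.
inline int bar(int X) {
    // Every translation unit that includes bar must end up sharing one
    // copy of this variable, and that copy must start out as 42.
    static int linkoncevar = 42;
    return linkoncevar += X;
}
```

Within a single translation unit, the variable simply keeps its state across calls. The linkage question only arises once several translation units each emit their own copy.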
> Could common symbols be implemented as linkonce symbols? It would be
> less efficient in current practice, because common symbols take up no
> space in an object file whereas linkonce symbols do take up space.
... which is an implementation detail. In LLVM, they take the same amount
of space.
> But other than that, I think common symbols could be implemented as
> linkonce symbols, if we extended the semantics of linkonce symbols as
> described above to always choose the largest one, and to permit a
> strong symbol to override the linkonce symbol.
Ok.
> But common symbols are still clearly different from weak symbols,
> using the ELF definition of weak symbols.
Absolutely. Again, I'm sorry for this confusion. It also doesn't help
that ELF weak symbols have meaning overloaded on whether the function is
defined or external. :)
> > Personally, I'm not working under the constraints of using ELF or any
> > other existing linker technology. We already have our own linker, and we
> > already support the linkage types as described here (I know the names are
> > horribly confusing things, for which I sincerely apologize!):
> > http://llvm.cs.uiuc.edu/docs/LangRef.html#modulestructure
>
> I don't see much consideration of dynamic linking there, though. For
> example, what about symbol versioning? Perhaps it doesn't matter for
> you.
We haven't addressed that issue at all. I'm working on the basics first
:)
> > These linkage types work great for us, I just want to have the compiler
> > generate as many of our 'linkonce' symbols as possible, in preference to
> > 'weak'.
>
> I'm not clear on the semantic difference between your `linkonce'
> symbols and your `weak' symbols. To me the difference seems to be
> whether the linker does garbage collection or not.
Yes, except that the GC can happen at any stage in compilation...
> Or, wait, I see, the difference applies at the compiler level, not the
> linker level. Your `linkonce' symbols may be discarded from the
> object file if they are not referenced from within the object file.
> That is a distinction which makes no difference to the linker, of
> course. It would never see an instance of your `linkonce' symbol
> which was not referenced.
Right. Except that the GC can happen at any time: compile time, library
link time, or application link time. LLVM doesn't use the GCC inliner at
all (LLVM has its own), so all functions referenced get emitted by the
frontend. The LLVM optimizer wants to be able to delete bodies of
functions which have inlined all uses away, which is the reason for the
linkonce/weak distinction (in LLVM terminology).
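
A rough sketch of that distinction, written as a toy dead-global pass: 'linkonce' globals may be dropped when nothing live refers to them, while 'weak' and strong globals are always kept. This is not LLVM's actual implementation. The types `Linkage` and `Global` and the function `liveGlobals` are hypothetical, and the sketch assumes global names are unique within a module.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a module-level global, for this sketch only.
enum class Linkage { Strong, Weak, LinkOnce };

struct Global {
    std::string name;
    Linkage linkage;
    std::vector<std::string> uses;  // names of globals this one references
};

// Return the names of the globals that survive. Strong and weak globals
// are roots and are always kept; a linkonce global survives only if it
// is reachable from a root.
std::set<std::string> liveGlobals(const std::vector<Global>& module) {
    std::set<std::string> live;
    std::vector<const Global*> worklist;

    // Seed the worklist with everything that must not be discarded.
    for (const Global& g : module)
        if (g.linkage != Linkage::LinkOnce && live.insert(g.name).second)
            worklist.push_back(&g);

    // Follow references transitively from the roots.
    while (!worklist.empty()) {
        const Global* g = worklist.back();
        worklist.pop_back();
        for (const std::string& used : g->uses)
            for (const Global& candidate : module)
                if (candidate.name == used && live.insert(used).second)
                    worklist.push_back(&candidate);
    }
    return live;
}
```

In this model, a linkonce function whose calls have all been inlined away has no remaining users, so it is absent from the result. A weak function with no users is still kept, because another module may refer to it.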
> > One of the things that I dislike about the GCC/GNU ld approach is that
> > distinct ideas, such as linkage types and executable sections, are often
> > confused, though they are completely orthogonal ideas. GCC/ld happens to
> > implement a variety of linkage optimizations using special named sections
> > (such as the gnu.linkonce family), but this is just an implementation
> > technique, not a necessary approach. I'm trying to filter out the minimal
> > set of information needed to represent the source program, while letting
> > a suitably capable optimizer do good things to the program.
>
> Yes, using a specially named section for linkonce symbols is just a
> trick used for ELF, because ELF didn't have any way to define the
> symbol semantics prior to the recent introduction of SHT_GROUP. You are
> of course correct that linkage type and section placement are
> completely different ideas. (Note that there is a subtle difference
> which arises when using specially named sections for linkonce
> sections, which is that the linkonce character is driven from the
> section name, not the symbol name. It is possible to generate object
> files such that this makes no difference, but it is also possible to
> generate object files such that it does make a difference.)
Yup.
Ok, thanks a _lot_ for the detailed explanation. This really helps clear
things up for me.
The next question is, what representation should the GCC front-end use?
Is there any way to reduce the mish-mash of flags used to a minimum, or at
least document what the semantics of various globals are with different
flag combinations?
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/