This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [C++] GCC tree linkage types
- From: Chris Lattner <sabre at nondot dot org>
- To: Ian Lance Taylor <ian at wasabisystems dot com>
- Cc: Matt Austern <austern at apple dot com>, <gcc at gcc dot gnu dot org>, Gabriel Dos Reis <gdr at integrable-solutions dot net>, Richard Henderson <rth at redhat dot com>
- Date: Fri, 7 Nov 2003 09:15:49 -0600 (CST)
- Subject: Re: [C++] GCC tree linkage types
> > Exactly, but what about the last copy? If it is unreferenced, can it be
> > deleted?
>
> In GNU linker terminology, no, the last copy of a linkonce symbol may
> not be deleted.
Then you're talking about what I called 'weak', for lack of better
terminology. I'm interested in dividing objects between the 'linkonce'
and 'weak' types, as I described them in the initial email.
> I agree that the concept makes sense. It's just that the name
> `linkonce' means something else to me--it means ``keep only one
> copy.'' The linker name for what you are describing is a provided or
> provisional symbol--a symbol which is defined only if it is required.
I apologize for that. All of this terminology is heavily overloaded, and
I'm not helping. :(
> I'll note that the GNU linker also supports garbage collection of
> unused symbols, but that works for any kind of symbol.
As does LLVM.
> > > > * weak (like linkonce, but cannot be deleted if they are unused in a
> > > > translation unit)
> > >
> > > I wouldn't say that weak symbols merge; I would say that weak symbols
> > > are hidden by strong symbols, or by earlier weak symbols.
> >
> > In a compiler where you can _physically delete_ globals, the definitions
> > are the same: "hiding" is the same as deleting one copy.
>
> To me they are not the same; see below.
>
> > > You didn't mention common symbols, which really do merge. Even g++
> > > can generate common symbols when using the -fconserve-space option.
> >
> > What is the difference between common symbols, and either weak or linkonce
> > symbols, as defined above?
>
> Common symbols are based on FORTRAN common statements, and they are
> also used in traditional C implementations for global variable
> definitions with no initialization. A common symbol specifies only a
> size--the linker is responsible for allocating uninitialized memory in
> the specified size.
I understand that. These variables become 'weak' variables in the LLVM
terminology.
> When the linker sees two common symbols with the same name, it merges
> them and uses the larger of the sizes. That is, two common symbols
> with the same name may have different sizes, and the result in the
> final executable will be a single symbol with the larger size. This
> is required for traditional FORTRAN common statement semantics.
Makes sense.
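
The hiding and merging rules described so far can be sketched as a small
resolver. This is only an illustrative model, not how the GNU linker or LLVM
is implemented. The names sym_kind, sym, and resolve are hypothetical. Two
cases are assumptions that this thread does not state: a strong symbol
winning over a common one, and a common symbol winning over a weak one.

```c
#include <stddef.h>

/* Hypothetical model of the symbol kinds discussed in this thread. */
enum sym_kind { SYM_STRONG, SYM_WEAK, SYM_COMMON };

struct sym {
    enum sym_kind kind;
    size_t size; /* size in bytes */
};

/* Resolve two definitions of the same name; `first` was seen earlier.
   Returns 0 and stores the surviving symbol in *out, or returns -1 when
   both definitions are strong (a multiple-definition error). */
int resolve(struct sym first, struct sym second, struct sym *out)
{
    if (first.kind == SYM_STRONG && second.kind == SYM_STRONG)
        return -1;

    /* A strong symbol hides anything else (strong over common is an
       assumption, see the text above). */
    if (first.kind == SYM_STRONG) { *out = first; return 0; }
    if (second.kind == SYM_STRONG) { *out = second; return 0; }

    /* Two common symbols really merge: one symbol, the larger size. */
    if (first.kind == SYM_COMMON && second.kind == SYM_COMMON) {
        *out = first;
        if (second.size > out->size)
            out->size = second.size;
        return 0;
    }

    /* Assumption: a common symbol wins over a weak one. */
    if (first.kind == SYM_COMMON) { *out = first; return 0; }
    if (second.kind == SYM_COMMON) { *out = second; return 0; }

    /* Two weak symbols: the later one is hidden by the earlier one. */
    *out = first;
    return 0;
}
```

Note that only the common/common case changes the surviving symbol (its
size grows). Every other case simply picks one of the two inputs, which is
the "hiding" behaviour.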
> From the compiler's perspective, the difference between a common
> symbol and a variable initialized to zero is just that a different
> pseudo-op must be used to define the variable in assembly language.
I understand that exactly. In LLVM, in C (not C++), a variable
initialized with zero becomes a strong symbol with external linkage. A
variable without an initializer gets weak linkage. I still do not see the
difference in semantics. You are confusing what linkers _happen to
currently implement_ with the needs of the source languages. I'm more
interested in what it takes to implement language requirements
efficiently.
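
For concreteness, the two C cases contrasted here look like this at the
source level. This is a sketch only, and the variable and function names are
hypothetical. With GCC's -fcommon option the uninitialized variable is
emitted as a common symbol, while with -fno-common it is emitted like the
zero-initialized one. Either way the program observes the value zero.

```c
/* Tentative definition: no initializer. A traditional C implementation
   emits this as a common symbol. */
int scratch_counter;
int scratch_counter; /* repeating a tentative definition is legal in C */

/* Definition with an explicit zero initializer: an ordinary (strong)
   external definition. */
int limit = 0;

/* Both variables read as zero at program start. */
int sum_globals(void)
{
    return scratch_counter + limit;
}
```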
> > > The GNU linker supports more exotic linkage types like `indirect',
> > > `warning', and `set'. But these are probably not of interest to the
> > > compiler.
> >
> > True. What do these linkage types do? Are they documented somewhere? A
> > google search didn't turn up anything obvious.
>
> I doubt they are documented.
Yaay :)
> An indirect symbol is a symbol whose value is that of another symbol.
> This was used in a.out formats to provide symbol aliases.
Ok.
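
A related facility that is visible from source code is the GCC alias
attribute, which gives a second name to an existing definition. The sketch
below is not the a.out indirect symbol itself, it assumes GCC or a
compatible compiler on an ELF target, and the function names are
hypothetical.

```c
/* The real definition. */
int real_impl(int x)
{
    return x + 1;
}

/* A second name for the same function. The target must be defined in
   this translation unit. */
int alias_name(int x) __attribute__((alias("real_impl")));
```

Calling alias_name(1) runs the body of real_impl.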
> A warning symbol is a symbol whose value is that of another symbol,
> with a warning. If the linker uses the symbol, it emits the warning,
> and then continues as though the reference were to the other symbol.
> You can see this in action on a GNU/Linux system if you write a
> reference to gets().
Ok, you're right, I noticed that.
> A set symbol is used to form tables. All set symbols with the same name
> are allocated together in a table. This was used in a.out formats to
> implement global constructor tables. In ELF this is done in a simpler
> fashion by putting all the global constructors into a particular
> section. The linker already puts all the input sections with the same
> name together.
LLVM has an 'appending' linkage type that does this same kind of thing,
which it uses for building global ctor tables. Thanks for the information,
though. I'll think about the best way to support warning and indirect
symbols, if we need them.
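
From the source side, a global constructor table can be exercised with the
GCC constructor attribute. This is a minimal sketch assuming GCC or a
compatible compiler, and the names are hypothetical. The compiler records a
pointer to the function in a dedicated section, the linker concatenates
those sections from all input files, and the startup code walks the
resulting table before main runs.

```c
static int module_initialized = 0;

/* Ask the compiler to add this function to the constructor table. */
static void register_module(void) __attribute__((constructor));

static void register_module(void)
{
    module_initialized = 1;
}

/* Returns 1 once the constructor has run, which is before main starts. */
int module_ready(void)
{
    return module_initialized;
}
```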
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/