This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [C++] GCC tree linkage types
- From: Ian Lance Taylor <ian at wasabisystems dot com>
- To: Chris Lattner <sabre at nondot dot org>
- Cc: Matt Austern <austern at apple dot com>, <gcc at gcc dot gnu dot org>, Gabriel Dos Reis <gdr at integrable-solutions dot net>, Richard Henderson <rth at redhat dot com>
- Date: 07 Nov 2003 09:49:08 -0500
- Subject: Re: [C++] GCC tree linkage types
- References: <Pine.LNX.4.44.0311062307070.9490-100000@nondot.org>
Chris Lattner <sabre@nondot.org> writes:
> > > * linkonce (globals which merge when linking, but can be deleted from a
> > > translation unit if they are not used, basically like 'extern inline').
> >
> > I would normally say that linkonce means that all but one copy can be
> > deleted.
>
> Exactly, but what about the last copy? If it is unreferenced, can it be
> deleted?
In GNU linker terminology, no, the last copy of a linkonce symbol may
not be deleted.
> In LLVM, linkonce is used for things like vtables and extern
> inline functions. These things are guaranteed to be emitted in each
> translation unit that uses them, so it is ok for the optimizer to delete
> any copy that is not locally used. The front-ends already have a notion
> of this: if a logically "linkonce" global is not used, RTL code is not
> even generated for it.
I agree that the concept makes sense. It's just that the name
`linkonce' means something else to me--it means ``keep only one
copy.'' The linker name for what you are describing is a provided or
provisional symbol--a symbol which is defined only if it is required.
I'll note that the GNU linker also supports garbage collection of
unused symbols, but that works for any kind of symbol.
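The difference between the two meanings can be shown with a toy model in C. This is not linker code, and the names struct copy, copies_kept_linkonce and copies_kept_provisional are hypothetical. It assumes each translation unit carries one copy of the same symbol and records only whether that unit references it. Under GNU linkonce semantics the copies collapse to one, and that last copy is kept even when unused. Under provided/provisional semantics the symbol survives only if some unit needs it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not linker code): one entry per translation unit that carries
   a copy of the same symbol, recording whether that unit references it. */
struct copy {
    int referenced;
};

/* GNU linkonce: all copies collapse to one, and that last copy is kept
   even if no translation unit references it. */
size_t copies_kept_linkonce(const struct copy *copies, size_t n) {
    assert(copies != NULL || n == 0);
    return n > 0 ? 1 : 0;
}

/* Provided/provisional symbol: defined only if some unit requires it. */
size_t copies_kept_provisional(const struct copy *copies, size_t n) {
    assert(copies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (copies[i].referenced)
            return 1;
    }
    return 0;
}
```

For two copies that nobody references, the linkonce model keeps one copy and the provisional model keeps none. Once any unit references the symbol, both keep exactly one.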
> > > * weak (like linkonce, but cannot be deleted if they are unused in a
> > > xlation unit)
> >
> > I wouldn't say that weak symbols merge; I would say that weak symbols
> > are hidden by strong symbols, or by earlier weak symbols.
>
> In a compiler where you can _physically delete_ globals, the definitions
> are the same: "hiding" is the same as deleting one copy.
To me they are not the same; see below.
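The hiding rule can be sketched as a toy resolver in C. It assumes definitions arrive in link order and ignores archives, common symbols and dynamic linking; the types and the resolve function are hypothetical. A strong definition hides every weak one, an earlier weak definition hides later weak ones, and two strong definitions are an error. The hidden definitions are not merged into anything; the symbol simply refers to the chosen one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of symbol binding; not the GNU linker's data structures. */
enum binding { BIND_STRONG, BIND_WEAK };

struct def {
    const char *object;   /* object file providing the definition */
    enum binding binding;
};

enum { RESOLVE_NONE = -1, RESOLVE_MULTIPLE = -2 };

/* Return the index of the definition the symbol ends up referring to,
   RESOLVE_NONE if there is none, or RESOLVE_MULTIPLE for two strong ones.
   Definitions are assumed to be listed in link order. */
int resolve(const struct def *defs, size_t n) {
    assert(defs != NULL || n == 0);
    int chosen = RESOLVE_NONE;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].binding == BIND_STRONG) {
            if (chosen >= 0 && defs[chosen].binding == BIND_STRONG)
                return RESOLVE_MULTIPLE;  /* "multiple definition" error */
            chosen = (int)i;              /* strong hides any earlier weak */
        } else if (chosen == RESOLVE_NONE) {
            chosen = (int)i;              /* earliest weak hides later weaks */
        }
    }
    return chosen;
}
```

For a weak, a strong and a weak definition in that order, resolve returns index 1, the strong definition; the two weak copies are still present in their objects but nothing refers to them.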
> > You didn't mention common symbols, which really do merge. Even g++
> > can generate common symbols when using the -fconserve-space option.
>
> What is the difference between common symbols, and either weak or linkonce
> symbols, as defined above?
Common symbols are based on FORTRAN common statements, and they are
also used in traditional C implementations for global variable
definitions with no initialization. A common symbol specifies only a
size--the linker is responsible for allocating uninitialized memory in
the specified size.
When the linker sees two common symbols with the same name, it merges
them and uses the larger of the sizes. That is, two common symbols
with the same name may have different sizes, and the result in the
final executable will be a single symbol with the larger size. This
is required for traditional FORTRAN common statement semantics.
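The merging rule itself fits in a few lines. This is a toy model with a hypothetical function name: each object file contributes only a size for the symbol, and the linker allocates a single uninitialized block of the largest size it has seen.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of common-symbol merging: each object file contributes only a
   size for the symbol; the linker allocates one uninitialized block using
   the largest size it has seen. */
size_t merged_common_size(const size_t *sizes, size_t n) {
    assert(sizes != NULL && n > 0);
    size_t largest = sizes[0];
    for (size_t i = 1; i < n; i++) {
        if (sizes[i] > largest)
            largest = sizes[i];
    }
    return largest;
}
```

With contributions of 4, 16 and 8 bytes, the final executable gets one 16-byte symbol.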
This size adjustment was also used by some traditional Unix libc
implementations, such as the one on SunOS. The array of standard FILE
structures was defined (not merely declared) in the stdio.h header
file, with a size of (as I recall) 4. The full size of the array was
only found in libc.
From the compiler's perspective, the difference between a common
symbol and a variable initialized to zero is just that a different
pseudo-op must be used to define the variable in assembly language.
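A GCC-specific illustration of the two kinds of definition follows. The common attribute asks GCC for a common symbol even under -fno-common (the default since GCC 10); on typical ELF targets GCC then emits a .comm directive for counter, while zeroed becomes an ordinary definition in .bss. The exact assembly varies by target, so compiling with -S is the way to check; counter, zeroed and check_zero_filled are illustrative names.

```c
#include <assert.h>

/* GCC-specific: the common attribute requests a common symbol even under
   -fno-common.  On typical ELF targets GCC emits a .comm directive here. */
int counter __attribute__((common));

/* An explicit zero initializer gives an ordinary definition, which GCC
   typically places in .bss. */
int zeroed = 0;

/* Either way the storage starts out zero-filled; the difference lies in the
   assembler directive and in how the linker treats duplicates. */
void check_zero_filled(void) {
    assert(counter == 0);
    assert(zeroed == 0);
}
```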
> > The GNU linker supports more exotic linkage types like `indirect',
> > `warning', and `set'. But these are probably not of interest to the
> > compiler.
>
> True. What do these linkage types do? Are they documented somewhere? A
> google search didn't turn up anything obvious.
I doubt they are documented.
An indirect symbol is a symbol whose value is that of another symbol.
This was used in a.out formats to provide symbol aliases.
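For comparison, on ELF targets the usual way to get a symbol alias from GCC is the alias attribute, which defines a second symbol at the same address as an existing definition in the same translation unit. This is an ELF-era counterpart, not the a.out indirect symbol itself; real_impl and public_name are illustrative names.

```c
#include <assert.h>

/* The actual definition. */
int real_impl(int x) {
    return x * 2;
}

/* GCC on ELF: public_name becomes a second symbol with the same address as
   real_impl; no separate code is emitted for it.  The target must be
   defined in this translation unit. */
int public_name(int x) __attribute__((alias("real_impl")));
```

Calls through either name run the same code, so public_name(21) and real_impl(21) both return 42.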
A warning symbol is a symbol whose value is that of another symbol,
with a warning. If the linker uses the symbol, it emits the warning,
and then continues as though the reference were to the other symbol.
You can see this in action on a GNU/Linux system if you write a
reference to gets().
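In the ELF world, GNU ld provides a similar effect through input sections named .gnu.warning.SYMBOL: when the linker resolves a reference to SYMBOL, it prints the section's contents as a warning. As far as I know, glibc produces the gets() warning this way through its link_warning macro. The sketch below is modeled loosely on that macro; old_api and new_api are hypothetical names.

```c
#include <assert.h>

/* Hypothetical deprecated function. */
int old_api(void) {
    return 42;
}

/* GNU ld treats an input section named .gnu.warning.SYMBOL as a warning
   for SYMBOL: linking a reference to old_api prints this string, and the
   link still succeeds unless warnings are made fatal.  Modeled loosely on
   glibc's link_warning macro. */
static const char old_api_link_warning[]
    __attribute__((used, section(".gnu.warning.old_api")))
    = "old_api is deprecated; use new_api instead";
```

Linking a caller of old_api with GNU ld should print the message and then continue, resolving the reference to old_api as usual.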
A set symbol is used to form tables. All set symbols with the same
name are allocated together in a table. This was used in a.out
formats to implement global constructor tables. In ELF this is done
in a simpler fashion by putting all the global constructors into a
particular section. The linker already puts all the input sections
with the same name together.
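A minimal sketch of the section-based approach, assuming GCC with GNU ld (or lld) on ELF. Function pointers are placed in a custom section (my_init_table and the other names are hypothetical), and the __start_my_init_table and __stop_my_init_table symbols, which GNU ld defines for sections whose names are valid C identifiers, bound the merged table. Real constructors go through .ctors or, in newer toolchains, .init_array, which the runtime walks in the same spirit.

```c
#include <assert.h>

typedef void (*init_fn)(void);

static int init_total;

static void init_a(void) { init_total += 1; }
static void init_b(void) { init_total += 10; }

/* Each entry goes into the section "my_init_table"; the linker places all
   input sections of that name next to each other in the output. */
static const init_fn entry_a __attribute__((used, section("my_init_table"))) = init_a;
static const init_fn entry_b __attribute__((used, section("my_init_table"))) = init_b;

/* Defined by GNU ld (and lld) because the section name is a valid C
   identifier: they mark the start and end of the merged table. */
extern const init_fn __start_my_init_table[];
extern const init_fn __stop_my_init_table[];

/* Walk the table and call every registered function once. */
int run_init_table(void) {
    assert(__start_my_init_table <= __stop_my_init_table);
    for (const init_fn *p = __start_my_init_table; p < __stop_my_init_table; ++p)
        (*p)();
    return init_total;
}
```

Entry order within the table is not relied on, so the example only sums the effects: a single call to run_init_table returns 11. Comparing pointers against the two linker-defined symbols falls outside strict ISO C, but it is the usual idiom for such tables.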
Ian