This is the mail archive of the
mailing list for the GCC project.
Re: Mistake in C++ ABI substitution rules?
- From: Carlo Wood <carlo at alinoe dot com>
- To: Stan Shebs <shebs at apple dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 20 Feb 2002 00:05:03 +0100
- Subject: Re: Mistake in C++ ABI substitution rules?
- References: <3C72D4BA.77E7125F@apple.com>
In the past I reported a problem with the way g++ 3.x mangles
names, related to substitutions. It does not handle the substitutions
the way it should (according to the ABI).
The response was that the ABI would not be changed as long as it
didn't break anything the way it was. So yes, it seems that the
mangling is still compiler-specific, and the choice is to leave it
that way.
The comment that I wrote in the code of my demangler (I wrote one too) is:
// <type> ::= <builtin-type> # Starts with a lower case character != r.
// ::= <function-type> # Starts with F
// ::= <class-enum-type> # Starts with N, S, C, D, Z, a digit or a lower case character.
// # since a lower case character would be an operator name, that would
// # be an error. The S is a substitution or St (::std::). A 'C' would
// # be a constructor and thus also an error.
// ::= <template-param> # Starts with T
// ::= <substitution> # Starts with S
// ::= <template-template-param> <template-args> # Starts with T or S, equivalent with the above.
// ::= <array-type> # Starts with A
// ::= <pointer-to-member-type> # Starts with M
// ::= <CV-qualifiers> <type> # Starts with r, V or K
// ::= P <type> # pointer-to # Starts with P
// ::= R <type> # reference-to # Starts with R
// ::= C <type> # complex pair (C 2000) # Starts with C
// ::= G <type> # imaginary (C 2000) # Starts with G
// ::= U <source-name> <type> # vendor extended type qualifier, starts with U
// <template-template-param> ::= <template-param>
// ::= <substitution>
// My own analysis of how to decode qualifiers:
// F is a <function-type>, <T> is a <builtin-type>, <class-enum-type>, <template-param> or <template-template-param> <template-args>.
// <Q> represents a series of qualifiers (not G or C).
// <C> is an unqualified type. <R> is a qualified type.
// <B> is the bare-function-type without return type. <I> is the array index.
//
// <Q>M<Q2><C>F<R><B>E ==> R (C::*Q)B Q2 "<C>", "<Q2><C>", "F<R><B>E" (<R> and <B> recursive), "M<Q2><C>F<R><B>E".
// <Q>F<R><B>E ==> R (Q)B "<R>", "<B>" (<B> recursive) and "F<R><B>E".
// <Q>G<T> ==> imaginary T Q "<T>", "G<T>" (<T> recursive).
// <Q>C<T> ==> complex T Q "<T>", "C<T>" (<T> recursive).
// <Q><T> ==> T Q "<T>" (<T> recursive).
// where Q is any of:
// <Q>P ==> *Q "P..."
// <Q>R ==> &Q "R..."
// <Q>[K|V|r]+ ==> [ const| volatile| restrict]+Q "KVr..."
// <Q>U<S> ==> SQ "U<S>..."
// A<I> ==> [I] "A<I>..." (<I> recursive).
// <Q>A<I> ==> (Q) [I] "A<I>..." (<I> recursive).
// <Q>M<C> ==> C::*Q "M<C>..." (<C> recursive).
// A <substitution> is handled with an input position switch during which new substitutions are
// turned off. Because recursive handling of types (and therefore the order in which substitutions
// must be generated) must be done left to right, but the generation of Q needs processing right to left,
// substitutions per <type> are generated by reading the input left to right and marking the starts of
// all substitutions only - implicitly finishing them at the end of the type. Then the output and real
// substitutions are generated.
// The ABI specifies for pointer-to-member function types the format <Q>M<T>F<R><B>E. In other words,
// the qualifier <Q2> (see above) is implicitly contained in <T> instead of explicitly part of the M
// format. I am convinced that this is a bug in the ABI. Unfortunately, this is how we have to
// demangle things as it has a direct impact on the order in which substitutions are stored.
// This ill-formed design results in rather ill-formed demangler code too however :/
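To make the "<Q><T> ==> T Q" rule from the comment concrete, here is a minimal sketch. It is not the code of the demangler itself. It assumes that <Q> consists only of P, R, K, V and r, and that <T> is one of a handful of builtin types; the names decode_qualified_type and builtin_name are made up for this illustration. The input is scanned left to right, but the qualifiers are printed right to left, because each qualifier applies to everything that follows it in the mangled string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: names for a few <builtin-type> codes.
// Returns an empty string for anything this sketch does not know.
static std::string builtin_name(char c)
{
  switch (c)
  {
    case 'v': return "void";
    case 'c': return "char";
    case 'i': return "int";
    case 'l': return "long";
    case 'd': return "double";
    default:  return "";
  }
}

// True for the qualifier characters this sketch understands.
static bool is_qualifier(char c)
{
  return c == 'P' || c == 'R' || c == 'K' || c == 'V' || c == 'r';
}

// Decode <Q><T>, where <Q> is a run of qualifiers and <T> is one builtin.
// Returns an empty string when the input has any other shape.
std::string decode_qualified_type(std::string const& mangled)
{
  // Left to right: find where the qualifiers end and the type starts.
  std::size_t pos = 0;
  while (pos < mangled.size() && is_qualifier(mangled[pos]))
    ++pos;
  if (pos + 1 != mangled.size())        // Exactly one type character must remain.
    return "";
  std::string result = builtin_name(mangled[pos]);
  if (result.empty())
    return "";
  // Right to left: the qualifier closest to the type is printed first.
  for (std::size_t i = pos; i-- > 0;)
  {
    switch (mangled[i])
    {
      case 'P': result += "*"; break;
      case 'R': result += "&"; break;
      case 'K': result += " const"; break;
      case 'V': result += " volatile"; break;
      case 'r': result += " restrict"; break;
      default:  assert(false); break;   // Unreachable: the scan only accepted PRKVr.
    }
  }
  return result;
}
```

With this sketch, decode_qualified_type("PKc") gives "char const*" and decode_qualified_type("KPc") gives "char* const". Function types, arrays, pointers to members and substitutions are deliberately left out.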
On Tue, Feb 19, 2002 at 02:42:02PM -0800, Stan Shebs wrote:
> One of our tasks in migrating Darwin / Mac OS X to use GCC 3.x is
> to provide a way to load I/O drivers written in C++ and compiled
> with GCC 2.95. (Yeah yeah, bad idea, but the deed is done, and the
> alternative is to compile the kernel's I/O subsystem with 2.95
> forever, I'll work hard to avoid that fate.)
> Anyway, to translate the symbols we have a homemade 2.95 compat
> demangler (written using a spec I handed to the kernel hacker,
> poor guy) feeding into a remangler written using the spec at
> http://www.codesourcery.com/cxx-abi/abi.html#mangling. So far
> so good, we have something that actually does the right thing
> most of the time. However, there is a troublesome point in the
> substitution rules for the new C++ ABI, where it says
> "Logically, the substitutable components of a mangled name are
> considered left-to-right, components before the composite structure
> of which they are a part. If a component has been encountered
> before, it is substituted as described below. This decision is
> independent of whether its components have been substituted,
> so an implementation MAY OPTIMIZE by considering large structures
> for substitution before their components. If a component has not
> been encountered before, its mangling is identified, and it is
> added to a dictionary of substitution candidates. No entity is
> added to the dictionary twice." (emphasis mine)
> This sure sounds like it's allowing different compilers to mangle
> names differently, by choosing to substitute in different ways.
> And indeed we had to determine 3.x's behavior empirically. But
> this seems like a fatal blow to the goal of an ABI that could
> allow object files from different compilers to be linked together,
> or to link code from two different 3.x versions of GCC.
> Am I missing something here, or is there an unresolved omission
> in the name mangling rules of the C++ ABI?
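The left-to-right rule in the quoted ABI text can be sketched as below. This is only an illustration under simplifying assumptions, not how g++ implements it: parameter types are a plain class name wrapped in P (pointer), R (reference) and K (const), no two K's are adjacent (the ABI treats a run of CV-qualifiers as one component), and every function has at least one parameter. The names Param, mangle_type and mangle_function are hypothetical. Note that the lookup checks the whole type before its components, which is the optimization the quoted paragraph permits, while new candidates are still added components first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Substitution candidates in the order they were added,
// each stored as its unsubstituted mangling.
typedef std::vector<std::string> Dictionary;

// Candidate 0 is "S_", candidate 1 is "S0_", and so on in base 36.
std::string seq_id(std::size_t index)
{
  if (index == 0)
    return "S_";
  std::size_t n = index - 1;
  std::string digits;
  do
  {
    digits.insert(digits.begin(), "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[n % 36]);
    n /= 36;
  }
  while (n != 0);
  return "S" + digits + "_";
}

// <source-name>: the decimal length followed by the identifier.
std::string source_name(std::string const& name)
{
  return std::to_string(name.size()) + name;
}

// Mangle the type `quals name`, e.g. quals "PK" and name "Foo" for Foo const*.
std::string mangle_type(std::string const& quals, std::string const& name, Dictionary& dict)
{
  assert(quals.find_first_not_of("PRK") == std::string::npos);  // Limits of this sketch.
  std::string const key = quals + source_name(name);
  // Seen before? Then the whole type is replaced by its substitution.
  for (std::size_t i = 0; i < dict.size(); ++i)
    if (dict[i] == key)
      return seq_id(i);
  // Otherwise mangle the inner component first, so that it enters the
  // dictionary before the composite type of which it is a part.
  std::string out = quals.empty() ? source_name(name)
                                  : quals[0] + mangle_type(quals.substr(1), name, dict);
  dict.push_back(key);
  return out;
}

// Hypothetical parameter description: qualifiers (outermost first) and class name.
struct Param
{
  std::string quals;
  std::string name;
};

// Mangle a global function; the function name itself is not a candidate.
std::string mangle_function(std::string const& fn, std::vector<Param> const& params)
{
  Dictionary dict;
  std::string out = "_Z" + source_name(fn);
  for (std::size_t i = 0; i < params.size(); ++i)
    out += mangle_type(params[i].quals, params[i].name, dict);
  return out;
}
```

For void f(Foo*, Foo*) the sketch produces _Z1fP3FooS0_: Foo becomes candidate S_, Foo* becomes S0_, and the second parameter is replaced by S0_.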
Carlo Wood <carlo at alinoe dot com>