Bug 23628 - Typeinfo comparison code easily breaks shared libs
Summary: Typeinfo comparison code easily breaks shared libs
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: visibility
Depends on:
Blocks:
 
Reported: 2005-08-29 19:47 UTC by Mat Marcus
Modified: 2006-09-18 21:02 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mat Marcus 2005-08-29 19:47:43 UTC
There is a problem in GCC's typeinfo comparison code where it just compares the
address of the symbol, not what the symbol points at. Hence if the linker
chooses one typeinfo instance in one binary and another in another binary, even
though they are the same typeinfo it returns different results.

This means you must be very, very careful to ensure the typeinfos for the
symbols which get compared must be default visibility in ALL compilation units
if and ONLY if the RTTI "lives" in different shared libraries. As you might
guess, this can get tricky to get right!

We were fortunate that in one of the boost::signals test executables, this leads
to a repeatable crash (boost_any_bridge fails to detect that two types are the
same, because of an erroneous type info comparison). It is much worse in some
other situations. I have seen silent failures (such as the "type" of a standard
exception being changed between where it is thrown and where it is caught). The
problem was obscured by a time by the fact that, despite claims to the contrary
in the documentation, it seems that the symbols are not being made visible in
the "hosting" executable by default. As a result we have two separate copies of
the typeinfo.
Comment 1 Andrew Pinski 2005-08-29 19:55:39 UTC
Usually when this is brought up, we ask how shared libraries are loaded, they need to be loaded with 
global when calling dlopen (if you loading dynamicly).  

If the shared libraries are loaded staticly, then maybe the issue is with incomplete types, which is PR 
20647.
Comment 2 Andrew Pinski 2005-08-29 19:58:47 UTC
Oh, one more thing, we need a testcase because this is hard to reproduce.
Comment 3 Mat Marcus 2005-08-29 20:18:25 UTC
This is for the case when the shared libraries are loaded statically. One test
case can be found by following the link found in this post:
<http://article.gmane.org/gmane.comp.lib.boost.build/10072> . I will also attach
a simple self-contained test case for gcc 4.0 darwin/OS X.
Comment 4 Mat Marcus 2005-08-29 20:23:44 UTC
Actually, I'll just paste the darwin test case here. On this platform, libstdc++
is a sharedlib. The following code:
###

#include <vector>
#include <iostream>
#include <stdexcept>



int main (int argc, char * const argv[]) {
	std::vector<int> v;
	try {
		v.at(1);
	}
	catch(const std::out_of_range& e) {
		std::cout << "Correct behavior: out of range caught, what==: " <<
e.what() << "\n";
		throw;
	}
	catch(...) {
		std::cout << "Oops, apparent problem with type identity across a DLL
boundary\n";
		throw;
	}	
	return 0;
}
###

Output is the "Oops case" when built without explictly setting
-fvisibility=default or -fvisibility-inlines-hidden for the "hosting" exectuable.

Comment 5 Andrew Pinski 2005-08-29 20:27:36 UTC
This just works just fine with:
GNU C++ version 4.1.0 20050822 (experimental) (powerpc-apple-darwin7.9.0)
        compiled by GNU C version 4.1.0 20050822 (experimental).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

Are you using Apple's 4.0.0, if so you should report it to them as I think this is not a bug in the FSF's 
gcc.

[zhivago:~/src/localgccPRs] pinskia% ~/new-fold/bin/g++ t.cc 
./a.out[zhivago:~/src/localgccPRs] pinskia% ./a.out
Correct behavior: out of range caught, what==: vector::_M_range_check
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check
Abort


[zhivago:~/src/localgccPRs] pinskia% otool -L a.out
a.out:
        /Users/pinskia/new-fold/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.6.0)
        /Users/pinskia/new-fold/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libmx.A.dylib (compatibility version 1.0.0, current version 47.1.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 71.1.3)
Comment 6 Andrew Pinski 2005-08-29 20:30:20 UTC
Also GCC's testsuite would have caught this by now if the testcase fails as there are testcases in the 
testsuite for this kind of thing.

Please give the output of "gcc -v".
Comment 7 Andrew Pinski 2005-08-29 20:31:29 UTC
One more thing -fvisibility=default is default anyways.
Comment 8 Andrew Pinski 2005-08-29 20:34:45 UTC
I still reproduce this problem, I am starting to think something is wrong some where else and not with 
gcc as -fvisibility=default is default and I am using dynamic libraries as shown below and I know 
nothing has changed in visibility and C++ since before 4.0 release.
Comment 9 Mat Marcus 2005-08-29 20:44:15 UTC
Sorry, I was a bit sloppy--I didn't remove all intermediate layers from my test
environment. I will explore this further. In any case, if I explicitly invoke
g++ -fvisibility=hidden t.cc && ./a.out the  "Oops" case is encountered. 
Comment 10 Andrew Pinski 2005-08-29 20:52:41 UTC
(In reply to comment #9)
> Sorry, I was a bit sloppy--I didn't remove all intermediate layers from my test
> environment. I will explore this further. In any case, if I explicitly invoke
> g++ -fvisibility=hidden t.cc && ./a.out the  "Oops" case is encountered. 

Oh, that is the problem in that the libstdc++ headers are not marked with default visibility.
Let me try that and then try with wrapping the includes with push and pop of the visibility.
If that works, then I am marking this as a dup of the bug for the push/pop in libstdc++.
Comment 11 Andrew Pinski 2005-08-29 20:57:31 UTC
Yes this is a dup of bug 19664 for the pop/push in libstdc++ as coding the includes as:
#pragma GCC visibility push(default)
#include <vector>
#include <iostream>
#include <stdexcept>
#pragma GCC visibility pop

makes this works just fine.

*** This bug has been marked as a duplicate of 19664 ***
Comment 12 Mat Marcus 2005-08-29 20:59:29 UTC
Ok, marking the standard library headers for default visibility sounds like it
would address one aspect of this report. I'm not sure that it addresses the
other aspect. Are you saying that this will somehow also address the
boost::signals issue (the same problem--I can explain further if it is unclear),
which is directly referred to here:  
<http://sourceforge.net/tracker/?func=detail&atid=107586&aid=1262038&group_id=7586>
?
Comment 13 Mat Marcus 2005-08-29 21:00:58 UTC
Looks like our changes collided. I am requesting the bug be reopened pending
answer to the above question.
Comment 14 Andrew Pinski 2005-08-29 21:08:39 UTC
From the sounds of it, it sounds like boost headers are incorrectly not having the push/pop mechanism 
either when they should.
Comment 15 Mat Marcus 2005-08-29 21:27:15 UTC
Suppose for a moment that the boost authors can be convinced to add a bunch of 
GCC-specific pragmas to their code. Would the problem then be solved? What if 
a client is using a third party library (maybe a largeley template, header-
based library such as boost) in a situation involving shared libraries and non-
default visibility. Should the client be expected to figure out exactly which 
pieces of the library make use of RTTI (as an implementation detail) so that 
they can set visibility of client-defined types to default, in order to avoid 
possibly silent failure of the library when it uses RTTI upon those types?
Comment 16 Andrew Pinski 2005-08-29 21:32:25 UTC
Subject: Re:  Typeinfo comparison code easily breaks shared libs


On Aug 29, 2005, at 5:27 PM, mmarcus at emarcus dot org wrote:

>
> ------- Additional Comments From mmarcus at emarcus dot org  
> 2005-08-29 21:27 -------

The point is that you are saying all symbols are hidden by default so 
you are
marking the RTTI as hidden too which is a good idea for your own 
libraries
and classes which are hidden in the shared libraries but since you
are including other people's headers, maybe wrapping them with push/pop
is what you need.  Then this is not a GCC per say but rather you
misunderstanding what the options do.

-- Pinski

Comment 17 Mat Marcus 2005-08-29 21:57:29 UTC
Maybe I should rephrase my concern as the following question. Suppose a gcc 
user is building an application and some shared libraries that depend on third 
party libraries. Under what circumstances can he safely set visibility to 
someting other than default, for any types? My current understanding is that 
it is only safe to do so for types whose typeinfo will he can guarantee will 
never be shared. This is not always an easy guarantee to make, and so, if my 
undersatanding is correct, makes hidden visibility much less useful than it 
first appears. 

If typeinfo equality was based on string comparison instead of address 
comparison, the usefuleness of hidden visibility would be greatly increased. 
This would also address another important use case. There is often a desire to 
produce a shared library with only an extern "C" ABI, especially when building 
artifacts designed for long-term binary compatibility. That is, the only 
visible symbols should be extern "C" functions. To write such a shared library 
in C++, while depending upon a shared libstdc++ or other shared C++ libraries 
is not possible if one most make typeinfo visible just to share it with the 
libraries upon which our library depends. 

Address-only typeinfo equality comparison has a high usability cost. Would it 
be possible to first check for address equality, falling back to a string 
comparison upon failure?
Comment 18 Vladimir Prus 2005-08-30 13:10:16 UTC
To clarify what Marcus said, I've made a small example:  
http://zigzag.cs.msu.su/~ghost/rtti-4.0  
  
The third party library 'libhelper.so' is built with default visibility. I  
want my own library 'libhelper2.so' to be built with hidden visibility.  
However, my library contains this:  
 
    try 
    { 
        foo(); 
    } 
    catch(my_exception& e) 
    { 
 
Both 'foo' and 'my_exception' are defined in libhelper.so -- third party 
library. So, if I build my library with hidden visibility, 'catch' no longer 
works. 
 
I have two solutions: 
1. Add push/pops in other library headers, which is very inconvenient. 
2. Create wrapper headers and add push/pops there, which is very inconvenient. 
 
And if I forget just a single header (which was probably added in the last 
version of third party library), I'll get hard to diagnose bug. Usability cost 
is indeed high. 
Comment 19 Gabriel Dos Reis 2005-08-30 13:41:48 UTC
Subject: Re:  Typeinfo comparison code easily breaks shared libs

"ghost at cs dot msu dot su" <gcc-bugzilla@gcc.gnu.org> writes:

| And if I forget just a single header (which was probably added in the last 
| version of third party library), I'll get hard to diagnose bug. Usability cost 
| is indeed high. 

it sounds that this whole stuff of playing with symbols behind
programmers' back is a total mess :-(

-- Gaby
Comment 20 Vladimir Prus 2005-08-31 07:16:32 UTC
It's is mess, but I hope that we don't just agree on *that*. The situation is 
that the whole -fvisibility thing does not work reliably for C++ dynamic 
libraries which is rather bad. 
Comment 21 Andrew Pinski 2005-08-31 14:12:20 UTC
Lets recap here:
-fvisibility=xxx means mark all things as xxx unless something else marks it differenently.
So if you include a header and that header does not have the push/pop (or otherwise marking the 
classes/functions), you run into this problem report.  But this is an user bug and not a GCC one since 
GCC is doing exactly it was told to do but the user was not expecting it.  Think about headers/#include 
as pasting method and not a way to mark things different (system headers are questionable here but 
usually it is only for diagnostic and extern "C" being default just because system headers are broken).
Comment 22 Vladimir Prus 2005-08-31 14:31:20 UTC
You might be formally right, but what about practical consequences? Unless  
*all* libraries I use have push/pops in *every single header*, using  
-fvisibility for my own library can result in crash. So for a large 
application you'd need to get authors of all used libraries to "fix" them! 
Note that even libstdc++ does not have this at the moment. This makes 
-fvisibility just useless. 
Comment 23 Gabriel Dos Reis 2005-08-31 14:48:05 UTC
Subject: Re:  Typeinfo comparison code easily breaks shared libs

"ghost at cs dot msu dot su" <gcc-bugzilla@gcc.gnu.org> writes:

| You might be formally right, but what about practical consequences? Unless  
| *all* libraries I use have push/pops in *every single header*, using  
| -fvisibility for my own library can result in crash. So for a large 
| application you'd need to get authors of all used libraries to "fix" them! 
| Note that even libstdc++ does not have this at the moment. This makes 
| -fvisibility just useless. 

Please, reopen this PR.  
Andrew, take a deep breadth.

-- Gaby
Comment 24 Mat Marcus 2005-09-01 08:35:08 UTC
I will add a real-world use case. Lets say that I am the author of a shared
library and that today I wish to provide my client, AnApp, with MyLib version 1.
I will also need to provide my clients with updated versions in the future,
*without breaking binary compatibility*. That is, when version 2 of my library
comes out, it must not break AnApp version 1. As a good citizen, I keep my
public ABI as narrow as possible, exporting a small number of extern "C"
functions. Internally, I make use of various C++ shared libraries including
boost version 1.33. It so happens that AnApp version 1 also links to the shared
library boost 1.33. For RTTI to work properly, certain names must be shared
between boost 1.33 and MyLib version 1, so I add those names to my list of
exports. Lets say that one of those names is BoostFoo. AnApp version 1 also
exports those names for RTTI compatibility with boost 1.33.

A year passes and now I want to release MyLib version 2. I decide that it would
be convenient to use the newly released boost 1.35 in my implementation. But
what if, say, boost 1.35 has changed the layout of struct BoostFoo? If so, I
cannot use boost 1.35 in my implementation since I am exporting BoostFoo--my
exported BoostFoo 1.35 will conflict with the BoostFoo 1.33 that AnApp version 1
uses. The requirement that I export the name BoostFoo to my clients in order to
for type identity to be shared with my supplier has effectively
"revision-locked" me to boost 1.33.

Though slightly complex to state, this was not a contrived use-case. For
example, people want updates of their Photoshop plugins to work. The suggestion
above of comparing typeinfo's by name instead of by address is really just a
bandaid. What we really need is some form of directed visibility control where a
library exports one set of symbols to the libraries on which it depends, and a
different set of symbols to its clients.
Comment 25 Niall Douglas 2005-09-01 10:42:41 UTC
Vladimir Prus Wrote:

> It's is mess, but I hope that we don't just agree on *that*. The situation
> is that the whole -fvisibility thing does not work reliably for C++
> dynamic libraries which is rather bad. 

I wouldn't go /that/ far - if you know what you're doing, it's completely 
reliable. The problem is that it involves a lot of knowledge of ELF internals 
which most people using GCC don't have and IMHO, shouldn't need to have.

The problem I see here is that Andrew knows the internals of GCC extremely well, 
and so to him IMHO he can't see why users are having such a problem.

> You might be formally right, but what about practical consequences? Unless  
> *all* libraries I use have push/pops in *every single header*, using  
> -fvisibility for my own library can result in crash. So for a large 
> application you'd need to get authors of all used libraries to "fix" them! 
> Note that even libstdc++ does not have this at the moment. This makes 
> -fvisibility just useless. 

One of our overriding goals when making this patch was to AVOID patching any 
existing library headers! See PR 9283 and PR 15000.

The way I see it is that GCC's current behaviour interferes with the ODR. If you 
define a type in one compilation unit and then again in another, and you compare 
the two, currently it works if those compilation units are in the same shared 
object but it fails if they are in different shared objects.

This to me is confusing at best, and causes hard-to-find bugs at worst. It also 
isn't compatible with MSVC's behaviour which was also one of the original goals 
of the patch - to allow existing MSVC DLL macro support to be reused in GCC.

Now I raised this originally back in the day, but I was told that the 
performance penalty of memcmp()-ing symbols was too much - even though MSVC does 
just this and I haven't seen many complaints about its typeinfo performance. 
However, I am very certain that so long as it remains this way, you're just 
going to get more and more people complaining in here.

So how about this - how about we get GCC to emit a special linkonce symbol 
called say __gcc_linkonce_nondefaultvisibility if a compilation unit uses non-
default visibility? And if that symbol is present, the GCC runtime uses memcmp() 
instead of address comparison during typeinfo work - if I remember, there's 
about two or three places where code would need altering.

This solves the problem, and keeps everyone happy.

If people agree, I could contribute a little to this patch.

Cheers,
Niall
Comment 26 Rich Johnson 2006-01-21 21:02:51 UTC
I just got bit by this using 
   gcc version 4.0.3 20051201 (prerelease) (Debian 4.0.2-5) (powerpc)

It's just my $.02 from the gallery, but that address comparison in the type_info::oprator==() implementation looks suspect.  Stroustrup (3rd edition, $15.4.4) specifically says:

It is _not_ guaranteed that there is only one type_info object for each type in the system. 

and then he goes on to call out dynamically linked libraries as a particular case.

If folks are interested, I can propose a relatively inexpensive (i.e. non-strcmp) runtime strawman for consideration.
Comment 27 gdr@cs.tamu.edu 2006-01-21 21:45:29 UTC
Subject: Re:  Typeinfo comparison code easily breaks shared libs

"rjohnson at dogstar-interactive dot com" <gcc-bugzilla@gcc.gnu.org> writes:

| I just got bit by this using 
|    gcc version 4.0.3 20051201 (prerelease) (Debian 4.0.2-5) (powerpc)
| 
| It's just my $.02 from the gallery, but that address comparison in the
| type_info::oprator==() implementation looks suspect.  Stroustrup (3rd edition,
| $15.4.4) specifically says:
| 
| It is _not_ guaranteed that there is only one type_info object for each type in
| the system. 
| 
| and then he goes on to call out dynamically linked libraries as a particular
| case.
| 
| If folks are interested, I can propose a relatively inexpensive (i.e.
| non-strcmp) runtime strawman for consideration.

Please look more carefully at the real implementation and look up the
archive. 

-- Gaby
Comment 28 Geoff Keating 2006-09-18 21:02:24 UTC
The current version of the documentation says, for -fvisibility=,

Be aware that headers from outside your project, in particular system
headers and headers from any other library you use, may not be
expecting to be compiled with visibility other than the default.  You
may need to explicitly say @samp{#pragma GCC visibility push(default)}
before including any such headers.
...
Note that @samp{-fvisibility} does affect C++ vague linkage
entities. This means that, for instance, an exception class that will
be thrown between DSOs must be explicitly marked with default
visibility so that the @samp{type_info} nodes will be unified between
the DSOs.

Also, the documentation for the 'hidden' attribute no longer uses ELF-specific terminology, so anyone who understands the C++ standard should be able to understand 'hidden' too.

Does that address this bug?

I don't think we can change the meaning of -fvisibility= at this point, people are depending on the existing behaviour.  Apple has a -fvisibility-ms-compat flag which aims to give Visual C++ bug-compatible linkage behaviour, that might provide a good model for any additional features in this area.