There is a problem in GCC's typeinfo comparison code where it just compares the address of the symbol, not what the symbol points at. Hence if the linker chooses one typeinfo instance in one binary and another in another binary, even though they are the same typeinfo it returns different results. This means you must be very, very careful to ensure the typeinfos for the symbols which get compared must be default visibility in ALL compilation units if and ONLY if the RTTI "lives" in different shared libraries. As you might guess, this can get tricky to get right! We were fortunate that in one of the boost::signals test executables, this leads to a repeatable crash (boost_any_bridge fails to detect that two types are the same, because of an erroneous type info comparison). It is much worse in some other situations. I have seen silent failures (such as the "type" of a standard exception being changed between where it is thrown and where it is caught). The problem was obscured by a time by the fact that, despite claims to the contrary in the documentation, it seems that the symbols are not being made visible in the "hosting" executable by default. As a result we have two separate copies of the typeinfo.
Usually when this is brought up, we ask how shared libraries are loaded, they need to be loaded with global when calling dlopen (if you loading dynamicly). If the shared libraries are loaded staticly, then maybe the issue is with incomplete types, which is PR 20647.
Oh, one more thing, we need a testcase because this is hard to reproduce.
This is for the case when the shared libraries are loaded statically. One test case can be found by following the link found in this post: <http://article.gmane.org/gmane.comp.lib.boost.build/10072> . I will also attach a simple self-contained test case for gcc 4.0 darwin/OS X.
Actually, I'll just paste the darwin test case here. On this platform, libstdc++ is a sharedlib. The following code: ### #include <vector> #include <iostream> #include <stdexcept> int main (int argc, char * const argv[]) { std::vector<int> v; try { v.at(1); } catch(const std::out_of_range& e) { std::cout << "Correct behavior: out of range caught, what==: " << e.what() << "\n"; throw; } catch(...) { std::cout << "Oops, apparent problem with type identity across a DLL boundary\n"; throw; } return 0; } ### Output is the "Oops case" when built without explictly setting -fvisibility=default or -fvisibility-inlines-hidden for the "hosting" exectuable.
This just works just fine with: GNU C++ version 4.1.0 20050822 (experimental) (powerpc-apple-darwin7.9.0) compiled by GNU C version 4.1.0 20050822 (experimental). GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Are you using Apple's 4.0.0, if so you should report it to them as I think this is not a bug in the FSF's gcc. [zhivago:~/src/localgccPRs] pinskia% ~/new-fold/bin/g++ t.cc ./a.out[zhivago:~/src/localgccPRs] pinskia% ./a.out Correct behavior: out of range caught, what==: vector::_M_range_check terminate called after throwing an instance of 'std::out_of_range' what(): vector::_M_range_check Abort [zhivago:~/src/localgccPRs] pinskia% otool -L a.out a.out: /Users/pinskia/new-fold/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.6.0) /Users/pinskia/new-fold/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0) /usr/lib/libmx.A.dylib (compatibility version 1.0.0, current version 47.1.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 71.1.3)
Also GCC's testsuite would have caught this by now if the testcase fails as there are testcases in the testsuite for this kind of thing. Please give the output of "gcc -v".
One more thing -fvisibility=default is default anyways.
I still reproduce this problem, I am starting to think something is wrong some where else and not with gcc as -fvisibility=default is default and I am using dynamic libraries as shown below and I know nothing has changed in visibility and C++ since before 4.0 release.
Sorry, I was a bit sloppy--I didn't remove all intermediate layers from my test environment. I will explore this further. In any case, if I explicitly invoke g++ -fvisibility=hidden t.cc && ./a.out the "Oops" case is encountered.
(In reply to comment #9) > Sorry, I was a bit sloppy--I didn't remove all intermediate layers from my test > environment. I will explore this further. In any case, if I explicitly invoke > g++ -fvisibility=hidden t.cc && ./a.out the "Oops" case is encountered. Oh, that is the problem in that the libstdc++ headers are not marked with default visibility. Let me try that and then try with wrapping the includes with push and pop of the visibility. If that works, then I am marking this as a dup of the bug for the push/pop in libstdc++.
Yes this is a dup of bug 19664 for the pop/push in libstdc++ as coding the includes as: #pragma GCC visibility push(default) #include <vector> #include <iostream> #include <stdexcept> #pragma GCC visibility pop makes this works just fine. *** This bug has been marked as a duplicate of 19664 ***
Ok, marking the standard library headers for default visibility sounds like it would address one aspect of this report. I'm not sure that it addresses the other aspect. Are you saying that this will somehow also address the boost::signals issue (the same problem--I can explain further if it is unclear), which is directly referred to here: <http://sourceforge.net/tracker/?func=detail&atid=107586&aid=1262038&group_id=7586> ?
Looks like our changes collided. I am requesting the bug be reopened pending answer to the above question.
From the sounds of it, it sounds like boost headers are incorrectly not having the push/pop mechanism either when they should.
Suppose for a moment that the boost authors can be convinced to add a bunch of GCC-specific pragmas to their code. Would the problem then be solved? What if a client is using a third party library (maybe a largeley template, header- based library such as boost) in a situation involving shared libraries and non- default visibility. Should the client be expected to figure out exactly which pieces of the library make use of RTTI (as an implementation detail) so that they can set visibility of client-defined types to default, in order to avoid possibly silent failure of the library when it uses RTTI upon those types?
Subject: Re: Typeinfo comparison code easily breaks shared libs On Aug 29, 2005, at 5:27 PM, mmarcus at emarcus dot org wrote: > > ------- Additional Comments From mmarcus at emarcus dot org > 2005-08-29 21:27 ------- The point is that you are saying all symbols are hidden by default so you are marking the RTTI as hidden too which is a good idea for your own libraries and classes which are hidden in the shared libraries but since you are including other people's headers, maybe wrapping them with push/pop is what you need. Then this is not a GCC per say but rather you misunderstanding what the options do. -- Pinski
Maybe I should rephrase my concern as the following question. Suppose a gcc user is building an application and some shared libraries that depend on third party libraries. Under what circumstances can he safely set visibility to someting other than default, for any types? My current understanding is that it is only safe to do so for types whose typeinfo will he can guarantee will never be shared. This is not always an easy guarantee to make, and so, if my undersatanding is correct, makes hidden visibility much less useful than it first appears. If typeinfo equality was based on string comparison instead of address comparison, the usefuleness of hidden visibility would be greatly increased. This would also address another important use case. There is often a desire to produce a shared library with only an extern "C" ABI, especially when building artifacts designed for long-term binary compatibility. That is, the only visible symbols should be extern "C" functions. To write such a shared library in C++, while depending upon a shared libstdc++ or other shared C++ libraries is not possible if one most make typeinfo visible just to share it with the libraries upon which our library depends. Address-only typeinfo equality comparison has a high usability cost. Would it be possible to first check for address equality, falling back to a string comparison upon failure?
To clarify what Marcus said, I've made a small example: http://zigzag.cs.msu.su/~ghost/rtti-4.0 The third party library 'libhelper.so' is built with default visibility. I want my own library 'libhelper2.so' to be built with hidden visibility. However, my library contains this: try { foo(); } catch(my_exception& e) { Both 'foo' and 'my_exception' are defined in libhelper.so -- third party library. So, if I build my library with hidden visibility, 'catch' no longer works. I have two solutions: 1. Add push/pops in other library headers, which is very inconvenient. 2. Create wrapper headers and add push/pops there, which is very inconvenient. And if I forget just a single header (which was probably added in the last version of third party library), I'll get hard to diagnose bug. Usability cost is indeed high.
Subject: Re: Typeinfo comparison code easily breaks shared libs "ghost at cs dot msu dot su" <gcc-bugzilla@gcc.gnu.org> writes: | And if I forget just a single header (which was probably added in the last | version of third party library), I'll get hard to diagnose bug. Usability cost | is indeed high. it sounds that this whole stuff of playing with symbols behind programmers' back is a total mess :-( -- Gaby
It's is mess, but I hope that we don't just agree on *that*. The situation is that the whole -fvisibility thing does not work reliably for C++ dynamic libraries which is rather bad.
Lets recap here: -fvisibility=xxx means mark all things as xxx unless something else marks it differenently. So if you include a header and that header does not have the push/pop (or otherwise marking the classes/functions), you run into this problem report. But this is an user bug and not a GCC one since GCC is doing exactly it was told to do but the user was not expecting it. Think about headers/#include as pasting method and not a way to mark things different (system headers are questionable here but usually it is only for diagnostic and extern "C" being default just because system headers are broken).
You might be formally right, but what about practical consequences? Unless *all* libraries I use have push/pops in *every single header*, using -fvisibility for my own library can result in crash. So for a large application you'd need to get authors of all used libraries to "fix" them! Note that even libstdc++ does not have this at the moment. This makes -fvisibility just useless.
Subject: Re: Typeinfo comparison code easily breaks shared libs "ghost at cs dot msu dot su" <gcc-bugzilla@gcc.gnu.org> writes: | You might be formally right, but what about practical consequences? Unless | *all* libraries I use have push/pops in *every single header*, using | -fvisibility for my own library can result in crash. So for a large | application you'd need to get authors of all used libraries to "fix" them! | Note that even libstdc++ does not have this at the moment. This makes | -fvisibility just useless. Please, reopen this PR. Andrew, take a deep breadth. -- Gaby
I will add a real-world use case. Lets say that I am the author of a shared library and that today I wish to provide my client, AnApp, with MyLib version 1. I will also need to provide my clients with updated versions in the future, *without breaking binary compatibility*. That is, when version 2 of my library comes out, it must not break AnApp version 1. As a good citizen, I keep my public ABI as narrow as possible, exporting a small number of extern "C" functions. Internally, I make use of various C++ shared libraries including boost version 1.33. It so happens that AnApp version 1 also links to the shared library boost 1.33. For RTTI to work properly, certain names must be shared between boost 1.33 and MyLib version 1, so I add those names to my list of exports. Lets say that one of those names is BoostFoo. AnApp version 1 also exports those names for RTTI compatibility with boost 1.33. A year passes and now I want to release MyLib version 2. I decide that it would be convenient to use the newly released boost 1.35 in my implementation. But what if, say, boost 1.35 has changed the layout of struct BoostFoo? If so, I cannot use boost 1.35 in my implementation since I am exporting BoostFoo--my exported BoostFoo 1.35 will conflict with the BoostFoo 1.33 that AnApp version 1 uses. The requirement that I export the name BoostFoo to my clients in order to for type identity to be shared with my supplier has effectively "revision-locked" me to boost 1.33. Though slightly complex to state, this was not a contrived use-case. For example, people want updates of their Photoshop plugins to work. The suggestion above of comparing typeinfo's by name instead of by address is really just a bandaid. What we really need is some form of directed visibility control where a library exports one set of symbols to the libraries on which it depends, and a different set of symbols to its clients.
Vladimir Prus Wrote: > It's is mess, but I hope that we don't just agree on *that*. The situation > is that the whole -fvisibility thing does not work reliably for C++ > dynamic libraries which is rather bad. I wouldn't go /that/ far - if you know what you're doing, it's completely reliable. The problem is that it involves a lot of knowledge of ELF internals which most people using GCC don't have and IMHO, shouldn't need to have. The problem I see here is that Andrew knows the internals of GCC extremely well, and so to him IMHO he can't see why users are having such a problem. > You might be formally right, but what about practical consequences? Unless > *all* libraries I use have push/pops in *every single header*, using > -fvisibility for my own library can result in crash. So for a large > application you'd need to get authors of all used libraries to "fix" them! > Note that even libstdc++ does not have this at the moment. This makes > -fvisibility just useless. One of our overriding goals when making this patch was to AVOID patching any existing library headers! See PR 9283 and PR 15000. The way I see it is that GCC's current behaviour interferes with the ODR. If you define a type in one compilation unit and then again in another, and you compare the two, currently it works if those compilation units are in the same shared object but it fails if they are in different shared objects. This to me is confusing at best, and causes hard-to-find bugs at worst. It also isn't compatible with MSVC's behaviour which was also one of the original goals of the patch - to allow existing MSVC DLL macro support to be reused in GCC. Now I raised this originally back in the day, but I was told that the performance penalty of memcmp()-ing symbols was too much - even though MSVC does just this and I haven't seen many complaints about its typeinfo performance. However, I am very certain that so long as it remains this way, you're just going to get more and more people complaining in here. So how about this - how about we get GCC to emit a special linkonce symbol called say __gcc_linkonce_nondefaultvisibility if a compilation unit uses non- default visibility? And if that symbol is present, the GCC runtime uses memcmp() instead of address comparison during typeinfo work - if I remember, there's about two or three places where code would need altering. This solves the problem, and keeps everyone happy. If people agree, I could contribute a little to this patch. Cheers, Niall
I just got bit by this using gcc version 4.0.3 20051201 (prerelease) (Debian 4.0.2-5) (powerpc) It's just my $.02 from the gallery, but that address comparison in the type_info::oprator==() implementation looks suspect. Stroustrup (3rd edition, $15.4.4) specifically says: It is _not_ guaranteed that there is only one type_info object for each type in the system. and then he goes on to call out dynamically linked libraries as a particular case. If folks are interested, I can propose a relatively inexpensive (i.e. non-strcmp) runtime strawman for consideration.
Subject: Re: Typeinfo comparison code easily breaks shared libs "rjohnson at dogstar-interactive dot com" <gcc-bugzilla@gcc.gnu.org> writes: | I just got bit by this using | gcc version 4.0.3 20051201 (prerelease) (Debian 4.0.2-5) (powerpc) | | It's just my $.02 from the gallery, but that address comparison in the | type_info::oprator==() implementation looks suspect. Stroustrup (3rd edition, | $15.4.4) specifically says: | | It is _not_ guaranteed that there is only one type_info object for each type in | the system. | | and then he goes on to call out dynamically linked libraries as a particular | case. | | If folks are interested, I can propose a relatively inexpensive (i.e. | non-strcmp) runtime strawman for consideration. Please look more carefully at the real implementation and look up the archive. -- Gaby
The current version of the documentation says, for -fvisibility=, Be aware that headers from outside your project, in particular system headers and headers from any other library you use, may not be expecting to be compiled with visibility other than the default. You may need to explicitly say @samp{#pragma GCC visibility push(default)} before including any such headers. ... Note that @samp{-fvisibility} does affect C++ vague linkage entities. This means that, for instance, an exception class that will be thrown between DSOs must be explicitly marked with default visibility so that the @samp{type_info} nodes will be unified between the DSOs. Also, the documentation for the 'hidden' attribute no longer uses ELF-specific terminology, so anyone who understands the C++ standard should be able to understand 'hidden' too. Does that address this bug? I don't think we can change the meaning of -fvisibility= at this point, people are depending on the existing behaviour. Apple has a -fvisibility-ms-compat flag which aims to give Visual C++ bug-compatible linkage behaviour, that might provide a good model for any additional features in this area.