This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Excessive calls to iterate_phdr during exception handling


On 28/05/2013 8:47 PM, Ian Lance Taylor wrote:
On Mon, May 27, 2013 at 3:20 PM, Ryan Johnson
<ryan.johnson@cs.utoronto.ca> wrote:
I have a large C++ app that throws exceptions to unwind anywhere from 5-20
stack frames when an error prevents the request from being served (which
happens rather frequently). Works fine single-threaded, but performance is
terrible for 24 threads on a 48-thread Ubuntu 10 machine. Profiling points
to a global mutex acquire in __GI___dl_iterate_phdr as the culprit, with
_Unwind_Find_FDE as its caller.
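
For concreteness, the workload boils down to something like the sketch
below; the names and counts are made up for illustration, not taken from
the attached test case:

#include <stdexcept>
#include <thread>
#include <vector>

// Each "request" unwinds a few frames when it fails; in the real app it
// is 5-20 frames, and failures are frequent.
static void leaf()   { throw std::runtime_error("request failed"); }
static void middle() { leaf(); }
static void handle_request() {
  try { middle(); } catch (const std::exception&) { /* swallow the error */ }
}

int main() {
  std::vector<std::thread> workers;
  for (int t = 0; t < 24; ++t)              // 24 worker threads
    workers.emplace_back([] {
      for (int i = 0; i < 100000; ++i)      // every iteration throws
        handle_request();
    });
  for (auto& w : workers) w.join();
}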

Tracing the attached test case with the attached gdb script shows that the
-DDTOR case executes ~20k instructions during unwind and calls iterate_phdr
12 times. The -DTRY case executes ~33k instructions and calls iterate_phdr
18 times. The exception in this test case only affects three stack frames,
with minimal cleanup required, and the trace is taken on the second call to
the function that swallows the error, to warm up libgcc's internal caches
[1].

The instruction counts aren't terribly surprising (I know unwinding is
complex), but might it be possible to throw and catch a previously-seen
exception through a previously-seen stack trace with fewer than the
current 4-6 global mutex acquires for each frame unwound? As it stands, the deeper
the stack trace (= the more desirable to throw rather than return an error),
the more of a scalability bottleneck unwinding becomes. My actual app would
apparently suffer anywhere from 25 to 80 global mutex acquires for each
exception thrown, which probably explains why the bottleneck arises...

I'm bringing the issue up here, rather than filing a bug, because I'm not
sure whether this is an oversight, a known problem that's hard to fix, or a
feature (e.g. somehow required for reliable unwinding). I suspect an
oversight, because _Unwind_Find_FDE tries a call to _Unwind_Find_registered_FDE
before falling back to dl_iterate_phdr, but that call never succeeds in my
trace (iterate_phdr is always called).
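
An easy way to watch this without gdb is to interpose dl_iterate_phdr
from an LD_PRELOAD library and count the calls; libgcc reaches it
through the public entry point, so a rough sketch like the following
picks them up (the file name and build line are only illustrative, and
error handling is omitted):

// Build: g++ -std=c++11 -shared -fPIC count_phdr.cpp -o count_phdr.so -ldl
// Run:   LD_PRELOAD=./count_phdr.so ./a.out
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <dlfcn.h>
#include <link.h>

static std::atomic<unsigned long> calls{0};

extern "C" int dl_iterate_phdr(
    int (*cb)(struct dl_phdr_info*, size_t, void*), void* data) {
  using real_t = int (*)(int (*)(struct dl_phdr_info*, size_t, void*), void*);
  // Look up the real glibc implementation the first time through.
  static real_t real =
      reinterpret_cast<real_t>(dlsym(RTLD_NEXT, "dl_iterate_phdr"));
  ++calls;                       // one global-lock round trip per call
  return real(cb, data);
}

// Print the total when the program exits.
__attribute__((destructor)) static void report() {
  std::fprintf(stderr, "dl_iterate_phdr called %lu times\n", calls.load());
}
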
The issue is dlclose followed by dlopen.  If we had a cache ahead of
dl_iterate_phdr, we would need some way to clear out any information
cached from a dlclose'd library.  Otherwise we might pick up the old
information when looking up an address from a new dlopen.  So 1)
locking will always be required; 2) any caching system to reduce the
number of locks will require support for dlclose, somehow.  It's worth
working on but there isn't going to be a simple solution.
I have mixed feelings on this. On the one hand, it would be bad to risk
sending the unwinder off to la-la land because somebody did a quick
dlclose/dlopen pair on code we're about to unwind through. On the other
hand, anybody who does that (a) is asking for trouble and (b) is
perfectly free to do so in spite of the mutex [1].

That last point makes me really wonder why we bother grabbing the mutex
during unwind at all. At the very least, it would seem profitable to
verify the object header cache at throw time (perhaps using the
nadds/nsubs trick), refresh it with a call to dl_iterate_phdr if need
be, and then do the rest of the unwind lock-free, ignoring deranged
users who dlclose live code [2].
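
To spell out what I mean by the nadds/nsubs trick, roughly (a sketch
only; the real check would live inside libgcc's _Unwind_Find_FDE, and
dlpi_adds/dlpi_subs are the generation counters glibc already exposes
through dl_iterate_phdr):

#include <cstddef>   // offsetof, size_t
#include <link.h>    // struct dl_phdr_info: dlpi_adds / dlpi_subs

// Generation numbers observed the last time the FDE cache was (re)built.
static unsigned long long cached_adds, cached_subs;

static int read_generation(struct dl_phdr_info* info, size_t size, void* data) {
  auto* gen = static_cast<unsigned long long*>(data);
  // dlpi_adds/dlpi_subs count dlopen/dlclose events; the size argument
  // tells us whether the running glibc actually provides them.
  if (size >= offsetof(struct dl_phdr_info, dlpi_subs) + sizeof(info->dlpi_subs)) {
    gen[0] = info->dlpi_adds;
    gen[1] = info->dlpi_subs;
  }
  return 1;  // the first object is enough; stop iterating
}

// Call once per throw: if nothing was dlopen'd or dlclose'd since the
// cache was built, the rest of the unwind can use it without the lock.
static bool object_cache_still_valid() {
  unsigned long long gen[2] = {0, 0};
  dl_iterate_phdr(read_generation, gen);
  return gen[0] == cached_adds && gen[1] == cached_subs;
}

That still costs one dl_iterate_phdr call (and one lock round trip) per
throw, but not several per frame unwound.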

[1] Example: dlopen B ... call A ... which calls B ... which calls C ...
which dlcloses B ... and then throws. Unwind is doomed to fail, as is
normal return.

[2] NB: this is only true for "live" unwinding; a profiler would need
something more sophisticated to deal with such dlclose/dlopen pairs.

Thoughts?
Ryan

