This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Excessive calls to iterate_phdr during exception handling


On Wed, May 29, 2013 at 2:47 AM, Ian Lance Taylor <iant@google.com> wrote:
> On Mon, May 27, 2013 at 3:20 PM, Ryan Johnson
> <ryan.johnson@cs.utoronto.ca> wrote:
>>
>> I have a large C++ app that throws exceptions to unwind anywhere from 5-20
>> stack frames when an error prevents the request from being served (which
>> happens rather frequently). Works fine single-threaded, but performance is
>> terrible for 24 threads on a 48-thread Ubuntu 10 machine. Profiling points
>> to a global mutex acquire in __GI___dl_iterate_phdr as the culprit, with
>> _Unwind_Find_FDE as its caller.
>>
>> Tracing the attached test case with the attached gdb script shows that the
>> -DDTOR case executes ~20k instructions during unwind and calls iterate_phdr
>> 12 times. The -DTRY case executes ~33k instructions and calls iterate_phdr
>> 18 times. The exception in this test case only affects three stack frames,
>> with minimal cleanup required, and the trace is taken on the second call to
>> the function that swallows the error, to warm up libgcc's internal caches
>> [1].
>>
>> The instruction counts aren't terribly surprising---I know unwinding is
>> complex---but might it be possible to throw and catch a previously-seen
>> exception through a previously-seen stack trace, with something fewer than
>> 4-6 global mutex acquires for each frame unwound? As it stands, the deeper
>> the stack trace (= the more desirable to throw rather than return an error),
>> the more of a scalability bottleneck unwinding becomes. My actual app would
>> apparently suffer anywhere from 25 to 80 global mutex acquires for each
>> exception thrown, which probably explains why the bottleneck arises...
>>
>> I'm bringing the issue up here, rather than filing a bug, because I'm not
>> sure whether this is an oversight, a known problem that's hard to fix, or a
>> feature (e.g. somehow required for reliable unwinding). I suspect an
>> oversight, because _Unwind_Find_FDE tries a call to
>> _Unwind_Find_registered_FDE before falling back to dl_iterate_phdr, but
>> that first call never succeeds in my trace (iterate_phdr is always called).
>
> The issue is dlclose followed by dlopen.  If we had a cache ahead of
> dl_iterate_phdr, we would need some way to clear out any information
> cached from a dlclose'd library.  Otherwise we might pick up the old
> information when looking up an address from a new dlopen.  So 1)
> locking will always be required; 2) any caching system to reduce the
> number of locks will require support for dlclose, somehow.  It's worth
> working on but there isn't going to be a simple solution.
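
For readers following along, here is a minimal sketch of the kind of lookup being discussed: walking the loaded objects with dl_iterate_phdr to find the one that contains a given address. The names find_loaded_object and struct lookup are made up for this illustration and are not libgcc's. The dlpi_adds and dlpi_subs counters are the ones documented in the glibc dl_iterate_phdr man page; they change when objects are loaded or unloaded, so a cache could in principle compare them to detect a dlclose. Note that they are only visible from inside the callback, which on glibc runs with the loader lock held.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query/result record for this sketch. */
struct lookup {
    uintptr_t addr;               /* address we are looking for */
    int found;                    /* set to 1 if some PT_LOAD segment covers addr */
    unsigned long long adds;      /* objects loaded so far, as reported by glibc */
    unsigned long long subs;      /* objects unloaded so far, as reported by glibc */
};

/* Called by dl_iterate_phdr once per loaded object. */
static int find_object_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *q = data;

    /* The counters exist only if the runtime's struct is large enough. */
    if (size >= offsetof(struct dl_phdr_info, dlpi_subs) + sizeof info->dlpi_subs) {
        q->adds = info->dlpi_adds;
        q->subs = info->dlpi_subs;
    }

    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = (uintptr_t)info->dlpi_addr + ph->p_vaddr;
        if (q->addr >= start && q->addr < start + ph->p_memsz) {
            q->found = 1;
            return 1;             /* non-zero return stops the iteration */
        }
    }
    return 0;                     /* keep iterating */
}

/* Returns 1 if addr lies inside a loadable segment of some loaded object. */
int find_loaded_object(const void *addr, struct lookup *out)
{
    assert(out != NULL);
    out->addr = (uintptr_t)addr;
    out->found = 0;
    out->adds = 0;
    out->subs = 0;
    dl_iterate_phdr(find_object_cb, out);
    return out->found;
}
```

Calling find_loaded_object with the address of a global or static object should report a hit, while a stack address should not, since the stack is not a PT_LOAD segment of any loaded object.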

Maybe a simple solution like the one used for threads: a flag recording
whether we've seen a dlclose call yet. Or key it on whether libdl is
linked in at all, that is, have libdl provide an alternate implementation
that interposes the one from glibc, so that the glibc version wouldn't
need to care about the dlclose case?
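
To make the flag idea concrete, here is a rough sketch under stated assumptions, not a proposal for libgcc itself. Everything in it is hypothetical: note_dlclose stands for whatever hook would observe a dlclose, slow_lookup is a stand-in for the locked dl_iterate_phdr walk (it just fabricates a 4 KiB range), and the cache is a single per-thread entry. The sketch also ignores the race where a dlclose happens concurrently with a cache hit, which a real design would have to close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uintptr_t start, end; };   /* [start, end) of a code region */

static atomic_bool dlclose_seen;          /* hypothetical: set once, never cleared */
static atomic_int  slow_lookups;          /* counts trips through the slow path */

/* Stand-in for the expensive, lock-taking lookup (assumption for the demo). */
static bool slow_lookup(uintptr_t pc, struct range *out)
{
    atomic_fetch_add(&slow_lookups, 1);
    out->start = pc & ~(uintptr_t)0xfff;
    out->end   = out->start + 0x1000;
    return true;
}

/* One-entry per-thread cache; trusted only while no dlclose has been seen. */
static _Thread_local struct range cached;
static _Thread_local bool cached_valid;

bool cached_lookup(uintptr_t pc, struct range *out)
{
    assert(out != NULL);
    bool may_cache = !atomic_load_explicit(&dlclose_seen, memory_order_acquire);

    if (may_cache && cached_valid && pc >= cached.start && pc < cached.end) {
        *out = cached;                    /* fast path: no global lock taken */
        return true;
    }
    if (!slow_lookup(pc, out))
        return false;
    if (may_cache) {                      /* remember the result for next time */
        cached = *out;
        cached_valid = true;
    }
    return true;
}

/* Hypothetical hook: call this from wherever dlclose is observed. */
void note_dlclose(void)
{
    atomic_store_explicit(&dlclose_seen, true, memory_order_release);
}
```

In this sketch, repeated lookups in the same range hit the per-thread entry until note_dlclose is called; after that every lookup takes the slow path again, which matches the "fall back to the safe behaviour once dlclose has been seen" reading of the suggestion.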

Richard.

> Ian

