This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Excessive calls to iterate_phdr during exception handling


On 05/28/2013 12:20 AM, Ryan Johnson wrote:
> Hi all,
> 
> (please CC me in replies, not a list member)
> 
> I have a large C++ app that throws exceptions to unwind anywhere from
> 5-20 stack frames when an error prevents the request from being served
> (which happens rather frequently). Works fine single-threaded, but
> performance is terrible for 24 threads on a 48-thread Ubuntu 10 machine.
> Profiling points to a global mutex acquire in __GI___dl_iterate_phdr as
> the culprit, with _Unwind_Find_FDE as its caller.
> 
> Tracing the attached test case with the attached gdb script shows that
> the -DDTOR case executes ~20k instructions during unwind and calls
> iterate_phdr 12 times. The -DTRY case executes ~33k instructions and
> calls iterate_phdr 18 times. The exception in this test case only
> affects three stack frames, with minimal cleanup required, and the trace
> is taken on the second call to the function that swallows the error, to
> warm up libgcc's internal caches [1].
> 
> The instruction counts aren't terribly surprising---I know unwinding is
> complex---but might it be possible to throw and catch a previously-seen
> exception through a previously-seen stack trace, with something fewer
> than 4-6 global mutex acquires for each frame unwound? As it stands, the
> deeper the stack trace (= the more desirable to throw rather than return
> an error), the more of a scalability bottleneck unwinding becomes. My
> actual app would apparently suffer anywhere from 25 to 80 global mutex
> acquires for each exception thrown, which probably explains why the
> bottleneck arises...
> 
> I'm bringing the issue up here, rather than filing a bug, because I'm
> not sure whether this is an oversight, a known problem that's hard to
> fix, or a feature (e.g. somehow required for reliable unwinding). I
> suspect the former, because _Unwind_Find_FDE tries a call to
> _Unwind_Find_registered_FDE before falling back to dl_iterate_phdr, but
> the former never succeeds in my trace (iterate_phdr is always called).
> 
> FWIW, I've tested both gcc-4.6 and 4.8 but see no meaningful difference
> between them.
> 
> [1] The caches can be seen in libgcc/unwind-dw2-fde-dip.c, though they
> will do little to prevent mutex bottlenecks because they're accessed
> from the iterate_phdr callback, behind the mutex acquire.

If the bottleneck is really in glibc, then you should probably ask them
to fix it. Could the mutex be changed to an rwlock instead?
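
To make the rwlock idea concrete, here is a minimal sketch of a
read-mostly table guarded by a reader/writer lock. It is not glibc or
libgcc code: ObjectRange, ObjectTable and concurrent_lookups are
hypothetical names made up for illustration, and the sketch assumes
that the lookup path only reads the table, which may not hold for the
real dl_iterate_phdr callback contract.

```cpp
// Hypothetical sketch: none of these names exist in glibc or libgcc.
// Build with something like: g++ -std=c++17 -pthread
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// One loaded object, described by the address range it covers.
struct ObjectRange {
    std::uintptr_t begin;
    std::uintptr_t end;  // one past the last byte
    int id;
};

class ObjectTable {
public:
    // Writer path (a dlopen/dlclose analogue): exclusive access.
    void add(ObjectRange r) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        ranges_.push_back(r);
    }

    // Reader path (an unwinder-lookup analogue): shared access, so
    // concurrent lookups do not exclude one another.
    int find(std::uintptr_t pc) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        for (const ObjectRange& r : ranges_)
            if (pc >= r.begin && pc < r.end) return r.id;
        return -1;  // no object covers this address
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<ObjectRange> ranges_;
};

// Runs `threads` concurrent readers against a small table and returns
// the total number of lookups that found the expected object.
int concurrent_lookups(int threads, int per_thread) {
    ObjectTable table;
    table.add({0x1000, 0x2000, 1});
    table.add({0x4000, 0x8000, 2});

    std::vector<int> hits(threads, 0);  // one slot per thread, no sharing
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&table, &hits, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                if (table.find(0x4800) == 2) ++hits[t];
        });
    for (auto& th : pool) th.join();

    int total = 0;
    for (int h : hits) total += h;
    return total;
}
```

Calling concurrent_lookups(24, 1000) would mimic the 24-thread case
from the report. Note that taking a shared lock still updates state
shared between threads, so this would likely reduce the serialization
rather than remove it.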

--
VZ




