Bug 36030 - Throwing exceptions in multiple threads leads to spinning in call to _Unwind_Find_FDE
Summary: Throwing exceptions in multiple threads leads to spinning in call to _Unwind_...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: other
Version: 4.2.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-23 21:20 UTC by Andy Newman
Modified: 2008-04-25 05:27 UTC
CC List: 3 users

See Also:
Host: i386-undermydesk-freebsd
Target: i386-undermydesk-freebsd
Build: i386-undermydesk-freebsd
Known to work:
Known to fail:
Last reconfirmed:


Attachments
test program (388 bytes, text/plain)
2008-04-23 21:22 UTC, Andy Newman
Patch to protect against concurrent modifications to frame cache (259 bytes, patch)
2008-04-23 21:25 UTC, Andy Newman

Description Andy Newman 2008-04-23 21:20:46 UTC
gcc/unwind-dw2-fde-glibc.c maintains a cache of frames via its dl_iterate_phdr callback but doesn't protect against multiple threads modifying that cache concurrently.  A simple test program that starts N threads, all of which immediately throw, often, but not always (as such races are prone to behave), leads to some number of threads stuck inside their stack unwinding consuming 100% CPU.  The particular failure mode will no doubt differ depending upon the particular interleaving of accesses/modifications to the cache structure but I'm seeing consistent hangs on the two systems on which I've tested this (quad core Xeon, dual core Core 2 Duo).

Reproduction typically requires running the test in a loop until it doesn't exit.  The usual caveats regarding races apply, i.e. it may not fail on certain CPU/OS combinations or may take a long time to fail.

The attached patch avoids the spinning by grabbing "object_mutex" (defined in gcc/unwind-dw2-fde.c which is included and used in gcc/unwind-dw2-fde-glibc.c) to protect against multiple threads manipulating the cache at the same time.
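
To illustrate the kind of coarse-grained locking described above, here is a minimal C++ sketch. It is not the attached patch and not GCC's code: `RangeCache`, `Entry`, `insert` and `lookup` are hypothetical names, and `std::mutex` stands in for `object_mutex`. The cache is modelled as a small move-to-front linked list. With a structure like that, one plausible way for unsynchronized updates to go wrong is two threads relinking entries at once and leaving a cycle, after which a traversal never terminates.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a small most-recently-used cache of address
// ranges. None of these names are taken from GCC.
class RangeCache {
public:
    struct Entry {
        std::uintptr_t low = 0;
        std::uintptr_t high = 0;   // exclusive upper bound
        int object_id = -1;        // -1 marks an unused entry
        Entry *next = nullptr;
    };

    RangeCache() {
        // Chain the fixed pool of entries into one singly linked list.
        for (std::size_t i = 0; i + 1 < pool_.size(); ++i)
            pool_[i].next = &pool_[i + 1];
        head_ = &pool_[0];
    }

    // Record that [low, high) belongs to object_id, reusing the entry at
    // the tail of the list (the least recently used one).
    void insert(std::uintptr_t low, std::uintptr_t high, int object_id) {
        std::lock_guard<std::mutex> guard(mutex_);
        Entry *prev = nullptr;
        Entry *last = head_;
        while (last->next != nullptr) {
            prev = last;
            last = last->next;
        }
        last->low = low;
        last->high = high;
        last->object_id = object_id;
        move_to_front(prev, last);
    }

    // Return the id of the object covering pc, or -1 if none is cached.
    // A hit relinks the list, so even lookups must hold the lock.
    int lookup(std::uintptr_t pc) {
        std::lock_guard<std::mutex> guard(mutex_);
        Entry *prev = nullptr;
        for (Entry *e = head_; e != nullptr; prev = e, e = e->next) {
            if (e->object_id >= 0 && pc >= e->low && pc < e->high) {
                move_to_front(prev, e);
                return e->object_id;
            }
        }
        return -1;
    }

private:
    // Caller must hold mutex_. prev is e's predecessor, or nullptr if e
    // is already the head.
    void move_to_front(Entry *prev, Entry *e) {
        if (prev == nullptr)
            return;
        prev->next = e->next;
        e->next = head_;
        head_ = e;
    }

    std::array<Entry, 8> pool_;
    Entry *head_ = nullptr;
    std::mutex mutex_;
};
```

The single lock serializes every lookup, which is the performance trade-off of a coarse-grained approach.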

With the patch no failure has been observed in over 200,000 invocations of the test with each invocation starting 100 threads.

The issue was found on FreeBSD 7.x (aka STABLE) gcc 4.2.1 (20070719) but inspection of the VCS shows the same code exists on the trunk and branches.
Comment 1 Andy Newman 2008-04-23 21:22:18 UTC
Created attachment 15521
test program

Test program to demonstrate issue
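
The attachment itself is not reproduced on this page. As a rough idea of the shape of such a test, here is a minimal C++ sketch that starts N threads which all throw immediately. It is an assumption-laden stand-in, not the attached program: it uses C++11 `std::thread` instead of whatever threading API the attachment uses, and `run_throwing_threads` is a hypothetical helper name.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: start n threads that each throw immediately and
// catch their own exception, so all of them run the stack unwinder at
// roughly the same time. Returns how many threads caught their exception.
std::size_t run_throwing_threads(std::size_t n)
{
    std::atomic<std::size_t> caught{0};
    std::vector<std::thread> threads;
    threads.reserve(n);

    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&caught] {
            try {
                throw std::runtime_error("thrown from worker thread");
            } catch (const std::runtime_error &) {
                ++caught;
            }
        });
    }

    // If a thread gets stuck while unwinding, its join never returns and
    // the process hangs here.
    for (std::thread &t : threads)
        t.join();

    return caught.load();
}
```

Calling `run_throwing_threads(100)` from `main` and running the resulting binary in a shell loop would approximate the procedure from the description. Whether it hangs depends on the platform and on timing.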
Comment 2 Andy Newman 2008-04-23 21:25:06 UTC
Created attachment 15522
Patch to protect against concurrent modifications to frame cache

Simple fix that applies coarse-grained locking around the frame cache.  If unwinding performance is a concern either a finer-grain locking strategy or even a lock-free structure should be used.
Comment 3 Andy Newman 2008-04-23 21:36:50 UTC
Note for reproducing with glibc: the test program uses the BSD err() function to report errors and quit.  A quick look at glibc's manual says it has a compatible err() function, but declared in error.h rather than err.h as in BSD.

Comment 4 Richard Biener 2008-04-24 08:27:14 UTC
This is a bug in your glibc.  Current glibc has

__dl_iterate_phdr (int (*callback) (struct dl_phdr_info *info,
                                    size_t size, void *data), void *data)
{
  struct link_map *l;
  struct dl_phdr_info info;
  int ret = 0;

  /* Make sure we are alone.  */
  __rtld_lock_lock_recursive (GL(dl_load_lock));
  __libc_cleanup_push (cancel_handler, 0);
...

so it is already properly locked.

Which glibc version do you use?  You should probably report this as a bug
to your vendor.
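
For readers unfamiliar with the interface, here is a minimal C++ sketch of a `dl_iterate_phdr` caller. The helper names `count_object` and `count_loaded_objects` are hypothetical. The point is that state touched only inside the callback is serialized whenever the libc's `dl_iterate_phdr` takes a lock around the iteration, as the glibc excerpt above does. A libc without that lock gives no such guarantee.

```cpp
#include <link.h>     // dl_iterate_phdr, struct dl_phdr_info
#include <cstddef>

// Callback invoked once per loaded object. `data` is the pointer passed
// as the second argument of dl_iterate_phdr; here it is an int counter.
static int count_object(struct dl_phdr_info *info, std::size_t size, void *data)
{
    (void)info;   // unused in this sketch
    (void)size;   // unused in this sketch
    ++*static_cast<int *>(data);
    return 0;     // returning zero continues the iteration
}

// Hypothetical helper: count the objects the dynamic loader reports.
int count_loaded_objects()
{
    int count = 0;
    dl_iterate_phdr(count_object, &count);
    return count;
}
```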
Comment 5 Andy Newman 2008-04-25 05:27:31 UTC
> Which glibc version do you use?

As per the description, it's FreeBSD 7's (aka STABLE) libc, whose ld.so implementation uses gcc's glibc-specific unwinding.

>  You should probably report this as a bug to your vendor.

Done with patch to fix it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=123062

Thanks for the clarification.