As this stack trace shows:

    #3 0x00000000004044e2 in malloc (size=36024) at tcmalloc.cc:1314
    #4 0x000000000047a938 in search_object ()
    #5 0x000000000047b189 in _Unwind_Find_FDE ()
    #6 0x0000000000478049 in uw_frame_state_for ()
    #7 0x0000000000478eca in uw_init_context_1 ()
    #8 0x00000000004790b0 in _Unwind_Backtrace ()

there are code paths from _Unwind_Backtrace to malloc. This makes the unwinder deadlock-prone when it is called from applications that have their own customized malloc.
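For context, here is a minimal sketch of how a profiler might drive _Unwind_Backtrace (declared in libgcc's <unwind.h>) to capture program counters into a fixed-size buffer. The names capture_backtrace, collect_frame, struct frame_buf and MAX_FRAMES are hypothetical. Even though the callback itself never allocates, the walk can still reach malloc through _Unwind_Find_FDE, as in the trace above.

```c
#include <stdint.h>
#include <unwind.h>

/* Hypothetical fixed-size buffer, so collecting frames needs no heap memory. */
#define MAX_FRAMES 64

struct frame_buf {
    uintptr_t pcs[MAX_FRAMES];
    int count;
};

/* Called by _Unwind_Backtrace once per frame, innermost first. */
static _Unwind_Reason_Code collect_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct frame_buf *buf = arg;
    if (buf->count >= MAX_FRAMES)
        return _URC_END_OF_STACK; /* any value other than _URC_NO_REASON stops the walk */
    buf->pcs[buf->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON; /* keep walking */
}

/* Fills buf with the caller's program counters and returns how many were stored. */
int capture_backtrace(struct frame_buf *buf)
{
    buf->count = 0;
    /* Each step looks up the frame's unwind info via _Unwind_Find_FDE, which on
       the path in the trace above ends up in search_object() and malloc. */
    _Unwind_Backtrace(collect_frame, buf);
    return buf->count;
}
```

Calling capture_backtrace() from inside an allocator that holds its own lock is exactly the pattern that exposes the malloc path shown in the trace.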
What is your malloc doing that is special, and why would it deadlock? (If you are throwing from inside malloc, I think that is an invalid thing to do.)
It deadlocks because malloc is holding a lock and then calls the unwinder, which can call back into malloc while that lock is still held. No, we're not throwing exceptions. One reason why malloc might want to use the unwinder is to do heap profiling: http://goog-perftools.sourceforge.net/doc/heap_profiler.html
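The self-deadlock can be modelled with a hypothetical non-recursive spin lock (a C11 atomic_flag) standing in for the allocator's lock. This is a sketch, not tcmalloc's code: profiled_malloc_deadlocks, nested_malloc_would_deadlock, try_lock and unlock are illustrative names, and the nested call makes a single try-acquire so that it reports the problem instead of actually hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock guarding a custom allocator's state. */
static atomic_flag alloc_lock = ATOMIC_FLAG_INIT;

/* Single attempt to take the lock; returns false if it is already held. */
static bool try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&alloc_lock, memory_order_acquire);
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&alloc_lock, memory_order_release);
}

/* Models the unwinder calling malloc (e.g. from search_object). A real
   allocator would spin or block until the lock is free; this one tries once
   and reports whether that wait could ever end. */
static bool nested_malloc_would_deadlock(void)
{
    if (try_lock()) {
        unlock();
        return false;
    }
    /* In this single-threaded sketch the holder is our own caller, which
       cannot release the lock until we return, so a blocking acquire never would. */
    return true;
}

/* Hypothetical profiling malloc: takes its lock, then records a backtrace
   while still holding it, and the unwinder re-enters malloc. */
bool profiled_malloc_deadlocks(void)
{
    while (!try_lock()) {
        /* spin until the lock is free */
    }
    bool deadlock = nested_malloc_would_deadlock();
    unlock();
    return deadlock;
}
```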
You know that glibc has a backtrace function which might be better suited to your purpose?
I really doubt we can remove it, because this code is also used when unwinding for exceptions.
(In reply to comment #3)
> You know that glibc has a backtrace function which might be better suited to
> your purpose?

glibc backtrace dlopens libgcc and uses _Unwind_Backtrace() on amd64. glibc backtrace has its own problems (it calls malloc), which is why we're not using it. See: http://sources.redhat.com/bugzilla/show_bug.cgi?id=1579
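One partial mitigation, assuming glibc: the backtrace(3) man page notes that libgcc is loaded dynamically on first use, that this loading usually triggers a call to malloc, and that callers who must not allocate should make sure libgcc is loaded beforehand. The sketch below (prewarm_backtrace and capture_with_glibc are hypothetical names) forces that load once at start-up, before any allocator lock is held. It only moves the dlopen-related allocation; it does not remove allocations the unwinder itself may make while searching for FDEs, such as the search_object() path in the original trace.

```c
#include <execinfo.h>

/* Call once early, before the allocator lock is ever taken. The first
   backtrace() call loads libgcc, so any malloc that loading triggers
   happens here rather than inside a critical section. */
void prewarm_backtrace(void)
{
    void *frames[1];
    (void)backtrace(frames, 1);
}

/* Later captures reuse the already-loaded libgcc; stores at most max entries
   and returns the number actually stored. */
int capture_with_glibc(void **frames, int max)
{
    return backtrace(frames, max);
}
```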
Hmm, you could try libunwind instead; it should work on x86_64:
http://www.hpl.hp.com/research/linux/libunwind/

They show you how to use libunwind to generate a normal backtrace:
http://www.hpl.hp.com/research/linux/libunwind/man/libunwind(3).php

Though I doubt any of these will remove the use of malloc.
(In reply to comment #4)
> I really doubt we can remove it, because this code is also used when unwinding
> for exceptions.

It must be possible to do stack unwinding without any mallocs. If the exception-throwing code path requires mallocs, that's fine by us. The particular malloc in question is coming from start_fde_sort() in unwind-dw2-fde.c. Perhaps the sorting can be done earlier, i.e. before _Unwind_Backtrace() is called?
(In reply to comment #6)
> Hmm, you could try libunwind instead; it should work on x86_64:
> http://www.hpl.hp.com/research/linux/libunwind/
>
> They show you how to use libunwind to generate a normal backtrace:
> http://www.hpl.hp.com/research/linux/libunwind/man/libunwind(3).php
>
> Though I doubt any of these will remove the use of malloc.

libunwind doesn't pass unit tests on amd64. davidm thinks that the problems are outside of libunwind. I think he has a couple of bugs open against gcc/glibc.
(In reply to comment #8)
> libunwind doesn't pass unit tests on amd64. davidm thinks that the problems are
> outside of libunwind. I think he has a couple of bugs open against gcc/glibc.

Yes, and the ones against gcc are only about epilogue or prologue code, so they should not matter for what you are doing.
(In reply to comment #9)
> Yes, and the ones against gcc are only about epilogue or prologue code, so they
> should not matter for what you are doing.

PR 18748 and PR 18749 are both about prologue and epilogue code, which should not matter for the backtrace at all.
(In reply to comment #7)
> The particular malloc in question is coming from start_fde_sort() in
> unwind-dw2-fde.c. Perhaps the sorting can be done earlier, i.e. before
> _Unwind_Backtrace() is called?

If you do that, start-up time goes up, every time you load a shared library it stalls, and you keep around data which you don't need at all.
(In reply to comment #10)
> (In reply to comment #9)
> > Yes, and the ones against gcc are only about epilogue or prologue code, so they
> > should not matter for what you are doing.
>
> PR 18748 and PR 18749 are both about prologue and epilogue code, which should not
> matter for the backtrace at all.

OK, will try to root-cause our problems with libunwind (they show up as bad pointer dereferences in libunwind) and get back to you. Thanks.
There are two solutions to this:

(1) Make sure your binary provides PT_GNU_EH_FRAME. This is the quickest
    path through the unwinder, since the table is pre-sorted by the linker.

(2) Have your malloc detect the recursion and return NULL. This will cause
    the unwinder to perform a linear search through the unsorted tables.
    It should not fail due to the fake out-of-memory condition, since it
    was designed to handle throwing an exception during a true OOM condition.
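Solution (2) can be sketched with a per-thread flag that is set while the allocator is walking the stack; any allocation made in that window gets a fake out-of-memory result. This is a minimal sketch under stated assumptions: guarded_malloc, simulated_unwinder, in_unwinder and nested_alloc_refused are hypothetical names, libc malloc stands in for the custom allocator's real work, and simulated_unwinder only imitates the unwinder's scratch allocation rather than calling _Unwind_Backtrace. Per the comment above, the real unwinder then falls back to a linear search.

```c
#include <stdlib.h>

/* Per-thread flag: nonzero while this thread is walking its own stack. */
static _Thread_local int in_unwinder = 0;

/* Records whether the simulated nested allocation was refused. */
static int nested_alloc_refused = 0;

void *guarded_malloc(size_t size);

/* Hypothetical stand-in for _Unwind_Backtrace(): like start_fde_sort(),
   it asks the allocator for scratch memory partway through the walk. */
static void simulated_unwinder(void)
{
    void *scratch = guarded_malloc(64);
    nested_alloc_refused = (scratch == NULL);
    free(scratch); /* free(NULL) is a no-op */
}

/* Hypothetical profiling allocator. The guard is checked before any lock would
   be taken, so a recursive call fails fast instead of deadlocking. */
void *guarded_malloc(size_t size)
{
    if (in_unwinder)
        return NULL; /* recursive call from the unwinder: fake an OOM */
    void *p = malloc(size); /* stands in for the allocator's real work */
    if (p != NULL) {
        in_unwinder = 1;
        simulated_unwinder(); /* e.g. record a heap-profile sample */
        in_unwinder = 0;
    }
    return p;
}
```

Making the flag thread-local means only the thread that is currently unwinding sees the fake failure; allocations from other threads proceed normally.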
(In reply to comment #13)
> There are two solutions to this:
>
> (1) Make sure your binary provides PT_GNU_EH_FRAME. This is the quickest
>     path through the unwinder, since the table is pre-sorted by the linker.

This isn't the problem.

> (2) Have your malloc detect the recursion and return NULL. This will cause
>     the unwinder to perform a linear search through the unsorted tables.
>     It should not fail due to the fake out-of-memory condition, since it
>     was designed to handle throwing an exception during a true OOM condition.

The problem is that _Unwind_Find_FDE in unwind-dw2-fde.c calls search_object to find the FDE in the registered objects, which are loaded unsorted from the .eh_frame section. Can we use the .eh_frame_hdr section to load the sorted table directly?
On a system whose linker supports --eh-frame-hdr, we use the version of _Unwind_Find_FDE in unwind-dw2-fde-dip.c. It overrides the version in unwind-dw2-fde.c by renaming that one via #define. This file is selected by libgcc/config/t-eh-dw2-dip. It still calls the unwind-dw2-fde.c version of _Unwind_Find_FDE, but that function only looks through objects registered by __register_frame_info_bases. __register_frame_info_bases is called by crtstuff.c, but only on systems whose linker does not support --eh-frame-hdr. So on what system are you actually seeing a call to qsort? Does that system have a linker that supports --eh-frame-hdr?
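Whether the sorted-table path can be used for a given object comes down to whether it carries a PT_GNU_EH_FRAME program header (the segment describing .eh_frame_hdr). A minimal sketch, assuming a glibc-style dl_iterate_phdr() from <link.h>, that counts loaded objects with and without that segment; this is roughly the kind of lookup unwind-dw2-fde-dip.c relies on, but scan_loaded_objects, scan_object and struct eh_scan are hypothetical names, not libgcc code.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

struct eh_scan {
    int objects;           /* loaded objects visited */
    int with_eh_frame_hdr; /* objects that have a PT_GNU_EH_FRAME segment */
};

/* dl_iterate_phdr callback: invoked once per loaded object. */
static int scan_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct eh_scan *scan = data;
    (void)size;
    scan->objects++;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            scan->with_eh_frame_hdr++;
            break;
        }
    }
    return 0; /* zero means continue with the next object */
}

struct eh_scan scan_loaded_objects(void)
{
    struct eh_scan scan = {0, 0};
    dl_iterate_phdr(scan_object, &scan);
    return scan;
}
```

If with_eh_frame_hdr comes out smaller than objects, some loaded object lacks the segment and, per the comment above, its FDEs can only be found if its frames were registered through __register_frame_info_bases, which is the path that reaches search_object().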
It is an Android target bug.