Bug 24724 - _Unwind_Backtrace() calls malloc
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: other
Version: 4.0.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2005-11-07 23:24 UTC by Arun Sharma
Modified: 2012-09-13 14:20 UTC
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: x86_64-linux-gnu
Last reconfirmed: 2012-09-13 00:00:00


Description Arun Sharma 2005-11-07 23:24:45 UTC
As this stacktrace shows:

#3  0x00000000004044e2 in malloc (size=36024) at tcmalloc.cc:1314
#4  0x000000000047a938 in search_object ()
#5  0x000000000047b189 in _Unwind_Find_FDE ()
#6  0x0000000000478049 in uw_frame_state_for ()
#7  0x0000000000478eca in uw_init_context_1 ()
#8  0x00000000004790b0 in _Unwind_Backtrace ()

there are code paths from _Unwind_Backtrace to malloc. This makes the unwinder deadlock-prone when it is called from applications that have their own customized malloc.
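For context, a minimal sketch of the kind of call that reaches this path, assuming GCC's <unwind.h> on a target with unwind tables. The names trace_buf, record_frame, and capture_backtrace are hypothetical and are not taken from the reporter's code:

```c
#include <stddef.h>
#include <unwind.h>

/* Hypothetical buffer for the program counters collected during the walk. */
struct trace_buf {
    void  **pcs;    /* caller-supplied storage, so the sketch itself never allocates */
    size_t  max;    /* capacity of pcs */
    size_t  count;  /* number of entries filled in so far */
};

/* Callback invoked by _Unwind_Backtrace once per stack frame. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_buf *buf = arg;

    if (buf->count >= buf->max)
        return _URC_END_OF_STACK;   /* any value other than _URC_NO_REASON stops the walk */

    buf->pcs[buf->count++] = (void *)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;          /* continue with the next frame */
}

/* Fills pcs with up to max program counters and returns how many were stored. */
size_t capture_backtrace(void **pcs, size_t max)
{
    struct trace_buf buf = { pcs, max, 0 };

    _Unwind_Backtrace(record_frame, &buf);
    return buf.count;
}
```

The caller owns the storage here, but as the stack trace in the description shows, the unwinder may still call malloc internally while looking up an FDE.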
Comment 1 Andrew Pinski 2005-11-08 00:23:34 UTC
What is your malloc doing that is special, and why would it deadlock? (If you are throwing from inside malloc, I think that is an invalid thing to do.)
Comment 2 Arun Sharma 2005-11-08 00:48:25 UTC
It deadlocks because malloc is holding a lock and then calls the unwinder.
No, we're not throwing exceptions. One reason why malloc might want to use the unwinder is to do heap profiling.

http://goog-perftools.sourceforge.net/doc/heap_profiler.html
Comment 3 Andrew Pinski 2005-11-08 00:51:12 UTC
You know that glibc has a backtrace function which might be more friendly for your purpose?
Comment 4 Andrew Pinski 2005-11-08 00:53:24 UTC
I really doubt we can remove it because this is also used in the unwinding for exceptions.
Comment 5 Arun Sharma 2005-11-08 00:55:13 UTC
(In reply to comment #3)
> You know that glibc has a backtrace function which might be more friendly for
> your purpose?
> 

glibc backtrace dlopens libgcc and uses _Unwind_Backtrace() on amd64. glibc backtrace has its own problems (i.e. mallocs), which is why we're not using it.

See: 

http://sources.redhat.com/bugzilla/show_bug.cgi?id=1579
Comment 6 Andrew Pinski 2005-11-08 01:02:45 UTC
Hmm, You could try libunwind instead, it should work on x86_64:
http://www.hpl.hp.com/research/linux/libunwind/

They show you how to use libunwind to generate a normal backtrace:
http://www.hpl.hp.com/research/linux/libunwind/man/libunwind(3).php

Though I doubt that any of these will remove the use of malloc.
Comment 7 Arun Sharma 2005-11-08 01:07:39 UTC
(In reply to comment #4)
> I really doubt we can remove it because this is also used in the unwinding for
> exceptions.
> 

It must be possible to do stack unwinding without any mallocs. If the exception throwing code path requires mallocs, that's fine by us.

The particular malloc in question is coming from start_fde_sort() in unwind-dw2-fde.c. Perhaps the sorting can be done earlier, i.e. before _Unwind_Backtrace() is called?
Comment 8 Arun Sharma 2005-11-08 01:09:06 UTC
(In reply to comment #6)
> Hmm, You could try libunwind instead, it should work on x86_64:
> http://www.hpl.hp.com/research/linux/libunwind/
> 
> They show you how to use libunwind to generate a normal backtrace:
> http://www.hpl.hp.com/research/linux/libunwind/man/libunwind(3).php
> 
> Though I doubt that any of these will remove the use of malloc.
> 

libunwind doesn't pass unit tests on amd64. davidm thinks that the problems are outside of libunwind. I think he has a couple of bugs open against gcc/glibc.
Comment 9 Andrew Pinski 2005-11-08 01:10:26 UTC
(In reply to comment #8)
> libunwind doesn't pass unit tests on amd64. davidm thinks that the problems are
> outside of libunwind. I think he has a couple of bugs open against gcc/glibc.

Yes, and the ones against gcc are only about epilogue or prologue, so it should not matter for what you are doing.
Comment 10 Andrew Pinski 2005-11-08 01:12:58 UTC
(In reply to comment #9)
> Yes, and the ones against gcc are only about epilogue or prologue, so it should
> not matter for what you are doing.

PR 18748 and PR 18749 are both about prologue and epilogue code, which should not matter for the backtrace at all.
Comment 11 Andrew Pinski 2005-11-08 01:23:49 UTC
(In reply to comment #7)
> The particular malloc in question is coming from start_fde_sort() in
> unwind-dw2-fde.c. Perhaps the sorting can be done earlier i.e. before
> _Unwind_Backtrace() is called?

If you do that, the startup time is high, every shared library load stalls, and you keep around data which you don't need at all.
Comment 12 Arun Sharma 2005-11-08 01:30:34 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Yes, and the ones against gcc are only about epilogue or prologue, so it should
> > not matter for what you are doing.
> 
> PR 18748 and PR 18749 are both about prologue and epilogue code, which should not
> matter for the backtrace at all.
> 

OK, will try to root-cause our problems with libunwind (they show up as bad pointer dereferences in libunwind) and get back to you.

Thanks.
Comment 13 Richard Henderson 2010-08-04 23:08:28 UTC
There are two solutions to this:

(1) Make sure your binary provides PT_GNU_EH_FRAME.  This is the quickest
    path through the unwinder, since the table is pre-sorted by the linker.

(2) Have your malloc detect the recursion and return NULL.  This will cause
    the unwinder to perform a linear search through the unsorted tables.
    It should not fail due to the fake out-of-memory condition, since it
    was designed to handle throwing an exception during a true OOM condition.
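Option (2) could be sketched roughly as follows, assuming a GCC or Clang toolchain (for __thread). Every name here (guarded_malloc, set_profile_hook, demo_hook) is hypothetical; the hook stands in for a heap profiler that calls the unwinder, which in turn may call malloc again:

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-thread flag: nonzero while this thread is inside the allocator.
   __thread is a GCC/Clang extension. */
static __thread int in_allocator;

/* Hypothetical profiling hook; a heap profiler would walk the stack here. */
typedef void (*profile_hook_fn)(size_t size);
static profile_hook_fn profile_hook;

void set_profile_hook(profile_hook_fn fn) { profile_hook = fn; }

void *guarded_malloc(size_t size)
{
    void *p;

    if (in_allocator)
        return NULL;            /* re-entered from the hook: report a fake OOM */

    in_allocator = 1;
    p = malloc(size);           /* stand-in for the custom allocator's real work */
    if (p != NULL && profile_hook != NULL)
        profile_hook(size);     /* may re-enter guarded_malloc on this thread */
    in_allocator = 0;
    return p;
}

/* Demo hook that re-enters the allocator, as the unwinder might. */
static int   hook_calls;
static void *nested_result;

static void demo_hook(size_t size)
{
    hook_calls++;
    nested_result = guarded_malloc(size);   /* NULL while the guard is set */
}
```

With demo_hook installed, the outer guarded_malloc call succeeds while the nested call made from the hook returns NULL, which per the comment above should push the unwinder onto its linear-search path instead of blocking on the allocator's lock.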
Comment 14 H.J. Lu 2012-09-13 13:43:42 UTC
(In reply to comment #13)
> There are two solutions to this:
> 
> (1) Make sure your binary provides PT_GNU_EH_FRAME.  This is the quickest
>     path through the unwinder, since the table is pre-sorted by the linker.

This isn't the problem.

> (2) Have your malloc detect the recursion and return NULL.  This will cause
>     the unwinder to perform a linear search through the unsorted tables.
>     It should not fail due to the fake out-of-memory condition, since it
>     was designed to handle throwing an exception during a true OOM condition.

The problem is that _Unwind_Find_FDE in unwind-dw2-fde.c
calls search_object to find the FDE in the registered objects,
which are loaded unsorted from the .eh_frame section.  Can
we use the .eh_frame_hdr section to load the sorted table
directly?
Comment 15 Ian Lance Taylor 2012-09-13 14:06:03 UTC
On a system whose linker supports --eh-frame-hdr, we will use the version of _Unwind_Find_FDE in unwind-dw2-fde-dip.c.  It will override the version in unwind-dw2-fde.c by renaming it via #define.  This file is selected by libgcc/config/t-eh-dw2-dip.  It will still call the renamed unwind-dw2-fde.c version of _Unwind_Find_FDE, but that function will only look through files registered by __register_frame_info_bases.  __register_frame_info_bases is called by crtstuff.c, but it is only called on systems whose linker does not support --eh-frame-hdr.

So on what system are you actually seeing a call to qsort?  Does that system have a linker that supports --eh-frame-hdr?
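One way to answer that question from inside a running process is to look for a PT_GNU_EH_FRAME program header in each loaded object. The sketch below assumes glibc's dl_iterate_phdr from <link.h>; the names eh_hdr_stats, check_object, and count_eh_frame_hdrs are hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* documented requirement for dl_iterate_phdr */
#endif
#include <link.h>
#include <stddef.h>

/* Hypothetical summary of what the walk over loaded objects found. */
struct eh_hdr_stats {
    int objects;                /* loaded objects visited */
    int with_eh_frame_hdr;      /* objects that carry a PT_GNU_EH_FRAME segment */
};

/* Callback invoked by dl_iterate_phdr once per loaded object. */
static int check_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct eh_hdr_stats *stats = data;
    int i;

    (void)size;
    stats->objects++;
    for (i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            stats->with_eh_frame_hdr++;
            break;
        }
    }
    return 0;                   /* zero tells dl_iterate_phdr to keep iterating */
}

struct eh_hdr_stats count_eh_frame_hdrs(void)
{
    struct eh_hdr_stats stats = { 0, 0 };

    dl_iterate_phdr(check_object, &stats);
    return stats;
}
```

If with_eh_frame_hdr is smaller than objects, at least one loaded object was linked without --eh-frame-hdr and would have to rely on registered, unsorted .eh_frame data.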
Comment 16 H.J. Lu 2012-09-13 14:20:21 UTC
It is an Android target bug.