[RFC][PR 67336][PING^6] Verify pointers during stack unwind

Jeff Law law@redhat.com
Thu Aug 3 15:28:00 GMT 2017


On 08/03/2017 06:52 AM, Yury Gribov wrote:
> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov <tetra2005.patches@gmail.com> wrote:
>>> On 02.08.2017 23:04, H.J. Lu wrote:
>>>>
>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov <tetra2005.patches@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 02.08.2017 21:48, H.J. Lu wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov
>>>>>> <tetra2005.patches@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 02.08.2017 20:02, Jeff Law wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's suggestion.
>>>>>>>>>>> Original patch description/motivation/questions are in
>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Is his stuff used for exception handling?  If so, doesn't that make
>>>>>>>>>> the
>>>>>>>>>> performance a significant concern (ie that msync call?)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to just
>>>>>>>>> compile the unwinder twice, basically include everything needed for
>>>>>>>>> the
>>>>>>>>> verification backtrace in a single *.c file with special defines, and
>>>>>>>>> not export anything from that *.o file except the single entrypoint.
>>>>>>>>> That would also handle the ABI problems.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yea.
>>>>>>>>
>>>>>>>> Given that the vast majority of the time the addresses are valid, I
>>>>>>>> wonder if we could take advantage of that property to keep the
>>>>>>>> overhead
>>>>>>>> lower in the most common cases.
>>>>>>>>
>>>>>>>>> For the concurrent calls of other threads unmapping stacks of running
>>>>>>>>> threads (that is where most of the verification goes against), I'm
>>>>>>>>> afraid
>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it
>>>>>>>>> will
>>>>>>>>> still be racy.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without races
>>>>>>>> isn't possible.  I think the question is can we make this
>>>>>>>> significantly
>>>>>>>> better.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> First of all, thanks for the feedback.  This is indeed not a 100%
>>>>>>> robust
>>>>>>> solution, just something to allow tools like Asan to produce more sane
>>>>>>> backtraces on crash (currently when stack is corrupted they would crash
>>>>>>> in
>>>>>>> the unwinder without printing at least partial backtrace).
>>>>>>>
>>>>>>
>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is
>>>>>> corrupted:
>>>>>>
>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752
>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189
>>>>>>
>>>>>> There is very little point to unwind stack when stack is corrupted.
>>>>>
>>>>>
>>>>>
>>>>> I'd argue that situation with __stack_chk_fail is very special.  It's
>>>>> used
>>>>> in production code where you'd want to halt as soon as corruption is
>>>>> detected to prevent further damage so even printing message is an
>>>>> additional
>>>>> risk.  Debugging tools (e.g. sanitizers) are different and would prefer
>>>>> to
>>>>> print as many survived frames as possible.
>>>>>
>>>>
>>>> But stack unwinder in libgcc is primarily for production use, not for
>>>> debugging.
>>>
>>>
>>> I can see it used in different projects to print backtraces on error (bind9,
>>> SimGrid, sanitizers).  backtrace(3) also sits on top of libgcc unwinder and
>>> seems to be meant primarily for debugging (at least many projects use it for
>>> this purpose:
>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for
>>> libbacktrace which, as its README mentions, is meant to "print a detailed
>>> backtrace when an error occurs or to gather detailed profiling information".
>>
>> But it shouldn't be performed on a corrupted stack.  When a stack is
>> corrupted, there is just no way to reliably unwind it.
> 
> Why not? Attached patch shows that it's possible (up to a point of
> corruption of course).
Right.  We should unwind up to the point where we detect something
amiss, then stop.

And that's what I'm most interested in here -- detecting of the invalid
frame data.  THe problem I see is doing it efficiently enough.

jeff



More information about the Gcc-patches mailing list