Bug 65479

Summary: sanitizer stack trace missing frames past #0 on powerpc64
Product: gcc Reporter: Martin Sebor <msebor>
Component: sanitizerAssignee: Not yet assigned to anyone <unassigned>
Status: CLOSED FIXED    
Severity: normal CC: bergner, bill.schmidt, dodji, dvyukov, ian, jakub, joakim.tjernlund, kcc
Priority: P3    
Version: 5.0   
Target Milestone: ---   
Host: Target: powerpc64*-*-linux*
Build: Known to work:
Known to fail: Last reconfirmed: 2016-04-11 00:00:00
Bug Depends on: 65749    
Bug Blocks:    
Attachments: Proof-of-concept patch.
Test case demonstrating stability problem with backtrace_qsort.
Patch tested on powerp64*-*-*
Proposed patch.

Description Martin Sebor 2015-03-19 22:36:42 UTC
On both powerpc64 and powerpc64le, the c-c++-common/asan/misalign-1.c shows 6 failures, all in the output pattern test.  The failures are due to to the stack trace missing stack frame #1 (it only includes frame #0).  It looks like the backtrace on powerpc doesn't work correctly.

=================================================================
==87868==ERROR: AddressSanitizer: unknown-crash on address 0x3fffc231eb2f at pc 0x000010000ce8 bp 0x3fffc231e9f0 sp 0x3fffc231ea10
READ of size 4 at 0x3fffc231eb2f thread T0
    #0 0x10000ce4 in foo /src/gcc-5.0-git/gcc/testsuite/c-c++-common/asan/misalign-1.c:11

Address 0x3fffc231eb2f is located in stack of thread T0 at offset 175 in frame
    #0 0x1000086c in main /src/gcc-5.0-git/gcc/testsuite/c-c++-common/asan/misalign-1.c:28

  This frame has 3 object(s):
    [32, 36) 'v'
    [96, 100) 'w'
    [160, 176) 't' <== Memory access at offset 175 partially overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: unknown-crash /src/gcc-5.0-git/gcc/testsuite/c-c++-common/asan/misalign-1.c:11 foo
Shadow bytes around the buggy address:
  0x09fff8463d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d50: f1 f1 f1 f1 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4
=>0x09fff8463d60: f2 f2 f2 f2 00[00]f4 f4 f3 f3 f3 f3 00 00 00 00
  0x09fff8463d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x09fff8463db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==87868==ABORTING
Comment 1 Martin Sebor 2015-03-19 23:40:46 UTC
The same problem is causing failures in the following tests on these targets:
c-c++-common/asan/misalign-2.c
c-c++-common/asan/null-deref-1.c
Comment 2 Martin Sebor 2015-04-01 01:14:54 UTC
Created attachment 35196 [details]
Proof-of-concept patch.

Two problems are contributing to the failures in these tests:

1) a missing -fasynchronous-unwind-tables option; the option is necessary on powerpc*-*-*-* to generate a full stack trace
2) a bug in the backtrace_qsort function introduced in r208403 that makes the algorithm unstable (see also http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00342.html)

The attached proof-of-concept patch adds the missing option mentioned in (1) and backs out the commit referenced in (2) as a proof of concept of fixing the problem.  I'll try to come up with an approach that doesn't undo the performance improvement in a subsequent patch.
Comment 3 Martin Sebor 2015-04-10 02:14:49 UTC
The bug in backtrace_qsort is actually worse than the regression introduced in r208403.  There is a fundamental problem with relying on the addresses of the array elements to maintain stability.  Either the algorithm needs to be replaced with a stable one like Merge Sort, or a new data member needs to be introduced into struct line to reflect their initial order.  I suspect the latter alternative will be cheaper in terms of resources (i.e., less memory and faster sort time).

This also means that the aspect of the bug isn't powerpc specific.  It only happens to manifest in the testsuite runs on that target.
Comment 4 Jakub Jelinek 2015-04-10 06:26:56 UTC
As for the -fasynchronous-unwind-tables option, this really should be handled by teaching libsanitizer to handle powerpc* fast unwinding.
For the backtrace_qsort, can you cook up some short testcase that calls backtrace_qsort and sorts things incorrectly?
Comment 5 Martin Sebor 2015-04-10 15:31:18 UTC
Created attachment 35289 [details]
Test case demonstrating stability problem with backtrace_qsort.

Attached is a demo program showing the stability bug in the backtrace_qsort function.  The output shows the result of the current implementation (Unstable) and the expected result (Stable).  A better test case wouldn't rely on the knowledge of the line_compare function and instead arrange to construct a DWARF line program with similar properties that would then cause the backtrace line problem.  I suspect that would take quite a bit of effort to put together, especially if we wanted it to be reproducible across targets.

I plan to work on the fast unwinding but I don't expect it to be ready in time for the 5.0 release.  In the meantime, I'll post a patch to fix the test failures and maintain stability to be considered for 5.0.
Comment 6 Jakub Jelinek 2015-04-10 15:39:54 UTC
(In reply to Martin Sebor from comment #5)
> Created attachment 35289 [details]
> Test case demonstrating stability problem with backtrace_qsort.
> 
> Attached is a demo program showing the stability bug in the backtrace_qsort
> function.  The output shows the result of the current implementation
> (Unstable) and the expected result (Stable).  A better test case wouldn't
> rely on the knowledge of the line_compare function and instead arrange to
> construct a DWARF line program with similar properties that would then cause
> the backtrace line problem.  I suspect that would take quite a bit of effort
> to put together, especially if we wanted it to be reproducible across
> targets.
> 
> I plan to work on the fast unwinding but I don't expect it to be ready in
> time for the 5.0 release.  In the meantime, I'll post a patch to fix the
> test failures and maintain stability to be considered for 5.0.

That hints at a bug in the line_compare function, what it does is just bogus.
Adding the idx field to struct line sounds IMHO like the right thing, and on 64-bit arches won't even eat any extra memory because there have been 32 bits of padding.
Comment 7 Martin Sebor 2015-04-12 23:46:11 UTC
I forgot to mention that there is yet another bug here that's complicating things.  I was initially going to describe it here but since it's independent of this problem I decided to open a separate bug: pr65749.

The complication is that a patch for this bug that produces the expected results on POWER (i.e., passes all sanitizer tests) breaks at least one test on x86_64 because of the incorrect computation of the PC value in libsanitizer/sanitizer_common/sanitizer_stacktrace_libcdep.cc.
Comment 8 Martin Sebor 2015-04-13 18:32:51 UTC
Created attachment 35308 [details]
Patch tested on powerp64*-*-*

This patch lets the affected tests pass on powerp64*-*-* but due to bug 65749, causes regressions in the stack-overflow-1.c test where asan reports different line numbers in the stack trace than expected.
Comment 9 Martin Sebor 2015-04-20 01:37:58 UTC
Created attachment 35360 [details]
Proposed patch.

The attached patch resolves all the Address Sanitizer test suite failures on powerpc64 except for those that are subject to pr65643.  Tested on powerpc64*-*-*-* and x86_64.
Comment 10 Martin Sebor 2015-06-12 00:02:22 UTC
Author: msebor
Date: Fri Jun 12 00:01:50 2015
New Revision: 224402

URL: https://gcc.gnu.org/viewcvs?rev=224402&root=gcc&view=rev
Log:
2015-06-11  Martin Sebor  <msebor@redhat.com>

	PR sanitizer/65479
	* dwarf.c (struct line): Add new field idx.
	(line_compare): Use it.
	(add_line): Set it.
	(read_line_info): Reset it.

Modified:
    trunk/libbacktrace/ChangeLog
    trunk/libbacktrace/dwarf.c
Comment 11 Bill Schmidt 2015-12-18 16:25:40 UTC
Martin, can this be closed?
Comment 12 Martin Sebor 2015-12-18 16:38:48 UTC
The sanitizer tests are still failing and this bug still hasn't been fixed.

The patch in r224402 was only partial, committed on the assumption that the LLVM AddressSantitizer maintainers would approve my patch for it.  Unfortunately, I never managed to convince them to approve the trivial patch there and after much back and forth I gave up (the review is here: http://reviews.llvm.org/D10065).
Comment 13 Joakim Tjernlund 2016-02-15 16:23:52 UTC
with gcc 4.9.3 an powerpc32 I sometimes get backtraces like this:
=================================================================
==2899==ERROR: AddressSanitizer: heap-use-after-free on address 0xb400091c at pc 0xb3800120 bp 0xbf86a270 sp 0xbf86a288
READ of size 4 at 0xb400091c thread T0
==2899==AddressSanitizer: while reporting a bug found another one.Ignoring.
    #0 0xb380011c (+0x11c)
    #1 0xfb2aa10 in __asan_report_error (/usr/lib/gcc/powerpc-unknown-linux-gnu/4.9.3/libasan.so.1+0x65a10)
    #2 0xfb2bac8 in __asan_report_load4 (/usr/lib/gcc/powerpc-unknown-linux-gnu/4.9.3/libasan.so.1+0x66ac8)
    #3 0xf3c028c in Icn_unregisterIf (/opt/appl/cuappl04a-r27a-160209jt2//lib/libiciif.so.1+0x4c28c)

0xb400091c is located 2204 bytes inside of 2208-byte region [0xb4000080,0xb4000920)
freed by thread T0 here:
    #0 0xf3b9ba0 in Icn_removeEntry (/opt/appl/cuappl04a-r27a-160209jt2//lib/libiciif.so.1+0x45ba0)

previously allocated by thread T0 here:
    #0 0xf3c0870 in Icn_registerIf (/opt/appl/cuappl04a-r27a-160209jt2//lib/libiciif.so.1+0x4c870)

SUMMARY: AddressSanitizer: heap-use-after-free ??:0 ??
Shadow bytes around the buggy address:
  0x368000d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x368000e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x368000f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x36800100: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x36800110: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x36800120: fd fd fd[fd]fa fa fa fa fa fa fa fa fa fa fa fa
  0x36800130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x36800140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x36800150: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x36800160: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x36800170: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==2899==ABORTING

This backtrace is incomplete, could this bug be the cause?
Comment 14 Bill Schmidt 2016-04-11 01:49:42 UTC
I wonder if this is just support that hasn't been updated in the GCC copy of libsanitizer.  I recall fixing this bug (or one very similar to it) on the Clang side in 2015.  There is some Power-specific logic in there for doing the stack unwinding.  I will try to dig that information up and compare the two.
Comment 15 Bill Schmidt 2016-04-11 14:48:26 UTC
Yes, it seems likely that this is due to this patch being missing from GCC 5.3:

http://reviews.llvm.org/D11552

The fix is present in trunk, so this should be fixed with a backport.
Comment 16 Bill Schmidt 2016-04-11 17:35:29 UTC
Unfortunately, that does not appear to be sufficient to solve any of the extant ASAN failures.  I'm suspicious that fast unwinding is still being disabled somewhere.  In any case, we need the backport; but we seem to need something else as well.

Confirmed, in any case.
Comment 17 Martin Sebor 2016-05-03 20:54:21 UTC
Unassigning myself since I'm not actively working on it anymore.
Comment 18 Bill Seurer 2016-10-14 19:26:40 UTC
With other fixes that have been made the only thing remaining to get the 3 test cases working is the -fasynchronous-unwind-tables option.  I tried this on powerpc64 LE and BE and it works fine.  I also verified it doesn't break on x86.

c-c++-common/asan/misalign-1.c
c-c++-common/asan/misalign-2.c
c-c++-common/asan/null-deref-1.c


/* { dg-additional-options "-fasynchronous-unwind-tables" { target { powerpc*-*-linux* } } } */

(plus some tweaking of line numbers in expected outputs)

I'll be submitting a patch for this.
Comment 19 seurer 2016-12-21 19:09:42 UTC
Author: seurer
Date: Wed Dec 21 19:09:10 2016
New Revision: 243863

URL: https://gcc.gnu.org/viewcvs?rev=243863&root=gcc&view=rev
Log:
[PATCH, v2, rs6000] pr65479 Add -fasynchronous-unwind-tables when the -fsanitize=address option is seen.

All feedback from the earlier version has been taken into account now.

This patch adds the -fasynchronous-unwind-tables option to compilations when
the -fsanitize=address option is seen but not if any
-fasynchronous-unwind-tables options were already specified.
-fasynchronous-unwind-tables causes a full strack trace to be produced when
the sanitizer detects an error.  Without the full trace several of the asan
test cases fail on powerpc.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65479 for more information.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu,
powerpc64be-unknown-linux-gnu, and x86_64-pc-linux-gnu with no regressions.
Is this ok for trunk?

[gcc]

2016-12-21  Bill Seurer  <seurer@linux.vnet.ibm.com>

	PR sanitizer/65479
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	-fasynchronous-unwind-tables option when -fsanitize=address is
	specified.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
Comment 20 seurer 2017-01-09 20:46:31 UTC
Author: seurer
Date: Mon Jan  9 20:45:59 2017
New Revision: 244241

URL: https://gcc.gnu.org/viewcvs?rev=244241&root=gcc&view=rev
Log:
2017-01-09  Bill Seurer  <seurer@linux.vnet.ibm.com>

	Backport from mainline
	2016-12-21  Bill Seurer  <seurer@linux.vnet.ibm.com>

	PR sanitizer/65479
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	-fasynchronous-unwind-tables option when -fsanitize=address is
	specified.


Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/rs6000/rs6000.c
Comment 21 seurer 2017-01-10 19:12:27 UTC
Author: seurer
Date: Tue Jan 10 19:11:55 2017
New Revision: 244284

URL: https://gcc.gnu.org/viewcvs?rev=244284&root=gcc&view=rev
Log:
2017-01-10  Bill Seurer  <seurer@linux.vnet.ibm.com>

	Backport from mainline
	2016-12-21  Bill Seurer  <seurer@linux.vnet.ibm.com>

	PR sanitizer/65479
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	-fasynchronous-unwind-tables option when -fsanitize=address is
	specified.

Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/rs6000/rs6000.c
Comment 22 Bill Seurer 2017-01-10 19:34:21 UTC
This is now resolved in gcc 5, 6, and trunk
Comment 23 Peter Bergner 2017-01-10 19:58:55 UTC
Fixed.
Comment 24 Peter Bergner 2017-01-10 19:59:12 UTC
Closing as fixed.
Comment 25 Peter Bergner 2017-01-10 19:59:38 UTC
Really closed now. :-(