Bug 64999 - s390x libgo test failure in TestMemoryProfiler
Summary: s390x libgo test failure in TestMemoryProfiler
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: go (show other bugs)
Version: 5.0
: P3 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-10 12:13 UTC by Dominik Vogt
Modified: 2015-04-21 08:28 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Experimental fix (1.33 KB, patch)
2015-02-16 11:49 UTC, Dominik Vogt
Details | Diff
Different fix (1.92 KB, patch)
2015-02-17 12:07 UTC, Dominik Vogt
Details | Diff
Proposed fix (1.31 KB, patch)
2015-03-09 20:56 UTC, boger
Details | Diff
Increment the pc in the callback routine for backtrace_full (574 bytes, patch)
2015-04-17 12:47 UTC, boger
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dominik Vogt 2015-02-10 12:13:05 UTC
With the current trunk I get a test failure in the runtime/pprof test.  I'll start investigating what's going on now but wanted to report it beforehand.  Any hints on debugging this?

--

$ make -k check-go RUNTESTFLAGS="go-test.exp"
...
--- FAIL: TestMemoryProfiler (0.16s)
	testing.go:278: The entry did not match:
		32: 1024 \[32: 1024\] @ 0x[0-9,a-f x]+
		#	0x[0-9,a-f]+	pprof_test\.allocatePersistent1K\+0x[0-9,a-f]+	.*/mprof_test\.go:43
		#	0x[0-9,a-f]+	runtime_pprof_test\.TestMemoryProfiler\+0x[0-9,a-f]+	.*/mprof_test\.go:66
		
		
		Profile:
		heap profile: 37: 3544 [3111: 4229640] @ heap/2
		1: 1408 [1: 1408] @ 0x800250a9 0x800250a9 0x8001e5c7 0x8001e5c7 0x80021d3f 0x8001bf73 0x8001d54b 0x800069b1 0x3fff74ac92b 0x80006a15
		#	0x8001bf73	mcommoninit+0x93	../../../libgo/runtime/proc.c:739
		#	0x8001d54b	runtime_schedinit+0x8b	../../../libgo/runtime/proc.c:459
		#	0x800069b1	main+0x31		../../../libgo/runtime/go-main.c:40
		#	0x3fff74ac92b
		#	0x80006a15
		
		1: 1024 [1024: 1048576] @ 0x800250a9 0x800250a9 0x8001163f 0x80011687 0x8000c551 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		32: 1024 [32: 1024] @ 0x800250a9 0x800250a9 0x8000c44b 0x8000c595 0x800ad279 0x8001e229
		#	0x8000c44b	pprof_test.allocatePersistent1K+0x23	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:43
		#	0x8000c595	runtime_pprof_test.TestMemoryProfiler+0x11d	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:65
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		1: 64 [1: 64] @ 0x800250a9 0x800250a9 0x8001134d 0x8000c533 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		1: 16 [1: 16] @ 0x800250a9 0x800250a9 0x8000c4df 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		1: 8 [1: 8] @ 0x800250a9 0x800250a9 0x8000c50f 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x8000c5bb 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		0: 0 [1: 32] @ 0x800250a9 0x800250a9 0x8000c3b7 0x8000c58f 0x800ad279 0x8001e229
		#	0x8000c3b7	pprof_test.allocateTransient2M+0x47	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:30
		#	0x8000c58f	runtime_pprof_test.TestMemoryProfiler+0x117	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:64
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [1: 2097152] @ 0x800250a9 0x800250a9 0x8001163f 0x80011687 0x8000c3a7 0x8000c58f 0x800ad279 0x8001e229
		#	0x8000c3a7	pprof_test.allocateTransient2M+0x37	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:30
		#	0x8000c58f	runtime_pprof_test.TestMemoryProfiler+0x117	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:64
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [1024: 1048576] @ 0x800250a9 0x800250a9 0x8000c32b 0x8000c589 0x800ad279 0x8001e229
		#	0x8000c32b	pprof_test.allocateTransient1M+0x23	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:21
		#	0x8000c589	runtime_pprof_test.TestMemoryProfiler+0x111	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:59
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [1024: 32768] @ 0x800250a9 0x800250a9 0x8000c561 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		0: 0 [1: 16] @ 0x800250a9 0x800250a9 0x80025b41 0x80025b41 0x8001d531 0x800069b1 0x3fff74ac92b 0x80006a15
		#	0x800069b1	main+0x31	../../../libgo/runtime/go-main.c:40
		#	0x3fff74ac92b
		#	0x80006a15
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x8001163f 0x80011687 0x80009025 0x80007949 0x8000c609 0x800ad279 0x8001e229
		#	0x80009025	pprof.writeHeap+0x8d			/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:401
		#	0x80007949	runtime_pprof.WriteTo.pN21_runtime_pprof.Profile+0x99	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:229
		#	0x8000c609	runtime_pprof_test.TestMemoryProfiler+0x191		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:71
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x80008fd3 0x80007949 0x8000c609 0x800ad279 0x8001e229
		#	0x80008fd3	pprof.writeHeap+0x3b			/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:388
		#	0x80007949	runtime_pprof.WriteTo.pN21_runtime_pprof.Profile+0x99	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:229
		#	0x8000c609	runtime_pprof_test.TestMemoryProfiler+0x191		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:71
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x8001134d 0x80006ecf 0x8000c5e1 0x800ad279 0x8001e229
		#	0x800ad279	testing.$thunk15+0xc9	../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41		../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x80011821 0x800112cf 0x80006c03 0x80006ea3 0x8000c5e1 0x800ad279 0x8001e229
		#	0x80006c03	pprof.lockProfiles+0x103		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:90
		#	0x80006ea3	runtime_pprof.Lookup+0x23		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:127
		#	0x8000c5e1	runtime_pprof_test.TestMemoryProfiler+0x169	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:71
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x80011c07 0x800112a9 0x80006c03 0x80006ea3 0x8000c5e1 0x800ad279 0x8001e229
		#	0x80006c03	pprof.lockProfiles+0x103		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:90
		#	0x80006ea3	runtime_pprof.Lookup+0x23		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:127
		#	0x8000c5e1	runtime_pprof_test.TestMemoryProfiler+0x169	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:71
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		0: 0 [0: 0] @ 0x800250a9 0x800250a9 0x80011be3 0x800112a9 0x80006c03 0x80006ea3 0x8000c5e1 0x800ad279 0x8001e229
		#	0x80006c03	pprof.lockProfiles+0x103		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:90
		#	0x80006ea3	runtime_pprof.Lookup+0x23		/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/pprof.go:127
		#	0x8000c5e1	runtime_pprof_test.TestMemoryProfiler+0x169	/home/vogt/src/git/gcc/build-go/s390x-ibm-linux-gnu/libgo/gotest81795/test/mprof_test.go:71
		#	0x800ad279	testing.$thunk15+0xc9			../../../libgo/go/testing/testing.go:555
		#	0x8001e229	kickoff+0x41				../../../libgo/runtime/proc.c:235
		
		
		# runtime.MemStats
		# Alloc = 25574936
		# TotalAlloc = 29809488
		# Sys = 35420408
		# Lookups = 11
		# Mallocs = 7276
		# Frees = 3243
		# HeapAlloc = 25574936
		# HeapSys = 30408704
		# HeapIdle = 4521984
		# HeapInuse = 25886720
		# HeapReleased = 0
		# HeapObjects = 4033
		# Stack = 0 / 0
		# MSpan = 9472 / 49152
		# MCache = 2200 / 16384
		# BuckHashSys = 1540888
		# NextGC = 50917776
		# PauseNs = [85000 94000 92000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
		# NumGC = 3
		# EnableGC = true
		# DebugGC = false
		
FAIL
FAIL: runtime/pprof
make[3]: *** [runtime/pprof/check] Error 1
Comment 1 Dominik Vogt 2015-02-10 15:21:18 UTC
These entries looks screwed to me:

-- snip --
32: 1024 [32: 1024] @ ...
#	0x8000c44b ...allocatePersistent1K ... mprof_test.go:43
#	0x8000c595 ...TestMemoryProfiler ... mprof_test.go:65
...

0: 0 [1: 32] @ ...
#	0x8000c3b7 ...allocateTransient2M ... mprof_test.go:30
#	0x8000c58f ...TestMemoryProfiler ... mprof_test.go:64
...
-- snip --

In line 64 there's a call to allocateTransient1M(), not to allocateTransient2M() as the stack trace claims, and likewise line 65 calls allocateTransient2M() and not allocatePersistent1K().  On top of that, the output for line 66 is missing completely.
Comment 2 Ian Lance Taylor 2015-02-10 15:23:43 UTC
From a cursory look the problem is that the regexp expects line 66 but the actual output is line 65.  Looking at the code line 66 seems correct.  However if the code is built with optimization it's easy to imagine this being not quite correct for one reason or another.  Probably the test should be loosened.
Comment 3 Ian Lance Taylor 2015-02-10 15:24:57 UTC
Of course it's also possible that there is some bug in libbacktrace.  It's worth checking what gdb reports.
Comment 4 Dominik Vogt 2015-02-10 15:36:34 UTC
The stack traces in gdb look good:

line 64:
--------
#0  __go_new (td=td@entry=0x8010c8e8 <__go_td_S1_xAN5_uint81024ee>, 
    size=size@entry=1024) at ../../../libgo/runtime/go-new.c:14
#1  0x000000008000c32c in pprof_test.allocateTransient1M () at mprof_test.go:21
#2  0x000000008000c58a in runtime_pprof_test.TestMemoryProfiler (
    t=0xc2088502d0) at mprof_test.go:64

line 65:
--------
#0  __go_make_slice1 (td=td@entry=0x80109848 <__go_td_AN5_uint8e>, 
    len=len@entry=2097152) at ../../../libgo/runtime/go-make-slice.c:74
#1  0x000000008000c3a8 in pprof_test.allocateTransient2M () at mprof_test.go:30
#2  0x000000008000c590 in runtime_pprof_test.TestMemoryProfiler (
    t=0xc2088502d0) at mprof_test.go:65

line 66:
--------
#0  __go_new (td=td@entry=0x8010cc18 <__go_tdn_runtime_pprof_test.Obj32>, 
    size=size@entry=32) at ../../../libgo/runtime/go-new.c:14
#1  0x000000008000c44c in pprof_test.allocatePersistent1K ()
    at mprof_test.go:43
#2  0x000000008000c596 in runtime_pprof_test.TestMemoryProfiler (
    t=0xc2088502d0) at mprof_test.go:66
Comment 5 Dominik Vogt 2015-02-10 15:55:01 UTC
Just noticed that all function offsets in the stack traces are broken.  On s390x, only even offsets are valid, but all the numbers are odd, e.g.

  pprof_test.allocatePersistent1K+0x23
                                 ^^^^^
Comment 6 Ian Lance Taylor 2015-02-10 17:20:11 UTC
The odd addresses are most likely a symptom of the libbacktrace library.  It should probably be considered a bug.  I'm guessing that it's because of the --pc in the static function unwind in libbacktrace/backtrace.c.
Comment 7 Dominik Vogt 2015-02-11 07:15:23 UTC
The "--pc" is definitely the cause of the bogus addresses.  To me it seems that these addresses are intended to identify the function and source line from the dwarf info, not to be printed out as a real address (which they are not).

Anyway, the printed address and source line information do not match.

  # 0x8000c595 runtime_pprof_test.TestMemoryProfiler+0x11d mprof_test.go:65

From the disassembly (function names shortened):

   0x000000008000c584 <+268>:	brasl	%r14,0x8000c308 <allocateTransient1M>
   0x000000008000c58a <+274>:	brasl	%r14,0x8000c370 <allocateTransient2M>
   0x000000008000c590 <+280>:	brasl	%r14,0x8000c428 <allocatePersistent1K>
   0x000000008000c596 <+286>:	lghi	%r1,0

The address points to line 66 in the file, but the textual trace associates it with line 65.  On the other hand gdb properly identifies the source line, so the dwarf information must be correct:

  (gdb) $pc = 0x8000c595
  (gdb) p $pc
  $1 = (void (*)()) 0x8000c595 <runtime_pprof_test.TestMemoryProfiler+285>
  (gdb) bt
  #0  0x000000008000c595 in runtime_pprof_test.TestMemoryProfiler (
      t=0xf318aa800000000) at mprof_test.go:66
                              ^^^^^^^^^^^^^^^^
Comment 8 Dominik Vogt 2015-02-11 07:38:48 UTC
The cause for this bug is the interaction between the "--pc" and this code snippet from printStackRecord() in pprof.go:

  ... if runtime.GOARCH == "s390" || runtime.GOARCH == "s390x" {
    // only works if function was called
    // with the brasl instruction (or a
    // different 6-byte instruction).
    tracepc -= 6
  } ...

First, the pc is decremented by one, pointing to the last byte of the brasl instruction.  Then we subtract another six bytes and end up pointing to the last byte of the previous instruction, and that messes up the stack trace.  printStackRecord() does not know that unwind() has already decremented the pc and may mis-calculate the instruction address, possibly not just for s390[x] but for other architectures as well.
Comment 9 Ian Lance Taylor 2015-02-11 17:45:27 UTC
I think that libbacktrace is doing more or less the right thing.  If we don't subtract one from pc there, we have no way to convey signal handler frames correctly.  Also, the resulting PC value is in the instruction we want, not the next instruction.

So probably runtime.Callers (in libgo/runtime/go-callers.c) ought to adjust the value, since that is what the Go code expects.

There is a deeper problem with this code, which is that although the gc compiler can reliably map a PC value to a location, the gccgo compiler can not, because of inlining.  To get inlining right, you need to collect the file/line information when unwinding; you can't recreate it afterward.  runtime_callers get the right information, but runtime.Callers throws it away.
Comment 10 Dominik Vogt 2015-02-12 07:02:49 UTC
As far as I understand, the code in libbacktrace was originally only intended for handling exceptions, not for generating stack traces?  For the former, the code is fine.  But given a function's return address, identifying the call instruction is non-trivial.  Gdb has complex heuristics to get that right in case of inlined functions, tail calls or the hand written trampolines in libffi.  Glibc probably has the same complexity for backtrace_symbols() call.  With libgo's backtrace functionality we need to duplicate the same logic in libbacktrace, no?
Comment 11 Ian Lance Taylor 2015-02-12 16:45:34 UTC
libbacktrace is all about stack backtraces.  It is not about handling exceptions.

libbacktrace handles inlined calls and hand written trampolines, assuming the DWARF information is correct.  libbacktrace does not currently handle sibling calls; it could, based on http://www.dwarfstd.org/ShowIssue.php?issue=100909.2, but nobody has implemented it.

The glibc backtrace_symbols call is a simpler function that is not nearly as powerful as libbacktrace.
Comment 12 Ian Lance Taylor 2015-02-12 17:24:27 UTC
I should add that for purposes of Go, it's not all that important that libbacktrace does not yet handle sibling calls, because the Go compiler turns on -fno-optimize-sibling-calls by default (https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01643.html in gcc/go/go-lang.c).
Comment 13 Dominik Vogt 2015-02-16 10:59:47 UTC
I see.

But what bug or bugs do we have here then?  Current symptoms are:

(1) A bogus call addres may be stored in the backtrace structure (i.e. next instruction's address minus 1).

(2) The address from (1) in printed in the textual backtrace.  As I understand it, the code should instead print the address calculated in printStackRecord().

(3) At least for s390[x], the caller's line number in the printed backtrace may be wrong, but I guess that affects other targets as well.

--

So, what happens is this:

* The code in libbacktrace knows whether the pc points to the call instruction or the instruction after that and handles the latter by decrementing the pc by one before it's stored in some structure describing the stack trace.

* When the user prints a memory heap trace, writeHeap() is called, which prints the address from the stack trace structure and then calls printStackRecord().

* printStackRecord() then deducts the guessed size of the call instruction from the pc and uses this modified pc to determine the file and line number to print in the stack trace.

--

I see at least two things that need to be fixed:

- printStackRecord() needs to know whether the pc has already been decremented in libbacktrace and adapt the calculation accordingly.  On s390[x] this could simply be done by looking at the pc, because odd pcs are invalid.  This is not the case on other architectures though (e.g. ARM7).

- writeHeap should print the address calculated in printStackRecord() instead of the address from the stack trace structure.

As an alternative, if the code in libbacktrace could properly determine the call address, the address calculation code in printStackTrace() could be removed and everything would be fine.
Comment 14 Dominik Vogt 2015-02-16 11:49:23 UTC
Created attachment 34773 [details]
Experimental fix

The attached patch is a sketch of a possible fix (at the expense of duplicating some code from printStackRecord() in a separate function).

This fixes the problems running the test runtime/pprof on s390 and should also fix the inconsistencies in the printed stack trace.
Comment 15 Dominik Vogt 2015-02-16 11:50:13 UTC
P.S.: The patch only addresses s390 and s390x.
Comment 16 Ian Lance Taylor 2015-02-16 14:08:24 UTC
Back in comment #9 I was trying to suggest that runtime.Callers should adjust the PC that it returns to Go code.  That seems simpler and should have the same effect as your patch.
Comment 17 Dominik Vogt 2015-02-17 11:13:06 UTC
runtime.Callers is not called in this test case, so changing it won't help.
Comment 18 Dominik Vogt 2015-02-17 12:07:07 UTC
Created attachment 34788 [details]
Different fix

How about this patch to the callback() function in go-callers.c to modify the incoming pc.  The algorithm to do this is moved from ppfor.go to go-callers.c.
Comment 19 Ian Lance Taylor 2015-02-17 14:24:04 UTC
I see a call to runtime.Callers in Profile.Add in libgo/go/runtime/pprof/pprof.go.  If the PC values in question do not come from there, where do they come from?
Comment 20 Dominik Vogt 2015-02-18 09:21:27 UTC
runtime_MProf_Malloc() calls runtime_callers() without going through runtime.Callers():

#0  runtime_callers (skip=skip@entry=1, locbuf=locbuf@entry=0xc2094734b8, 
    m=m@entry=32, keep_thunks=keep_thunks@entry=0 '\000')
    at ../../../libgo/runtime/go-callers.c:186
#1  0x0000000080023150 in runtime_MProf_Malloc (p=p@entry=0xc20957ac00, 
    size=size@entry=1024) at ../../../libgo/runtime/mprof.goc:189
#2  0x0000000080021722 in profilealloc (size=1024, v=0xc20957ac00)
    at ../../../libgo/runtime/malloc.goc:315
#3  runtime_mallocgc (size=size@entry=1024, typ=typ@entry=2148440120, 
    flag=flag@entry=0) at ../../../libgo/runtime/malloc.goc:253
#4  0x000000008000e30e in __go_new (
    td=td@entry=0x800e9838 <__go_td_S1_xAN5_uint81024ee>, size=size@entry=1024)
    at ../../../libgo/runtime/go-new.c:15
#5  0x000000008000c174 in pprof_test.allocateTransient1M () at mprof_test.go:21
#6  0x000000008000c3d2 in runtime_pprof_test.TestMemoryProfiler (
    t=0xc2088502d0) at mprof_test.go:64
Comment 21 Ian Lance Taylor 2015-02-18 12:28:11 UTC
Ah, thanks.  So we should change that to
			r->stk[i] = locstk[i].pc + 1;
too.
Comment 22 Dominik Vogt 2015-02-18 13:13:13 UTC
You mean the function saveg() code in mprof.goc?

I'm actually not sure how the above patch to runtime_callers() interacts with all the other functions that call runtime_callers().  :-/
Comment 23 Ian Lance Taylor 2015-02-18 18:52:48 UTC
Yes, I do mean to change saveg in mprof.goc.

runtime_callers in general returns full file/line information, which is required for full correctness when using gccgo.  When it devolves back to a plain PC, you've correctly observed that it gets the wrong PC.  I'm suggesting correcting the PC value wherever we devolve, which happens in runtime.Callers and, as you've pointed out, in saveg in mprof.goc.

I looked for other calls to runtime_callers, and found one other case: in runtime_sigprof in proc.c.  I think it's worth trying fixing those cases to see if it fixes this bug.
Comment 24 boger 2015-02-24 14:05:22 UTC
In my opinion, the way the decrement of pc is done in libbacktrace is incorrect.  The code in libbacktrace looks similar to code in libgcc, where the pc type is _Unwind_Ptr.  Since the libgcc has been around for quite a while I assume it is correct, and I see no reason why the code to do the pc decrement in libbacktrace should be different from what's in libgcc.  

I made this change to libbacktrace/backtrace.c and simple.c:

 if (!ip_before_insn)
    pc = (uintptr_t)((_Unwind_Ptr)pc-1);

And then in pprof.go, added this so that the pc value did not get decremented again for ppc64 & ppc64le:

  tracepc := pc
                        // Back up to call instruction.
                        if i > 0 && pc > f.Entry() && !wasPanic {
                                if runtime.GOARCH == "386" || runtime.GOARCH == "amd64" {
                                        tracepc--
                                } else if runtime.GOARCH == "s390" || runtime.GOARCH == "s390x" {
                                        // only works if function was called
                                        // with the brasl instruction (or a
                                        // different 6-byte instruction).
                                        tracepc -= 6
                                } else if runtime.GOARCH != "ppc64" && runtime.GOARCH != "ppc64le" {
                                        tracepc -= 4 // arm, etc
                                }
                        }

After making this change the regression went away and no further failures occurred on ppc64 and ppc64le.  I am not familiar with other platforms (i.e., other supported GOARCHes) to understand how they might be affected by this.  However, since the code that appears in libgcc has existed and been used for quite some time, that code should be right and decrementing the pc in libbacktrace in the same way as is done in libgcc should be correct for all platforms that use it.  Then that leads to the question of what is done in pprof.go, which looks like a hack to undo what was done incorrectly in libbacktrace at least for some platforms.

I will continue to do further testing, but would be curious to know if this change to libbacktrace would fix s390, assuming the special check in pprof.go would be removed?  And then I don't think you should have to make changes in other locations, once libbacktrace has been corrected.
Comment 25 Ian Lance Taylor 2015-02-24 17:17:28 UTC
To which code in libgcc are you referring?  I don't see it.

Our goal has to be for runtime.Callers to return the same sort of values in gccgo as it does in gc.  That means that your change to go/runtime/pprof/pprof.go looks wrong to me.  It suggests that the ppc64 values are different from those of other architectures.  What's the justification for that?
Comment 26 boger 2015-02-24 21:29:52 UTC
(In reply to Ian Lance Taylor from comment #25)
> To which code in libgcc are you referring?  I don't see it.
> 

Here is the code I was referring to, but I was wrong when I thought it fixed the problem.  Even though the testcase passed on further inspection I see that the pc values are still wrong.

ip a.k.a. pc

unwind-c.c:

#ifdef HAVE_GETIPINFO
  ip = _Unwind_GetIPInfo (context, &ip_before_insn);
#else
  ip = _Unwind_GetIP (context);
#endif
  if (! ip_before_insn)
    --ip;

> Our goal has to be for runtime.Callers to return the same sort of values in
> gccgo as it does in gc.  That means that your change to
> go/runtime/pprof/pprof.go looks wrong to me.  It suggests that the ppc64
> values are different from those of other architectures.  What's the
> justification for that?

Sorry I was only experimenting with a solution for Power, just to understand what could be changed to make it work so it was not meant to be a solution for other platforms.  I could see that the pc was being decremented wrong for Power so once I fixed that it didn't make sense to decrement it again in pprof.go.  That was my thinking.

After rereading all you've wrote I think we are coming to same result (somewhat).  If I read the documentation for runtime.Callers, it says that the pc values are the return program counters, but in gccgo it is program counter for the call instruction, so runtime.Callers does need to be changed.  But in one place you are just incrementing the pc, so won't that be wrong for the cases where libbacktrace didn't decrement it?
Comment 27 Ian Lance Taylor 2015-02-24 22:25:09 UTC
The runtime.Callers function (and friends) are somewhat broken for gccgo no matter what we do.  The problem is that we can't represent full file/line information using a single PC value, because a single PC value can't represent inlined functions.

Fortunately, the case where libbacktrace does not decrement the PC is not very interesting for Go, because we aren't going to get any useful file/line information for that case anyhow.  It's the case of unwinding through a signal handler: the PC value in the signal handler does not follow a call instruction, and therefore should not be decremented.  But there is no useful file/line information in a signal handler anyhow.

I'm still not fully certain as to what the best fix is.
Comment 28 boger 2015-02-24 22:41:18 UTC
I'm not concerned about the inline case.  The user could build without inlining if it was important. 

To me it seems like you don't want libbacktrace to decrement the pc when it is being called by runtime.Callers.  Then runtime.Callers should be correct and no changes needed for pprof.go.  But I suppose that messes up other users of libbacktrace unless you somehow had a way to indicate not to decrement in this case or to create a different unwind callback function.
Comment 29 Dominik Vogt 2015-02-26 08:02:46 UTC
(In reply to Ian Lance Taylor from comment #27)
> The problem is that we can't represent full file/line
> information using a single PC value, because a single PC value can't
> represent inlined functions.

Why is it important to be able to map a file:line to a single PC?

> But there is no useful file/line information in a signal handler anyhow.

I would imagine file:line information *is* useful for user written signal handlers and in case of a panic or crash.
Comment 30 boger 2015-02-26 15:15:28 UTC
I don't think it is a good idea to add code in multiple places to fix up the pc values after calling runtime.Callers.  That seems prone to error and confusing for future updates to the code.  

From what I can tell, the problem originates with the use of backtrace_full from the libgo code to find the pc stack values for some platforms.  I think if the pc was never decremented in the backtrace code then I think the values are right.  However I understand you can't just change that behavior due to other existing users of this code that might expect it to continue working as before.  Here are some options:
- Add a new function to libbacktrace for this situation where the pc is never decremented.
- Add a flag to one of the backtrace structures to indicate you don't want the pc decremented in libbacktrace and then use that flag in libbacktrace to control whether the pc is decremented.
- Add a wrapper function to the libgo code to call backtrace_full and then adjust the pc value based on the GOARCH value, understanding what backtrace_full might have done and what the GO code expects.  Then there should be no direct calls to backtrace_full in libgo, but only calls to this wrapper function.

Once the stack pc values are generated as expected then runtime.Callers has the correct values according to what its callers would expect, how it is documented, and matches gc behavior.

I think the pc mapping for inlined functions is a separate issue.  Inlining happens in gccgo and not gc and happens on all gcc compilers so the mapping of pc values to line numbers for inlined code is not an issue specific to gccgo and that is not the issue in this testcase.  Maybe it just needs to be documented so users understand that can happen or maybe inlining should be disabled by default for gccgo and then if users enable it they understand what could happen.
Comment 31 Ian Lance Taylor 2015-02-26 16:03:12 UTC
> Why is it important to be able to map a file:line to a single PC?

The problem is that the Go code in the runtime/pprof package assumes that it can take a single PC value and map that to a meaningful file/line.  Which it can, but with gccgo it loses inlining information.


>> But there is no useful file/line information in a signal handler anyhow.
>
> I would imagine file:line information *is* useful for user written signal handlers and in case of a panic or crash.

Sorry, I misspoke.  I meant signal trampoline--the code in the C library that is invoked by the kernel--not signal handler.
Comment 32 Ian Lance Taylor 2015-02-26 16:11:39 UTC
> I don't think it is a good idea to add code in multiple places to fix up the pc values after calling runtime.Callers.  That seems prone to error and confusing for future updates to the code.

Right, which is why I never suggested that.  I suggested changing runtime.Callers itself to adjust the values that it gets from libbacktrace.


> - Add a wrapper function to the libgo code to call backtrace_full and then adjust the pc value based on the GOARCH value, understanding what backtrace_full might have done and what the GO code expects.  Then there should be no direct calls to backtrace_full in libgo, but only calls to this wrapper function.

Yes, that is exactly what I have been trying to say, but we don't need to introduce a new function.  We already only call backtrace_full from a single place in libgo: runtime_callers (which, to be clear, is not the same as runtime.Callers).


> I think the pc mapping for inlined functions is a separate issue.  Inlining happens in gccgo and not gc and happens on all gcc compilers so the mapping of pc values to line numbers for inlined code is not an issue specific to gccgo and that is not the issue in this testcase.  Maybe it just needs to be documented so users understand that can happen or maybe inlining should be disabled by default for gccgo and then if users enable it they understand what could happen.

To be clear, libbacktrace can handle inlined functions just fine, and libgo does the right thing for things like the stack traces dumped when a program crashes.  It also does the right thing when handling the skip parameter to runtime.Caller and runtime.Callers.  The problem arises when converting the libbacktrace values into the single PC value expected by Go functions like runtime.Callers.

I am not going to disable inlining by default for gccgo.  That would be a performance killer.
Comment 33 boger 2015-02-26 16:34:22 UTC
(In reply to Ian Lance Taylor from comment #32)
> > I don't think it is a good idea to add code in multiple places to fix up the pc values after calling runtime.Callers.  That seems prone to error and confusing for future updates to the code.
> 
> Right, which is why I never suggested that.  I suggested changing
> runtime.Callers itself to adjust the values that it gets from libbacktrace.
> 
> 
> > - Add a wrapper function to the libgo code to call backtrace_full and then adjust the pc value based on the GOARCH value, understanding what backtrace_full might have done and what the GO code expects.  Then there should be no direct calls to backtrace_full in libgo, but only calls to this wrapper function.
> 
> Yes, that is exactly what I have been trying to say, but we don't need to
> introduce a new function.  We already only call backtrace_full from a single
> place in libgo: runtime_callers (which, to be clear, is not the same as
> runtime.Callers).
> 

In comments 21 & 23 it sounded there were plans to make changes in other locations too.  I agree with what you just said here, that you could just make a change in runtime_callers after it calls backtrace_full to adjust the pc and then no other changes should be needed anywhere else.

Yes I realize runtime_callers and runtime.Callers are different and I was being sloppy when I mentioned them.

> 
> > I think the pc mapping for inlined functions is a separate issue.  Inlining happens in gccgo and not gc and happens on all gcc compilers so the mapping of pc values to line numbers for inlined code is not an issue specific to gccgo and that is not the issue in this testcase.  Maybe it just needs to be documented so users understand that can happen or maybe inlining should be disabled by default for gccgo and then if users enable it they understand what could happen.
> 
> To be clear, libbacktrace can handle inlined functions just fine, and libgo
> does the right thing for things like the stack traces dumped when a program
> crashes.  It also does the right thing when handling the skip parameter to
> runtime.Caller and runtime.Callers.  The problem arises when converting the
> libbacktrace values into the single PC value expected by Go functions like
> runtime.Callers.
> 
> I am not going to disable inlining by default for gccgo.  That would be a
> performance killer.
Comment 34 Ian Lance Taylor 2015-02-26 17:33:44 UTC
You're right, we could just change runtime_callers.  Or, simpler still, just change callback in libgo/runtime/go-callers.c.  Good point.
Comment 35 Dominik Vogt 2015-02-27 09:18:50 UTC
I'd like to bring back to attention the fact that the code that deducts six from the pc (s390x) in pprof.go is broken regardless of what patches are made to the runtime code.  Determining the size of the call instruction is by no means trivial.  Other platforms certainly have similar problems with the "pc -= 4".  It would be good to solve this at the same time as the "pc--" problem.
Comment 36 boger 2015-03-06 16:57:26 UTC
I'd like to take a step back and clarify what the functions in question, runtime_callers, runtime.Caller, and runtime.Callers should be returning:  the pc values for the call instruction, or the pc values from the instruction following the call?

The golang documentation for runtime.Callers says it is the pc for the instruction after the call so I will assume that is what we want for that one.

For runtime.Caller I believe it is supposed to return the call instruction itself but I'm not 100% sure on that.

Since runtime.Callers calls runtime_callers, I assume runtime_callers should be providing the instruction after the call.  That means all callers of runtime_callers should be decrementing pc values to the previous instruction if the call instruction is what they want.

Does this sound right?
Comment 37 Ian Lance Taylor 2015-03-06 17:24:49 UTC
The PC returned by runtime.Caller is the same as that returned by runtime.Callers: the instruction following the call.

Otherwise, yes, I agree.
Comment 38 boger 2015-03-06 20:08:27 UTC
I've spent some time thinking about a fix for this and made a few other observations...

I've noticed that when runtime_callers calls backtrace_full, it locks the thread using runtime_xadd and then looks up the line number and function name information and returns it to runtime_callers.  However, in some cases, the caller of runtime callers doesn't use the line and file information, as happens with runtime.Callers and others.   And actually if the caller of runtime_callers (or runtime.Caller) wants to back up the pc, then line number information that was determined by backtrace_full might not be correct so has to be looked up again.

That has me wondering if runtime_callers should be changed so that it calls backtrace_simple instead of backtrace_full and returns just the pc values.  I don't think backtrace_simple would require the thread to be locked since it doesn't care about the backtrace_state; then the code in libbacktrace wouldn't have to read in and save all the debug information as it does now.  And then it would be up to the callers of runtime_callers to determine if the pc should be backed up and determine the line information and file information after it has the correct pc using FuncForPC.

And then to complete the fix for this, we could add a simple function BackupPC which would be similar to what is in pprof.go to move the pc back to the previous instruction based on the GOARCH value.  Unfortunately I am not familiar with all that needs to be done for the s390 and s390x case but whatever has to be done for that would be done in this function.  And then any time the pc needs to be backed up it would be done by calling this function.

Summarizing the changes:
- runtime_callers would call backtrace_simple instead of backtrace_full, returning an array of just the pc values for the stack
- backtrace_simple would not decrement the pc for Power and s390.  I'm assuming it is OK to do the decrement for other platforms because they don't have an issue with the way it is now?  This could be controlled by an #if defined
- a new function BackupPC would be added to take a pc value as input and return the pc for the previous instruction, as described above
- callers of runtime_callers, runtime.Caller or runtime.Callers would call BackupPC if that was needed, and then use FuncForPC to find the file and line number information only if needed.

I have started to work on a patch for this but wanted to put this idea out there.
Comment 39 Ian Lance Taylor 2015-03-06 21:08:53 UTC
Unfortunately backtrace_simple does not handle inlined calls correctly, so it won't work for a couple of very important cases: doing a stack backtrace on panic, and skipping the right number of frames when called from places like the log package.  I don't think that is a workable approach.
Comment 40 boger 2015-03-09 20:56:13 UTC
Created attachment 34995 [details]
Proposed fix

Here is my proposed fix to correct the problem on Power, and mostly fix it for s390/s390x.  Since I don't know the details on the s390/s390x call instructions that have different sizes, I didn't add the code to fix the problem where the call instruction might not be 6 bytes.

Description:

- I made a change to avoid doing the decrement in libbacktrace for Power and s390/s390x.  I think it is better to do it this way than having libbacktrace decrement it and then on return or through the callback increment it.  If it is fixed up in the callback or after the return then the line number information has already been determined in libbacktrace and it could be incorrect.  My assumption is that no other platform has the issue with the decrement of the PC since there are no bugzillas for them so the change in libbacktrace is only needed for Power and s390/s390x.
- I added a new function BackupPC to back up the PC by one instruction by subtracting the correct instruction size for the platform.  As mentioned above, some extra work is needed here to handle different call instructions with different sizes.
- All callers of runtime_callers, runtime.Caller, and runtime.Callers have to recognize that the PC values being returned represent the instruction following the call instruction, so if the call instruction is what is needed, the code will do the back up of the PC and determine the line number based on the new PC.  I had to make a change to log.go to do this otherwise log_test.go failed.  I changed the code in pprof.go to call the new function.

With my patch, the line number information is now correct but the mprof testcase fails due to other differences in the mprof output.  My patch applied to commit 220481 works and it fails on commit 221230, but have not yet narrowed it down to the exact commit where it starts to fail.  I am investigating further, but here is where the output differs:

testing.go:370: The entry did not match:
                (0|1): (0|2097152) \[1: 2097152\] @ 0x[0-9,a-f x]+
                #       0x[0-9,a-f]+    pprof_test\.allocateTransient2M\+0x[0-9,a-f]+   .*/mprof_test.go:30
                #       0x[0-9,a-f]+    runtime_pprof_test\.TestMemoryProfiler\+0x[0-9,a-f]+    .*/mprof_test.go:65


0: 0 [1: 32] @ 0x1002815c 0x1002815c 0x10011a48 0x1000b510 0x1000b7b0 0x100c5a00 0x1001fed4
                #       0x1000b510      pprof_test.allocateTransient2M+0x50             /home/boger/gccgo.work/go1.4/150121/bld/powerpc64le-linux/libgo/gotest8260/test/mprof_test.go:30
                #       0x1000b7b0      runtime_pprof_test.TestMemoryProfiler+0x170     /home/boger/gccgo.work/go1.4/150121/bld/powerpc64le-linux/libgo/gotest8260/test/mprof_test.go:65

On the first line, the actual output contains [1: 32] but the expected output is [1: 2097152].

I built the latest gccgo on an x86 with no extra patches and this same failure occurs there.
Comment 41 Ian Lance Taylor 2015-03-09 22:02:16 UTC
I really don't want libbacktrace to be processor-dependent.  That makes all uses of it more complex for no significant gain.  Maybe we should change libbacktrace, but definitely not by making it processor-dependent.

I would prefer that changes to files like runtime/pprof/pprof.go and log/log.go go through the master repo first.  If gccgo's runtime.Callers acts the same as gc's runtime.Callers, then there shouldn't be any gccgo-specific need to change those files.

Can somebody please try just editing the function callback in runtime/go-callers.c to add one to pc when storing it in loc->pc?  I'd like to understand why that simple change doesn't work before trying something more complex.
Comment 42 boger 2015-03-10 12:28:15 UTC


(In reply to Ian Lance Taylor from comment #41)
> I really don't want libbacktrace to be processor-dependent.  That makes all
> uses of it more complex for no significant gain.  Maybe we should change
> libbacktrace, but definitely not by making it processor-dependent.

I'm sorry but I have to disagree.  In my opinion what libbacktrace does now for Power and s390/s390x is wrong.  It does not return valid PC values for those platforms.  Everyone who calls it has to know that they must look through the list of PC values and change them back.  In addition, libbacktrace has already looked up the line number information with those invalid PCs.  So essentially libbacktrace is returning invalid PCs and usually incorrect line numbers which the caller has to know about and fix up and that seems significant to me.

Maybe the PCs should never be decremented for any platform?

> 
> I would prefer that changes to files like runtime/pprof/pprof.go and
> log/log.go go through the master repo first.  If gccgo's runtime.Callers
> acts the same as gc's runtime.Callers, then there shouldn't be any
> gccgo-specific need to change those files.

Yes I agree that is the way it should be done.  I mainly wanted to show all the changes in one place for review and didn't think through the process for getting that code in place.

> 
> Can somebody please try just editing the function callback in
> runtime/go-callers.c to add one to pc when storing it in loc->pc?  I'd like
> to understand why that simple change doesn't work before trying something
> more complex.

Yes I've already tried that.  But I ran into the issues I described in my first comment above, especially seeing that all the stack information from runtime_callers had incorrect line information and it is not obvious to the caller that they should fix that information before using it.  In addition, there is evidence that looking up the line number information is not cheap, so doing it in libbacktrace and then having to look it up again if it is needed seems like a waste especially in those cases when the line number information is not even used.
Comment 43 Ian Lance Taylor 2015-03-10 14:21:29 UTC
I'm getting confused.  I think I need to talk about one thing at a time.

You say that libbacktrace is returning incorrect line numbers.  That obviously needs to be fixed.  When does that happen?
Comment 44 boger 2015-03-10 14:44:18 UTC
If we do the increment of the pc to fix it in the callback, here is how that happens:
- backtrace_full gets the pc and decrements by 1
- backtrace_full calls backtrace_pcinfo to look up the file, function, line number information based on the decremented pc
- backtrace_pcinfo calls the callback function after the line number has been determined.  If the pc is incremented here then it might not correspond to the line number anymore.  A common example is if you have a statement that does a function call with no return value, the line number will be wrong because the call instruction and the instruction following the call correspond to different line numbers.
Comment 45 Ian Lance Taylor 2015-03-10 16:17:12 UTC
If we change the PC returned by backtrace_full, and then use that changed PC to look up file/line information, we might get different results.  That seems clear.  My next question is: when does this matter?

There are two ways that we use the result of runtime_callers: either we use the file/line information, or we use the PC.  If we use the file/line information, all is well, and there is nothing to discuss.  If we use the PC, as in runtime.Callers, then it's true that if we pass that PC back to libbacktrace we may get different file/line information.  But, so what?  We've already lost the original file/line information anyhow.

The idea is that the changed PC will be the same as the PC returned by the gc library.  Therefore, we should get approximately the same file/line information as the gc library gets.  That is why runtime/pprof/pprof.go in the gc library backs up the PC to the call instruction: because it knows that it has the PC after the call instruction.
Comment 46 boger 2015-03-10 17:49:17 UTC
(In reply to Ian Lance Taylor from comment #45)
> If we change the PC returned by backtrace_full, and then use that changed PC
> to look up file/line information, we might get different results.  That
> seems clear.  My next question is: when does this matter?
> 
> There are two ways that we use the result of runtime_callers: either we use
> the file/line information, or we use the PC.  If we use the file/line
> information, all is well, and there is nothing to discuss.  If we use the
> PC, as in runtime.Callers, then it's true that if we pass that PC back to
> libbacktrace we may get different file/line information.  But, so what? 
> We've already lost the original file/line information anyhow.
> 
> The idea is that the changed PC will be the same as the PC returned by the
> gc library.  Therefore, we should get approximately the same file/line
> information as the gc library gets.  That is why runtime/pprof/pprof.go in
> the gc library backs up the PC to the call instruction: because it knows
> that it has the PC after the call instruction.

Let me just respond to this last paragraph for now.  If the pc values in gc are always the instruction following the call (or the excepting instruction), and runtime_callers expects it to be that way, and we want gc and gccgo to return the same results, then I don't understand why backtrace_full and backtrace_simple should ever decrement the pc for any platform.  There is a brief mention of this in comment 9 -- that we can make use the fact that the pc had been decremented by inspecting it and making use of the fact that is an abnormal pc value; however I don't see anywhere in the libgo code where it does this, and besides, for some platforms after the pc is decremented it is still a normal looking pc so there is no way to know if it had been decremented.  Getting the right pc and the right instruction seems more important than preserving some information that is currently not being used and can't be used on all platforms.

That means, if the decrement was removed completely from backtrace_full and backtrace_simple then the pc values would be correct for each platform and the line numbers would be determined based on the correct pc.  And then we would still need the BackupPC function for those cases where the call instruction was needed as is done in pprof.go and some extra stuff for s390 and s390x for their various call instructions to get the right number of bytes to back up.
Comment 47 Ian Lance Taylor 2015-03-10 18:28:18 UTC
We have to separate backtrace_full and backtrace_simple, which are part of the libbacktrace library, from functions like runtime_callers which are part of libgo.  The libbacktrace library is used by several different projects.  It is not part of libgo.

It may be correct that libbacktrace should never decrement pc.  But the argument for that can't be the behaviour of the gc library.

The bit in comment #9 about not decrementing the pc for a signal frame is specific to libbacktrace.  You can see that in unwind in libbacktrace/backtrace.c and in simple_unwind in libbacktrace/simple.c.

Perhaps we need to change libbacktrace so that backtrace_full_callback and backtrace_simple_callback take an additional parameter, which is whether the PC is in a signal frame.  From libgo's perspective, though, that would be exactly the same as simply always adding to pc in callback in go-callers.c.  You said that did not work, and I'm still trying to understand why.
Comment 48 boger 2015-03-10 20:18:28 UTC
(In reply to Ian Lance Taylor from comment #47)
> We have to separate backtrace_full and backtrace_simple, which are part of
> the libbacktrace library, from functions like runtime_callers which are part
> of libgo.  The libbacktrace library is used by several different projects. 
> It is not part of libgo.
> 
> It may be correct that libbacktrace should never decrement pc.  But the
> argument for that can't be the behaviour of the gc library.
> 

The backtraces I'm familiar with are those where the pc values represent the instruction after the call.  Regardless of what the gc library does I don't see why you'd want to decrement the pcs in backtrace_full.  Actually to me it seems like decrementing it would be wrong for all platforms (based on what is done in pprof.go -- doesn't that mean in some cases it would get decremented twice?) but I just haven't had time to investigate that further.

> The bit in comment #9 about not decrementing the pc for a signal frame is
> specific to libbacktrace.  You can see that in unwind in
> libbacktrace/backtrace.c and in simple_unwind in libbacktrace/simple.c.
> 
> Perhaps we need to change libbacktrace so that backtrace_full_callback and
> backtrace_simple_callback take an additional parameter, which is whether the
> PC is in a signal frame.  From libgo's perspective, though, that would be
> exactly the same as simply always adding to pc in callback in go-callers.c. 
> You said that did not work, and I'm still trying to understand why.

I'm not exactly sure what you are describing -- which functions would have an additional argument and what would you do with that argument.  

Here is the problem with incrementing the pc in the callback.  In unwind the pc is decremented, and the line number is looked up for that pc and saved away.  Then callback is called, and the pc is incremented, but now the line number that was saved away is stale.  That means when on return to runtime_callers, the pcs are all correct but some line numbers might not be right for those pcs.  I tried adding code to look up the line number again in the callback after the pc was modified but that caused the program to hang.
Comment 49 Ian Lance Taylor 2015-03-10 20:45:03 UTC
libbacktrace returns the line number that you actually care about: the line number of the call instruction.  There is no question that that is correct.

You say that it is a problem if the PC does not match the line number, but to me that sounds like a conceptual problem.  What is the actual problem?

The pprof.go code expects to get the PC after the call instruction, and tries to turn it into the PC of the call instruction.  I think we can all agree that pprof.go should in fact see the PC after the call instruction.  The simple way to do that is for callback in libgo/runtime/go-callers.c to increment pc before storing it in loc->pc.  Apparently there is some problem with that but I do not know what that problem is.

The additional argument I mentioned would be for the function types backtrace_fulL_callback and backtrace_simple_callback defined in libbacktrace/backtrace.h.  The additional argument would be, essentially, the value of ip_before_insn in libbacktrace/backtrace.c and libbacktrace/simple.c.  The argument would tell you whether the PC follows a call instruction, or whether it is the correct PC for a signal frame instruction.
Comment 50 boger 2015-03-10 21:52:26 UTC
(In reply to Ian Lance Taylor from comment #49)
> libbacktrace returns the line number that you actually care about: the line
> number of the call instruction.  There is no question that that is correct.
> 
> You say that it is a problem if the PC does not match the line number, but
> to me that sounds like a conceptual problem.  What is the actual problem?
> 
> The pprof.go code expects to get the PC after the call instruction, and
> tries to turn it into the PC of the call instruction.  I think we can all
> agree that pprof.go should in fact see the PC after the call instruction. 
> The simple way to do that is for callback in libgo/runtime/go-callers.c to
> increment pc before storing it in loc->pc.  Apparently there is some problem
> with that but I do not know what that problem is.
> 
> The additional argument I mentioned would be for the function types
> backtrace_fulL_callback and backtrace_simple_callback defined in
> libbacktrace/backtrace.h.  The additional argument would be, essentially,
> the value of ip_before_insn in libbacktrace/backtrace.c and
> libbacktrace/simple.c.  The argument would tell you whether the PC follows a
> call instruction, or whether it is the correct PC for a signal frame
> instruction

OK.

I didn't realize that backtrace_full is expected to return an array of Location structures which contain the PC for the instruction following the call but the line number for the call.  When I saw there was a single structure containing a PC and lineno, I assumed that it was intended to hold information for a single location, i.e., a PC and the lineno that corresponded to it.

I will go back and try this again as you suggested.  I had tried several variations and hit different errors and when I saw how the line number and PC were out of sync that steered me down the wrong path.  Now that I understand what it really should be I think it should probably work.
Comment 51 boger 2015-03-12 14:13:39 UTC
Here is the change I made to go-callers.c.  This patch along with my previous changes to extern.go and pprof.go allows the test identified in this issue to report the correct line number on ppc64le.  I wanted to post it here for comments before I do the bootstrap build and test on ppc64 and ppc64le in case there is a concern with using an #if defined.  This is in libgo/runtime and I didn't see any other examples where it used the GOARCH value in C code so that is why I did it this way.

===================================================================
--- libgo/runtime/go-callers.c  (revision 221039)
+++ libgo/runtime/go-callers.c  (working copy)
@@ -83,6 +83,10 @@ callback (void *data, uintptr_t pc, const char *fi
     }

   loc = &arg->locbuf[arg->index];
+#if defined(__powerpc64__) || defined(__s390__) || defined(__s390x__)
+  if ((pc & 1) == 1)
+    pc++;
+#endif
   loc->pc = pc;

   /* The libbacktrace library says that these strings might disappear,
Comment 52 Ian Lance Taylor 2015-03-12 15:30:47 UTC
Why not just
    pc++;
on all targets?  Why the #ifdef?  It seems to me that we should aim for consistent results on all platforms.
Comment 53 boger 2015-03-12 18:32:53 UTC
I was taking the approach of only fixing what was known to be broken, and I was not aware that this was broken on other platforms.  Minimizing risk.  I can change it for all platforms but these tests are not broken on x86 today with the pc decremented in both libbacktrace and in pprof.go and I don't understand why not so I didn't want to mess with it.

I'm not sure if I understand how it would work to add an argument to callback.  The call to callback happens much later after the ip_before_insn flag has been set:  from inside unwind -> backtrace_pcinfo -> dwarf_fileline -> dwarf_lookup_pc -> callback.  Do you mean pass this flag as an argument to each of these functions so it can be passed to callback?
Comment 54 Ian Lance Taylor 2015-03-12 19:07:37 UTC
I assume that it works on x86 because subtracting 1 from PC in libbacktrace, and then subtracting 1 again in runtime/pprof/pprof.go, still gives you a PC within the call instruction.  On PPC, subtracting 1 in libbacktrace and then subtracting 4 in pprof.go gives you a PC in the instruction just before the call instruction.

I appreciate that you only want to fix what is broken, but in this case I think we should turn that around and keep the code simple unless we find that it breaks something.

Yes, you're right that if we were to change libbacktrace it would mean passing the flag through a chain of functions.  Basically the flag would have to move along with the PC value.  I'm not sure this is worth doing, ip_before_insn is always false except when tracing back through a signal trampoline, where the file/line information is generally not helpful anyhow.
Comment 55 boger 2015-04-17 12:47:11 UTC
Created attachment 35344 [details]
Increment the pc in the callback routine for backtrace_full

Always increment the pc in the callback, knowing that it was decremented during the unwind function to find the line number number of the call instruction.  This leaves the pc with the correct value for the call stack.  If a signal had occurred, and it wasn't a call instruction, the pc will get incremented when it hadn't been decremented during unwind but that is OK for this fix.
Comment 56 boger 2015-04-17 13:04:04 UTC
Here is a bit more detail.  Now that I think I understand all the considerations, I'm proposing this simple fix for gcc 5.  Maybe longer term a more thorough solution could be done but not sure it is worth changing multiple functions to add an additional argument and changing external header files in gcc 5.

The problems to be fixed are:
- the pc in the call stack returned by runtime_callers is wrong for some platforms, such as Power and Z and possibly others
- the pc adjustment and line number that is looked up in go/runtime/pprof.go sometimes provides the wrong line number if based on the bad pc

The proposed fix is to increment the pc in the callback routine that is passed to backtrace_full when trying to find the runtime_callers all the time.  During the unwind the pc was decremented for the normal call case.  If it was a signal, the pc would not have been decremented so it is being incremented unnecessarily for that situation but I think that is OK since the line number will be right and the pc will still refer to the signalling instruction.

As far as the change to the pprof code, at least in gccgo for Power and Z the pc and the line number should be correct so no adjustment is even needed, but my understanding from previous posts is that in the gc compiled case an adjustment is needed and this code is common for gc and gccgo?  With the pprof code that is there now, the pprof code is OK for Power, i.e., the line number in the pprof testcase matches the expected line number.  For Z, I think that the amount to decrement should be the minimum size for a call instruction (not sure what that value is) and then the tracepc value would refer to the previous instruction.

Note that the pprof testcase does not yet work even with this fix because there is a new problem that fails on all platforms with gccgo described here:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64683.
Comment 57 ian@gcc.gnu.org 2015-04-17 19:29:59 UTC
Author: ian
Date: Fri Apr 17 19:29:28 2015
New Revision: 222196

URL: https://gcc.gnu.org/viewcvs?rev=222196&root=gcc&view=rev
Log:
	PR go/64999
	PR go/65180
runtime: Adjust libbacktrace PC value to what runtime.Callers expects.

From Lynn Boger.

Modified:
    trunk/libgo/runtime/go-callers.c
Comment 58 ian@gcc.gnu.org 2015-04-17 19:30:15 UTC
Author: ian
Date: Fri Apr 17 19:29:43 2015
New Revision: 222197

URL: https://gcc.gnu.org/viewcvs?rev=222197&root=gcc&view=rev
Log:
	PR go/64999
	PR go/65180
runtime: Adjust libbacktrace PC value to what runtime.Callers expects.

From Lynn Boger.

Modified:
    branches/gcc-5-branch/libgo/runtime/go-callers.c
Comment 59 Ian Lance Taylor 2015-04-17 21:31:54 UTC
As far as I know all issues here are now fixed.  Please reopen if not.  Thanks.
Comment 60 Dominik Vogt 2015-04-21 08:28:24 UTC
Works on s390x.