Bug 18010 - bad unwind info due to multiple returns (missing epilogue)
Summary: bad unwind info due to multiple returns (missing epilogue)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: unknown
Importance: P2 normal
Target Milestone: 4.0.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2004-10-15 11:14 UTC by davidm
Modified: 2004-11-04 18:06 UTC

See Also:
Host: ia64-hp-linux
Target: ia64-hp-linux
Build: ia64-hp-linux
Known to work:
Known to fail:
Last reconfirmed: 2004-10-21 18:07:34


Attachments
test-ptrace-misc.c (1.34 KB, text/plain)
2004-10-15 11:15 UTC, davidm
patch.copy.frame_related (525 bytes, text/plain)
2004-10-19 01:07 UTC, Jim Wilson

Description davidm 2004-10-15 11:14:11 UTC
It appears there is a long-standing and quite nasty bug in GCC.  I believe all
versions of GCC which allow for reordering of basic-blocks are affected.  In the
example I attached, func has two epilogues, but the unwind info looks like this:

<func>: [0x0-0x1d0], info at +0x0
  v1, flags=0x0 (), len=32 bytes
    R2:prologue_gr(mask=[rp,ar.pfs],grsave=r37,rlen=11)
        P7:pfs_when(t=0)
        P7:mem_stack_f(t=1,size=4096)
        P7:pr_when(t=2)
        P3:pr_gr(reg=r40)
        P7:lc_when(t=7)
        P3:lc_gr(reg=r41)
        P7:rp_when(t=10)
    R3:body(rlen=46)
        B1:label_state(label=1)
        B2:epilogue(t=1,ecount=0)
    R1:body(rlen=30)
        B1:copy_state(label=1)

Note that there is only one "epilogue" directive.  The stack-popping instruction
in the second "body" region is not marked at all, so the unwind info is
incorrect once the stack has been popped in the second "body" region.

It's easiest to reproduce this bug with a libunwind-enabled gdb:

 $ gcc -v 2>&1 | grep 'version'
 gcc version 3.4.1
 $ gcc -O2 -Wall test-ptrace-misc.c ident.c -o test-ptrace-misc
 $ gdb ./test-ptrace-misc
 (gdb) b *0x4000000000000aa2     # this should be the last instruction in func
 (gdb) r
 Starting program: /home/davidm/src/unwind/build-opt/tests/test-ptrace-misc 

 Program received signal SIGUSR1, User defined signal 1.
 0x20000000000ef042 in kill () from /lib/tls/libc.so.6.1
 (gdb) c
 Continuing.

 Breakpoint 1, 0x4000000000000aa2 in func ()
 (gdb) bt
 #0  0x4000000000000aa2 in func ()
 #1  0x4000000000000970 in func ()
 #2  0x4000000000002070 in bar ()
 #3  0x0000000000000000 in ?? ()
 #4  0x0000000000000000 in ?? ()
       ...etc...

(You can verify that gdb is picking up libunwind by starting it with environment
variable LD_DEBUG set to "files"; you should see a line along the lines of:

     16559:     file=libunwind-ia64.so;  generating link map

while gdb is starting up.)

To fix this bug, GCC should be emitting a ".restore sp" directive in front of
every instruction which pops the stack-pointer.
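
The mismatch can be illustrated with a small model.  This is only a sketch,
not how libunwind or GCC represent unwind info: the Body class and the helper
names are hypothetical, the slot at which each region really pops sp is assumed
(the listing does not show it), and treating the epilogue directive as "popped
after slot N" is a simplification.  Only the region lengths (46 and 30) and the
frame size (4096) come from the listing above.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_SIZE = 4096  # from mem_stack_f(size=4096) in the listing


@dataclass
class Body:
    """Toy body region (hypothetical, not a libunwind structure)."""
    rlen: int                   # region length in instruction slots
    marked_pop: Optional[int]   # slot the unwind info marks as popping sp; None = unmarked
    actual_pop: int             # slot where the code really pops sp (assumed value)


def believed_frame(region: Body, slot: int) -> int:
    """Frame size the unwinder assumes when stopped at `slot`."""
    if region.marked_pop is not None and slot > region.marked_pop:
        return 0
    return FRAME_SIZE


def actual_frame(region: Body, slot: int) -> int:
    """Frame size that is really allocated when stopped at `slot`."""
    return 0 if slot > region.actual_pop else FRAME_SIZE


def mismatched_slots(region: Body) -> List[int]:
    """Slots at which the unwinder's view disagrees with reality."""
    return [s for s in range(region.rlen)
            if believed_frame(region, s) != actual_frame(region, s)]


# First body: rlen=46, has an epilogue directive (pop slot 44 is assumed).
first_body = Body(rlen=46, marked_pop=44, actual_pop=44)
# Second body: rlen=30, copy_state only, no epilogue directive (pop slot 28 is assumed).
second_body_buggy = Body(rlen=30, marked_pop=None, actual_pop=28)
# Same region with the pop marked, as a ".restore sp" directive would do.
second_body_fixed = Body(rlen=30, marked_pop=28, actual_pop=28)

print(mismatched_slots(first_body))         # -> []
print(mismatched_slots(second_body_buggy))  # -> [29]
print(mismatched_slots(second_body_fixed))  # -> []
```

In this model the unmarked second body disagrees with reality at every slot
after the pop, and marking the pop removes the disagreement.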
Comment 1 davidm 2004-10-15 11:15:46 UTC
Created attachment 7356
test-ptrace-misc.c

Test case.
Comment 2 Jim Wilson 2004-10-19 01:07:27 UTC
Subject: Re:  New: bad unwind info due to multiple
	returns (missing epilogue)

On Fri, 2004-10-15 at 04:14, davidm at hpl dot hp dot com wrote:
> To fix this bug, GCC should be emitting a ".restore sp" directive in front of
> every instruction which pops the stack-pointer.

I'm still sick, four weeks and counting, but this looks like a pretty
easy one.  We just need to copy RTX_FRAME_RELATED_P when we copy
instructions.  The following patch gives the right result for the
testcase.  I have as yet done no other testing of the patch.
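
The idea can be sketched with a toy model.  This is not GCC code and not the
attached patch: Insn, copy_dropping_flag, copy_keeping_flag and count_marked
are hypothetical names, and the frame_related field merely stands in for
RTX_FRAME_RELATED_P.  It only shows why a copied epilogue instruction loses
its unwind annotation when the flag is not carried over.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Insn:
    """Toy stand-in for an instruction; not GCC's rtx."""
    pattern: str
    frame_related: bool = False  # models RTX_FRAME_RELATED_P


def copy_dropping_flag(insn: Insn) -> Insn:
    """Copy the pattern but forget the flag (models the reported bug)."""
    return Insn(pattern=insn.pattern)


def copy_keeping_flag(insn: Insn) -> Insn:
    """Copy the pattern and carry the flag over (models the proposed fix)."""
    return Insn(pattern=insn.pattern, frame_related=insn.frame_related)


def count_marked(insns: List[Insn]) -> int:
    """Number of instructions that would receive an unwind directive."""
    return sum(1 for insn in insns if insn.frame_related)


# Hypothetical stack-pop instruction from the original epilogue.
pop = Insn("sp = sp + 4096", frame_related=True)

# A second return path gets its own copy of the epilogue instruction.
buggy_stream = [pop, copy_dropping_flag(pop)]
fixed_stream = [pop, copy_keeping_flag(pop)]

print(count_marked(buggy_stream))  # -> 1
print(count_marked(fixed_stream))  # -> 2
```

With the flag dropped, only one of the two stack pops is marked, which matches
the single "epilogue" directive in the unwind listing from the description.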
Comment 3 Jim Wilson 2004-10-19 01:07:28 UTC
Created attachment 7373
patch.copy.frame_related
Comment 4 davidm 2004-10-19 18:08:46 UTC
(In reply to comment #2)
> Subject: Re:  New: bad unwind info due to multiple
> 	returns (missing epilogue)
> On Fri, 2004-10-15 at 04:14, davidm at hpl dot hp dot com wrote:
> > To fix this bug, GCC should be emitting a ".restore sp" directive in front of
> > every instruction which pops the stack-pointer.
> I'm still sick, four weeks and counting, but this looks like a pretty
> easy one.  We just need to copy RTX_FRAME_RELATED_P when we copy
> instructions.  The following patch gives the right result for the
> testcase.  I have as yet done no other testing of the patch.

Thanks for coming up with a patch so quickly!  I'm currently traveling, but
I'll try running it against the C, C++, and Java test-suites once I'm back on
Thursday.  It looks like we are getting very close to perfect unwind info.

Hope you get well soon!

  --david
Comment 5 davidm 2004-10-21 18:03:37 UTC
OK, I tried this patch on the CVS gcc-3_4_branch (the 4.0 branch didn't work at
all for me, even in its pristine version).  As you said, the patch does fix the
bug I reported.  In addition, the test-suites report the following:

gcc Summary: 4 additional tests now pass (24083, up from 24079)
g++ Summary: 2 new unexpected failures (up from 0).
g77 Summary: no changes
objc Summary: no changes
libstdc++ Summary: no changes
libjava Summary: no changes

I'm not sure the g++ failures are real: if I try to cut & paste one of the
failing commands myself, it works just fine.  For example, one of the failures is:

Executing on host: /home/davidm/src/gcc-build-p2/gcc/testsuite/../g++
-B/home/davidm/src/gcc-build-p2/gcc/testsuite/../
/r/wailua/usr/src/misc/gcc-3.4-p2/gcc/testsuite/g++.dg/template/access8.C 
-nostdinc++
-I/home/davidm/src/gcc-build-p2/ia64-unknown-linux-gnu/libstdc++-v3/include/ia64-unknown-linux-gnu
-I/home/davidm/src/gcc-build-p2/ia64-unknown-linux-gnu/libstdc++-v3/include
-I/r/wailua/usr/src/misc/gcc-3.4-p2/libstdc++-v3/libsupc++
-I/r/wailua/usr/src/misc/gcc-3.4-p2/libstdc++-v3/libsupc++
-I/r/wailua/usr/src/misc/gcc-3.4-p2/libstdc++-v3/include/backward
-I/r/wailua/usr/src/misc/gcc-3.4-p2/libstdc++-v3/testsuite -fmessage-length=0  
-ansi -pedantic-errors -Wno-long-long  -S  -o access8.s
WARNING: program timed out.
compiler exited with status 1
FAIL: g++.dg/template/access8.C (test for excess errors)

but when I run this by hand, it works just fine.

I'll see if I can figure out what's going on here tomorrow.
Comment 6 Andrew Pinski 2004-10-21 18:07:34 UTC
Confirmed.
Comment 7 davidm 2004-10-22 10:57:29 UTC
(In reply to comment #5)

Argh, I reran "make check-g++" with the original (unpatched) GCC and am now
seeing 3 unexpected failures (when the exact same compiler produced 0 failures
yesterday).  From what I can see, these failures are due to problems in the
test-infrastructure: they all show up as timeouts after a compile.  It looks to
me as if "expect" sometimes fails to notice the (failure-free) termination of
the compiler and that leads to subsequent and spurious test-suite failures. 
I'll see if this is a bug in "expect".

I wonder if the failure(s) Jim saw for the patch for bug #13158 have the same
root-cause.

Has anyone heard of such spurious failures before?
Comment 8 Jim Wilson 2004-10-25 23:54:45 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Fri, 2004-10-22 at 03:57, davidm at hpl dot hp dot com wrote:
>  It looks to
> me as if "expect" sometimes fails to notice the (failure-free) termination of
> the compiler and that leads to subsequent and spurious test-suite failures. 
> I'll see if this is a bug in "expect".

Yes, there are some known problems with some versions of expect,
particularly on 64-bit machines.  I haven't seen any such problems on my
IA-64 debian linux machine though, at least, not that I have noticed. 
Most of the testsuite problems I have had have been load related.  If
the load average changes while running the testsuite, this can cause
tests which are close to the timeout to be over/under depending on the
load.  Also, some java tests are self-timing, and can fail if the load
changes unexpectedly while they are running.

I think it was HJ that wrote the patch to fix the expect problem.

Comment 9 Jim Wilson 2004-10-26 00:06:47 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Thu, 2004-10-21 at 11:03, davidm at hpl dot hp dot com wrote:
> OK, I tried this patch on the CVS gcc-3_4_branch (the 4.0 branch didn't work at
> all for me, even in its pristine version).
> I'm not sure the g++ failures are real: if I try to cut & paste one of the
> failing commands myself, it works just fine.  For example, one of the failures is:

I have started some builds to try to test the patch.  You didn't say
what kind of problems you ran into with gcc mainline.  There are a few
known problems such as linking problems if you don't have the libunwind
package installed, but that shouldn't be an issue here.  I haven't done
an IA-64 bootstrap in a while.  I will find out soon enough.

Comment 10 davidm 2004-10-26 08:49:30 UTC
(In reply to comment #9)

> You didn't say what kind of problems you ran into with gcc mainline.

The compiler seemed to get stuck in an apparent endless loop.  "make check"
quickly resulted in timeout failures (and these were real, with the compiler
burning CPU cycles for several minutes until the timeout hit).

  --david
Comment 11 Jim Wilson 2004-10-27 01:29:58 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Tue, 2004-10-26 at 01:49, davidm at hpl dot hp dot com wrote:
> The compiler seemed to get stuck in an apparent endless loop.  "make check"
> quickly resulted in timeout failures (and these were real, with the compiler
> burning CPU cycles for several minutes until the timeout hit).

Maybe a temporary problem?  I did a cvs update on Monday, and it built
for me without problem on my debian testing system.  I also managed to
run the testsuite without problems.  The only difference is in libjava,
and I got two more passes with the patch.  I'm not sure if the patch is
directly responsible for that, but it doesn't matter.  There are more
passes, so I can check it in.

Comment 12 GCC Commits 2004-10-27 01:36:17 UTC
Subject: Bug 18010

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	wilson@gcc.gnu.org	2004-10-27 01:36:12

Modified files:
	gcc            : ChangeLog emit-rtl.c 

Log message:
	Fix for PR 18010, copy epilogue unwind info when copying epilogue insns.
	* emit-rtl.c (emit_copy_of_insn_after): Copy RTX_FRAME_RELATED_P.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6032&r2=2.6033
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/emit-rtl.c.diff?cvsroot=gcc&r1=1.421&r2=1.422

Comment 13 Jim Wilson 2004-10-27 01:48:35 UTC
Fixed.
Comment 14 davidm 2004-10-27 11:04:56 UTC
(In reply to comment #11)
> Subject: Re:  bad unwind info due to multiple returns
> 	(missing epilogue)
> 
> On Tue, 2004-10-26 at 01:49, davidm at hpl dot hp dot com wrote:
> > The compiler seemed to get stuck in an apparent endless loop.  "make check"
> > quickly resulted in timeout failures (and these were real, with the compiler
> > burning CPU cycles for several minutes until the timeout hit).
> 
> Maybe a temporary problem?

It doesn't appear to be for me: I just updated the sources and the compiler
still gets stuck almost immediately.  I did a quick q-syscollect run and got:

Command: /home/davidm/src/gcc/gcc/cc1 -quiet -iprefix
/home/davidm/src/gcc/gcc/../lib/gcc/ia64-unknown-linux-gnu/4.0.0/ -isystem
/home/davidm/src/gcc/gcc/include
/r/wailua/usr/src/misc/gcc/gcc/testsuite/gcc.c-torture/compile/20001226-1.c
-quiet -dumpbase 20001226-1.c -auxbase-strip 20001226-1.o -Os -w -o /tmp/ccR2zJ8h.s
Flat profile of CPU_CYCLES in cc1-pid1134-cpu0.hist#0:
 Each histogram sample counts as 1.00051m seconds
% time      self     cumul     calls self/call  tot/call name
 98.38      9.59      9.59      127k     75.8u     75.8u find_insn_list
  0.84      0.08      9.67      130k      631n     74.9u add_forward_dependence
  0.27      0.03      9.70      134k      195n      292n ggc_alloc_stat
  0.16      0.02      9.71      132k      121n      424n rtx_alloc_stat

[snip...]

Call-graph table:
index %time      self  children         called     name
                                                       <spontaneous>
[11]  100.0     11.0m      9.73         -          compute_forward_dependences
                82.0m      9.65      130k/130k         add_forward_dependence [21]
----------------------------------------------------
                82.0m      9.65      130k              compute_forward_dependences [11]
[21]   99.9     82.0m      9.65      130k          add_forward_dependence
                3.00m     58.0m      128k/128k         alloc_INSN_LIST [22]
                 9.59      0.00      127k/127k         find_insn_list [23]
----------------------------------------------------
                 9.59      0.00      127k              add_forward_dependence [21]
[23]   98.4      9.59      0.00      127k          find_insn_list
----------------------------------------------------
                3.00m     58.0m      128k              add_forward_dependence [21]
[22]    0.6     3.00m     58.0m      128k          alloc_INSN_LIST
                2.00m     56.0m      130k/130k         gen_rtx_fmt_ue [27]
----------------------------------------------------

[snip..]

I ran configure with:

  $ configure --prefix=/opt/gcc-pre4.0

This was on a Debian/unstable system.  How do you run configure?
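
One possible reading of the flat profile, offered as an assumption rather than
a diagnosis: if find_insn_list is a linear list search and each
add_forward_dependence call runs it over a list that keeps growing, the total
work grows quadratically with the number of dependences.  The sketch below
uses a plain Python list and hypothetical helper names; it is not GCC's data
structure or code.

```python
from typing import List


def add_with_duplicate_check(deps: List[int], item: int) -> int:
    """Append `item` if absent; return the number of comparisons made.

    Models a linear membership search before every insertion.
    """
    comparisons = 0
    for existing in deps:
        comparisons += 1
        if existing == item:
            return comparisons  # already present, nothing appended
    deps.append(item)
    return comparisons


def total_comparisons(n: int) -> int:
    """Total comparisons needed to insert n distinct items one by one."""
    deps: List[int] = []
    return sum(add_with_duplicate_check(deps, i) for i in range(n))


print(total_comparisons(1000))  # -> 499500
print(total_comparisons(2000))  # -> 1999000
```

Doubling the number of insertions roughly quadruples the comparison count in
this model, which is the kind of growth that would let a single search routine
dominate a profile on a very large function.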
Comment 15 Jim Wilson 2004-10-27 19:50:59 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Wed, 2004-10-27 at 04:05, davidm at hpl dot hp dot com wrote:
> This was on a Debian/unstable system.  How do you run configure?

I just use ../gcc/configure most of the time.  I have libunwind 0.98
installed, but then you likely have it or a more recent version
installed also.

You are configuring in the source directory?  That is supposed to work,
but no gcc developers ever do that, so it is often broken.  Usually this
gives some kind of obvious build failure though.  It seems unlikely that
this would result in an infinite loop in the compiler.  FYI, the
preferred way to build gcc is something like
  mkdir objdir
  cd objdir
  ../gcc/configure ...

I don't know when I last ran apt-get.  I can try updating and doing
another build.  Also, I have debian/unstable on an alternate partition. 
I can try booting it, running apt-get, and trying a build there, in case
it is debian/unstable specific.

Right now, I am testing the patch for 13158.  I need to wait for this to
finish first.

Comment 16 Jim Wilson 2004-10-28 01:16:03 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Wed, 2004-10-27 at 04:05, davidm at hpl dot hp dot com wrote:
> Command: /home/davidm/src/gcc/gcc/cc1 -quiet -iprefix
> /home/davidm/src/gcc/gcc/../lib/gcc/ia64-unknown-linux-gnu/4.0.0/ -isystem
> /home/davidm/src/gcc/gcc/include
> /r/wailua/usr/src/misc/gcc/gcc/testsuite/gcc.c-torture/compile/20001226-1.c
> -quiet -dumpbase 20001226-1.c -auxbase-strip 20001226-1.o -Os -w -o /tmp/ccR2zJ8h.

Perhaps I should have read your message more closely.  I get timeouts for this
testcase also.  However, it bootstraps fine, and the total number of
unexpected gcc failures is only 45, which is really not that bad when
you consider that there is no one actively maintaining the IA-64 port. 
Only 4 of these are timeouts.  Are you seeing worse results than this? 
I agree the results should be better, but there is only so much I can do
in the limited time I have available for IA-64 gcc work.  If you think I
should be spending more time looking at gcc testsuite results, I can try
shifting my priorities.  I've mostly been trying to respond to bug
reports, and not worrying about the testsuite results.

Comment 17 davidm 2004-10-28 09:24:48 UTC
(In reply to comment #16)
> Perhaps I should have read your message closer.  I get timeouts for this
> testcase also.  However, it bootstraps fine, and the total number of
> unexpected gcc failures is only 45, which is really not that bad when
> you consider that there is no one actively maintaining the IA-64 port. 
> Only 4 of these are timeouts.  Are you seeing worse results than this? 

Yes, it's worse but not nearly as bad as I thought.  If I let the test-suite
complete, I get:

                === gcc Summary ===

# of expected passes            30227
# of unexpected failures        115
# of unexpected successes       3
# of expected failures          82
# of unresolved testcases       52
# of untested testcases         28
# of unsupported tests          498

Perhaps the additional failures are due to the fact that my machine is running
Debian/unstable.  Anyhow, this clearly isn't as bad as I thought.  I saw a
couple of timeouts in a row and since that took quite some time, I thought it
was hopeless.  I should have been more patient.
Comment 18 Jim Wilson 2004-10-29 22:00:12 UTC
Subject: Re:  bad unwind info due to multiple returns
	(missing epilogue)

On Thu, 2004-10-28 at 02:24, davidm at hpl dot hp dot com wrote:
> # of unexpected failures        115

This is a lot more failures than we should have.  I didn't have any luck
in reproducing this though.  I did an apt-get update and dist-upgrade on
my debian/unstable partition, rebooted just in case, built and installed
libunwind-0.98 from source, then did a gcc bootstrap and make check, and
got 46 gcc failures.  This is from gcc mainline, last updated on Monday.

I am a novice at configuring a debian/unstable system.  So maybe I
missed something.  I noticed that my debian/unstable partition is using
a 2.4.17 kernel which is apparently the same as debian/testing which
seems a little surprising.  I didn't try looking for a debian/unstable
libunwind package.  Maybe I need to use that one instead of one I built
myself?

Comment 19 davidm 2004-11-04 18:06:51 UTC
(In reply to comment #18)
> On Thu, 2004-10-28 at 02:24, davidm at hpl dot hp dot com wrote:
> > # of unexpected failures        115
> 
> This is a lot more failures than we should have.  I didn't have any luck
> in reproducing this though.  I did an apt-get update and dist-upgrade on
> my debian/unstable partition, rebooted just in case, built and installed
> libunwind-0.98 from source, then did a gcc bootstrap and make check, and
> got 46 gcc failures.  This is from gcc mainline, last updated on Monday.

I tried this again, on two different Debian/unstable systems and it now worked
much better.  On the first (which had the 115 failures before), I got:

# of unexpected failures        47

On the second, which has a "better" libc and gas installed I get:

# of unexpected failures        41

Here, by "better" I mean a glibc which has some unwind-info fixes and a gas
which handles the psp-relative unwind directives correctly.  Though I should say
that I did not try to verify that the additional passes are due to these
differences.

In any case, I agree now: GCC head looks pretty good!

Also, you might like to know that as of last Friday, I was for the first time
able to successfully complete a test which single-steps through a program from
the beginning to the end (including all the ld.so startup/shutdown!), getting a
backtrace after each instruction without detecting any failures!  Of course,
that doesn't prove that the unwind-info is 100% correct, but it _is_ a tough test.
Comment 20 GCC Commits 2004-11-24 22:23:35 UTC
Subject: Bug 18010

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-rhl-branch
Changes by:	jakub@gcc.gnu.org	2004-11-24 22:22:29

Modified files:
	gcc            : ChangeLog emit-rtl.c 
	gcc/config/ia64: ia64.c 

Log message:
	2004-10-27  David Mosberger  <davidm@hpl.hp.com>
	James E Wilson  <wilson@specifixinc.com>
	
	PR target/13158
	* config/ia64/ia64.c (ia64_expand_epilogue): Set RTX_FRAME_RELATED_P on
	sibcall alloc instruction.
	(process_set): Handle sibcall alloc instruction.
	
	2004-10-26  James E Wilson  <wilson@specifixinc.com>
	
	PR target/18010
	* emit-rtl.c (emit_copy_of_insn_after): Copy RTX_FRAME_RELATED_P.
	
	2004-02-03  Kazu Hirata  <kazu@cs.umass.edu>
	
	* config/ia64/ia64.c: Use const0_rtx instead of GEN_INT (0).

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-rhl-branch&r1=2.2326.2.399.2.57&r2=2.2326.2.399.2.58
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/emit-rtl.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-rhl-branch&r1=1.365.4.5&r2=1.365.4.5.2.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/ia64/ia64.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-rhl-branch&r1=1.265.2.6.2.5&r2=1.265.2.6.2.6