User account creation filtered due to spam.

Bug 18266 - GCJ: Using references drops finalizers causing all apps to eventually crash ( SIGSEGV in GC_register_finalizer_inner () )
Summary: GCJ: Using references drops finalizers causing all apps to eventually crash ...
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 17574
  Show dependency treegraph
 
Reported: 2004-11-01 17:20 UTC by Rutger Ovidius
Modified: 2016-09-30 22:52 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2004-12-02 13:42:43


Attachments
Testcase (750 bytes, text/plain)
2005-07-22 21:01 UTC, Rutger Ovidius
Details
Possible work-around (1.23 KB, patch)
2008-09-23 17:40 UTC, David Daney
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rutger Ovidius 2004-11-01 17:20:56 UTC
gcc version 4.0.0 20041014 (experimental)

When I leave my gcj (4.0.0 20041014 - linux) app running for a few
days, it eventually crashes/locks up in what looks like an infinite
loop of SIGSEGVs (I did an strace on one process that was hung). I
then ran the app under gdb twice (and waited 2 days each time) and the
cause was the same each time. 

Original post:
http://gcc.gnu.org/ml/java/2004-10/msg00134.html

Response:
http://gcc.gnu.org/ml/java/2004-10/msg00142.html

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1116179376 (LWP 22238)]
0x40523321 in GC_register_finalizer_inner () from ./lib/libgcj.so.6
(gdb) bt
#0  0x40523321 in GC_register_finalizer_inner () from ./lib/libgcj.so.6
#1  0x40523513 in GC_register_finalizer_no_order () from ./lib/libgcj.so.6
#2  0x403acf0d in _Jv_MonitorExit () from ./lib/libgcj.so.6
#3  0x081d1d4c in org::eclipse::swt::widgets::Synchronizer::syncExec ()
#4  0x081c4790 in org::eclipse::swt::widgets::Display::syncExec ()
#5  0x080b608a in 
sancho::view::transfer::downloads::DownloadTableTreeContentProvider::update ()
#6  0x4041d761 in java::util::Observable::notifyObservers () 
from ./lib/libgcj.so.6
#7  0x4041d627 in java::util::Observable::notifyObservers () 
from ./lib/libgcj.so.6
#8  0x080ec02e in sancho::model::mldonkey::FileCollection::sendUpdate ()
#9  0x08108aae in sancho::core::MLDonkeyCore$1::run ()
#10 0x404288ed in java::util::Timer$Scheduler::run () from ./lib/libgcj.so.6
#11 0x403d7855 in java::lang::Thread::run () from ./lib/libgcj.so.6
#12 0x403b1c3b in _Jv_ThreadRun () from ./lib/libgcj.so.6
#13 0x40511f50 in _Jv_ThreadRegister () from ./lib/libgcj.so.6
#14 0x4052f418 in GC_start_routine () from ./lib/libgcj.so.6
#15 0x435f979c in start_thread () from /lib/tls/libpthread.so.0
#16 0x433daf2a in clone () from /lib/tls/libc.so.6

(gdb) disas 0x40523281 0x40523391
Dump of assembler code from 0x40523281 to 0x40523391:
0x40523281 <GC_register_finalizer_inner+289>:   mov    %eax,(%esp)
0x40523284 <GC_register_finalizer_inner+292>:   call   0x4036cecc <_init+18776>
0x40523289 <GC_register_finalizer_inner+297>:   mov    0x55d8(%ebx),%eax
0x4052328f <GC_register_finalizer_inner+303>:   mov    (%eax),%eax
0x40523291 <GC_register_finalizer_inner+305>:   test   %eax,%eax
0x40523293 <GC_register_finalizer_inner+307>:   jne    0x40523365 
<GC_register_finalizer_inner+517>
0x40523299 <GC_register_finalizer_inner+313>:   mov    0xffffe9a0(%ebx),%esi
0x4052329f <GC_register_finalizer_inner+319>:   jmp    0x405231b3 
<GC_register_finalizer_inner+83>
0x405232a4 <GC_register_finalizer_inner+324>:   mov    0x14(%ebp),%eax
0x405232a7 <GC_register_finalizer_inner+327>:   test   %eax,%eax
0x405232a9 <GC_register_finalizer_inner+329>:   je     0x405232b4 
<GC_register_finalizer_inner+340>
0x405232ab <GC_register_finalizer_inner+331>:   mov    0x14(%ebp),%edx
0x405232ae <GC_register_finalizer_inner+334>:   movl   $0x0,(%edx)
0x405232b4 <GC_register_finalizer_inner+340>:   mov    0x18(%ebp),%esi
0x405232b7 <GC_register_finalizer_inner+343>:   test   %esi,%esi
0x405232b9 <GC_register_finalizer_inner+345>:   je     0x405232c4 
<GC_register_finalizer_inner+356>
0x405232bb <GC_register_finalizer_inner+347>:   mov    0x18(%ebp),%ecx
0x405232be <GC_register_finalizer_inner+350>:   movl   $0x0,(%ecx)
0x405232c4 <GC_register_finalizer_inner+356>:   mov    0xc(%ebp),%ecx
0x405232c7 <GC_register_finalizer_inner+359>:   test   %ecx,%ecx
0x405232c9 <GC_register_finalizer_inner+361>:   je     0x4052325b 
<GC_register_finalizer_inner+251>
0x405232cb <GC_register_finalizer_inner+363>:   mov    0x4380(%ebx),%eax
0x405232d1 <GC_register_finalizer_inner+369>:   mov    %edi,%edx
0x405232d3 <GC_register_finalizer_inner+371>:   shr    $0x16,%edx
0x405232d6 <GC_register_finalizer_inner+374>:   mov    0xb074(%eax,%edx,4),%edx
0x405232dd <GC_register_finalizer_inner+381>:   mov    %edi,%eax
0x405232df <GC_register_finalizer_inner+383>:   shr    $0xc,%eax
0x405232e2 <GC_register_finalizer_inner+386>:   and    $0x3ff,%eax
0x405232e7 <GC_register_finalizer_inner+391>:   mov    (%edx,%eax,4),%eax
0x405232ea <GC_register_finalizer_inner+394>:   test   %eax,%eax
0x405232ec <GC_register_finalizer_inner+396>:   mov    %eax,0xfffffff0(%ebp)
0x405232ef <GC_register_finalizer_inner+399>:   je     0x4052325b 
<GC_register_finalizer_inner+251>
0x405232f5 <GC_register_finalizer_inner+405>:   mov    $0x1,%edx
0x405232fa <GC_register_finalizer_inner+410>:   mov    %edx,0x4(%esp)
0x405232fe <GC_register_finalizer_inner+414>:   movl   $0x18,(%esp)
0x40523305 <GC_register_finalizer_inner+421>:   call   0x4037563c <_init+53448>
0x4052330a <GC_register_finalizer_inner+426>:   test   %eax,%eax
0x4052330c <GC_register_finalizer_inner+428>:   mov    %eax,%esi
0x4052330e <GC_register_finalizer_inner+430>:   je     0x405233d1 
<GC_register_finalizer_inner+625>
0x40523314 <GC_register_finalizer_inner+436>:   mov    0xfffffff0(%ebp),%edx
0x40523317 <GC_register_finalizer_inner+439>:   not    %edi
0x40523319 <GC_register_finalizer_inner+441>:   mov    %edi,(%esi)
0x4052331b <GC_register_finalizer_inner+443>:   mov    0xc(%ebp),%ecx
0x4052331e <GC_register_finalizer_inner+446>:   mov    0x10(%ebp),%edi
0x40523321 <GC_register_finalizer_inner+449>:   mov    (%edx),%eax
0x40523323 <GC_register_finalizer_inner+451>:   mov    0xffffffec(%ebp),%edx
0x40523326 <GC_register_finalizer_inner+454>:   mov    %ecx,0x8(%esi)
---Type <return> to continue, or q <return> to quit---
0x40523329 <GC_register_finalizer_inner+457>:   mov    0x1c(%ebp),%ecx
0x4052332c <GC_register_finalizer_inner+460>:   mov    %edi,0xc(%esi)
0x4052332f <GC_register_finalizer_inner+463>:   mov    %eax,0x10(%esi)
0x40523332 <GC_register_finalizer_inner+466>:   mov    0x166d4(%ebx),%eax
0x40523338 <GC_register_finalizer_inner+472>:   mov    %ecx,0x14(%esi)
0x4052333b <GC_register_finalizer_inner+475>:   add    %eax,%edx
0x4052333d <GC_register_finalizer_inner+477>:   mov    (%edx),%eax
0x4052333f <GC_register_finalizer_inner+479>:   mov    %esi,(%edx)
0x40523341 <GC_register_finalizer_inner+481>:   mov    %eax,0x4(%esi)
0x40523344 <GC_register_finalizer_inner+484>:   mov    0x5598(%ebx),%eax
0x4052334a <GC_register_finalizer_inner+490>:   incl   (%eax)
0x4052334c <GC_register_finalizer_inner+492>:   jmp    0x4052325b 
<GC_register_finalizer_inner+251>
0x40523351 <GC_register_finalizer_inner+497>:   call   0x403697ac <_init+4664>
0x40523356 <GC_register_finalizer_inner+502>:   jmp    0x4052318d 
<GC_register_finalizer_inner+45>
0x4052335b <GC_register_finalizer_inner+507>:   xor    %ecx,%ecx
0x4052335d <GC_register_finalizer_inner+509>:   lea    0x0(%esi),%esi
0x40523360 <GC_register_finalizer_inner+512>:   jmp    0x40523207 
<GC_register_finalizer_inner+167>
0x40523365 <GC_register_finalizer_inner+517>:   xor    %ecx,%ecx
0x40523367 <GC_register_finalizer_inner+519>:   xor    %eax,%eax
0x40523369 <GC_register_finalizer_inner+521>:   mov    %ecx,0x10(%esp)
0x4052336d <GC_register_finalizer_inner+525>:   mov    0xffffe9a0(%ebx),%ecx
0x40523373 <GC_register_finalizer_inner+531>:   xor    %esi,%esi
0x40523375 <GC_register_finalizer_inner+533>:   mov    %eax,0x18(%esp)
0x40523379 <GC_register_finalizer_inner+537>:   xor    %eax,%eax
0x4052337b <GC_register_finalizer_inner+539>:   xor    %edx,%edx
0x4052337d <GC_register_finalizer_inner+541>:   mov    %eax,0x8(%esp)
0x40523381 <GC_register_finalizer_inner+545>:   mov    $0x1,%eax
0x40523386 <GC_register_finalizer_inner+550>:   mov    %esi,0x14(%esp)
0x4052338a <GC_register_finalizer_inner+554>:   shl    %cl,%eax
0x4052338c <GC_register_finalizer_inner+556>:   mov    %eax,0x4(%esp)
0x40523390 <GC_register_finalizer_inner+560>:   lea    0xffed22fb(%ebx),%eax
End of assembler dump.


(gdb) info registers
eax            0x8dc2270        148644464
ecx            0x405104e0       1079051488
edx            0x1      1
ebx            0x406ecc6c       1081003116
esp            0x42878818       0x42878818
ebp            0x42878850       0x42878850
esi            0x8dc2270        148644464
edi            0x403b27c0       1077618624
eip            0x40523321       0x40523321
eflags         0x10202  66050
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x33     51
Comment 1 Hans Boehm 2004-11-01 20:44:09 UTC
This would be a lot easier if libgcj had been built with something like -O2 -g.

Based on approximate manual matching of the object code to finalize.s, I think 
this is failing around line 452 of finalize.c on the line

new_fo -> fo_object_size = hhdr -> hb_sz;

It appears that hhdr is in %edx and is 1.  This can occur if the first argument 
to GC_register_finalizer_inner is a pointer to somewhere in the second page of 
a large object.  It should of course be a base pointer to an object, so this 
should be impossible.

I think the GC_register_finalizer_no_order call must be coming from 
maybe_remove_all_heavy(), which called remove_all_heavy, which was presumably 
inlined into _Jv_MonitorExit().  I see no other path to 
GC_register_finalizer_no_order().

That makes it appear that an object whose heavy-weight lock we are about to 
remove has previously been garbage collected.  That should be impossible since 
we previously registered our own finalizer for the object in question, and that 
acquires the lock bit in the lock hash table entry, as does remove_all_heavy.  
Thus the finalizer should have previously been run to completion, and all 
traces of the heavy lock should have been previously removed.

Are there places we add a finalizer to an existing object without checking for 
prior finalizers?  That might explain the problem.

We really need some more evidence to confirm this chain of reasoning.  A -g 
stack trace, and the values of the finalization proc and data (and the object 
the data pointer points to, if any) that are being passed to 
GC_register_finalizer_inner might help.  So would GC_find_header
(object_being_registered_address).  Assuming that's one, as expected, then 
*GC_find_header(object_being_registered_address - 4096) together with GC_gc_no 
would also be somewhat interesting.

Does this application use some flavor of weak references?  If so, which one?
Comment 2 Rutger Ovidius 2004-11-01 22:08:27 UTC
The app uses many java.util.WeakHashMap s (usually with null values, just 
storing objects in the keys ie: map.put(object, null), if that matters).
Comment 3 Rutger Ovidius 2004-11-08 03:00:16 UTC
Recompiled with -g (and waited a few days..), but I'm not sure if this is the 
same problem or not:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1106963376 (LWP 23098)]
0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, mark_stack=0x8c5f000, 
mark_stack_limit=0x8c7f000)
    at /datal/gcc/gcc/boehm-gc/mark.c:724
724                 descr = *(word *)(type_descr
(gdb) bt
#0  0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, 
mark_stack=0x8c5f000, 
    mark_stack_limit=0x8c7f000) at /datal/gcc/gcc/boehm-gc/mark.c:724
#1  0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
#2  0x4051fa60 in GC_finish_collection () at /datal/gcc/gcc/boehm-
gc/alloc.c:659
#3  0x405200eb in GC_try_to_collect_inner (stop_func=Variable "stop_func" is 
not available.
) at /datal/gcc/gcc/boehm-gc/alloc.c:376
#4  0x4052087e in GC_collect_or_expand (needed_blocks=Variable "needed_blocks" 
is not available.
) at /datal/gcc/gcc/boehm-gc/alloc.c:1020
#5  0x40520adb in GC_allocobj (sz=12, kind=0) at /datal/gcc/gcc/boehm-
gc/alloc.c:1071
#6  0x405253aa in GC_generic_malloc_inner (lb=48, k=0) at /datal/gcc/gcc/boehm-
gc/malloc.c:136
#7  0x4052621c in GC_generic_malloc_many (lb=48, k=0, result=0x8722cf8)
    at /datal/gcc/gcc/boehm-gc/mallocx.c:512
#8  0x4053014e in GC_local_malloc_atomic (bytes=48) at /datal/gcc/gcc/boehm-
gc/pthread_support.c:334
#9  0x403780fc in _Jv_AllocString (len=14) at java-gc.h:57
#10 0x403b0b75 in java::lang::String::toLowerCase (this=0x859fcc0, 
locale=Variable "locale" is not available.
) at cni.h:41
#11 0x403d4943 in java.lang.String.toLowerCase() (this=0xffffffef)
    at /datal/gcc/gcc/libjava/java/lang/String.java:1031
#12 0x4050de5f in gnu.gcj.convert.IOConverter.canonicalize(java.lang.String) 
(name=Variable "name" is not available.
)
    at /datal/gcc/gcc/libjava/gnu/gcj/convert/IOConverter.java:77
#13 0x4050c36d in gnu.gcj.convert.BytesToUnicode.getDecoder(java.lang.String) 
(encoding=0x859fcc0)
    at /datal/gcc/gcc/libjava/gnu/gcj/convert/BytesToUnicode.java:78
#14 0x403af2c0 in java::lang::String::init (this=0x97c42d0, bytes=0x88ea000, 
offset=Variable "offset" is not available.
)
    at /datal/gcc/gcc/libjava/java/lang/natString.cc:488
#15 0x403d429e in java.lang.String.String(byte[], int, int) 
(this=Variable "this" is not available.
)
    at /datal/gcc/gcc/libjava/java/lang/String.java:345

...
I'll leave it open in a screen session.
Any gdb commands I should type?

Comment 4 Hans Boehm 2004-11-08 19:55:55 UTC
I think this could be explained by the same problem.

This time the collector is in the Java-specific finalization pass which
marks objects reachable from objects that are about to be finalized,
so that the finalizer doesn't see deallocated memory.

It appears that the finalizable object it's marking from has somehow
been clobbered already, and no longer has a proper vtable entry.
Hence the collector dies trying to find the mark descriptor in the
vtable.

I can see how this might happen if we accidentally register a finalizer
on something that was already collected, which can happen if the
finalizer installed by hash synchronization is dropped.

It might be useful to find a little more of the context.  Try

p descr
p current_p
p type_descr
p GC_gc_no
p *mark_stack_top
up 1		-- goto GC_finalize frame
p real_ptr
p *curr_fo
x/8wx real_ptr -4

Printing *curr_fo should indicate the finalization function and "client data"
associated with this object.  It would be useful to explore the "client data"
a bit further, so that we can understand what the finalizer is really trying to 
do.
(All Java finalizers use the same function, and use the client data field to
specify what really needs to be done.)

Assuming current_p and real_ptr are the same, and you can call functions
from gdb, try

p GC_find_header(real_ptr)

If that looks like a sane pointer, also try 

p *GC_find_header(real_ptr)
p GC_base(real_ptr)

I need to look at the WeakHashMap code, but I won't get a chance to do that
for a few days.

We could also probably track this down more systematically by
having the hash synchronization code check that the vtable pointer hasn't
changed when we reregister the client finalizer around line 757 in natObject.cc.
We would have to remember the vtable pointer in the hl structure.
This might cause this to fail far more predictably, and might at
least confirm that we're not barking up the wrong tree.
Comment 5 Rutger Ovidius 2004-11-08 20:27:49 UTC
Unforunately many variables seem not to be available:
GNU gdb 6.1

(gdb) p descr
Variable "descr" is not available.
(gdb) p current_p
$1 = (word *) 0x9acf618
(gdb) p type_descr
No symbol "type_descr" in current context.
(gdb) p GC_gc_no
$2 = 768
(gdb) p *mark_stack_top
$3 = {mse_start = 0x9acf618, mse_descr = 4294967279}
(gdb) up 1
#1  0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
639                     GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
Variable "real_ptr" is not available.
(gdb) p *curr_fo
Variable "curr_fo" is not available.
(gdb) 
Comment 6 Hans Boehm 2004-11-25 01:50:31 UTC
After finally finding time to look at the code, it appears that my earlier 
guesses were correct.

::java::lang::ref::Reference::create in natReference.cc calls 
_Jv_RegisterFinalizer(referent ...), where referent is an arbitrary object, 
which may already have a finalizer.  This is bad news, since the original 
finalizer will be dropped.

The original finalizer may be a Java finalizer, or it may be one that was 
registered by the hash synchronization code to clean up a heavy lock entry for 
the object.  In either case we lose.  (The hash synchronization code is careful 
to not lose the original finalizer.)  In both cases I think, we are likely to 
mostly introduce leaks, and crash only occasionally.  So this may explain some 
other misbehavior.

The fix may require some thought.  At a minimum, we need to export more GC 
functionality, so that the Reference implementation can retrieve the old 
finalizer.  (The hash synchronization code currently cheats and goes directly 
to the GC interface, which should also be fixed.)

I think that so long as Reference gets the ordering right, and doesn't assume 
that all finalizers are Java finalizers, the hash synchronization code should 
work.  It needs to drop the heavy lock before the object is deallocated, and 
while the lock is not held.  I don't think the timing otherwise matters.  If 
the object is resurrected 17 times, we can drop the heavy lock at any of those 
points, recreating it if necessary.

This really needs to be fixed to make any use of References reliable.
Comment 7 Tom Tromey 2004-11-25 03:25:33 UTC
Oops, I wasn't aware that the locks code was using finalizers.
We had to make special consideration in the reference code for
String.intern; we can do something similar for locks.
This is pretty important, I am going to add it to our 4.0
wish list.
Comment 8 Andrew Pinski 2004-12-02 13:42:42 UTC
Confirmed based on Tromey's comments.
Comment 9 Rutger Ovidius 2005-06-08 21:14:02 UTC
Since this bug seems a bit lost, I've been trying some things on my own without
success.

Can someone please explain:

If "referent" is just a RawData pointer to some Object, how are its previous
finalizers supposed to be found?  How to append them along with the new
finalizer in the correct order (order matters?) to
GC_REGISTER_FINALIZER_NO_ORDER (GC_register_finalizer_inner(obj, fn, cd, ofn,
ocd, mp)) which seems to be what is used in natObject.cc? 

The comments in this bug seem to suggest that there is some similar code
somewhere that I could lift and hook into natReference.create, but all the code
in natObject and String.intern finalization looks very different to my
novice-gcj eyes since they don't seem to be working with some foreign object,
nor with finalizers that aren't already locally stored in a struct.  

Secondly, is there any way to craft a testcase for this to know if it has been
fixed?  I don't fully understand why (based on comment #6) this would ever cause
a crash, and waiting many days for the crash of my apps is a very tedious process.
Comment 10 Hans Boehm 2005-06-09 05:10:31 UTC
Unfortunately, I haven't had time to pursue this.

I think that in order to get this to fail, you want lots of weak references to
objects which are also sobject to lock contention or wait/notify calls.  I don't
think we currently have a good test case.

My impression is that natReference.cc already keeps a fairly elaborate data
structure to which you should be able to add the prior finalization info, so
that it can be invoked at the right point by the existing finalizer there.

In general, the GC's data structures don't queue multiple finalizers.  You need
to register a new finalizer that knows it has to reregister the old one when
it's done.  The information that there was another finalizer needs to be kept
off to the side somewhere in a separate table, or as part of the "client data"
registered with the finalizer.  The locking code also has to deal with opaque
objects, but it again has its own hash table off to the side.
Comment 11 Rutger Ovidius 2005-06-20 16:25:05 UTC
I've tried to create a testcase but can't seem to get a crash or infinite loop 
lockup.

Anyway, I think I understand conceptually what must be done, but in practice 
I'm still unsure of how to go about it. You don't seem to have a problem with 
the call to: 

_Jv_RegisterFinalizer (this, finalize_reference);

only

_Jv_RegisterFinalizer (referent, finalize_referred_to_object);

but since referent is an arbitrary object, what kind of finalizers can it 
already have? I'm stumped on how to get access to an arbitrary object's 
finalizers from natReference.cc.  Is it just the heavy_lock structure from 
natObject that needs to be considered somehow? 

From comment #10:
"My impression is that natReference.cc already keeps a fairly elaborate data
structure to which you should be able to add the prior finalization info"

This is the part that confuses me. Not all objects are References, so how would 
a Reference know about some arbitrary Object's previous finalizers or even 
attempt to maintain a data structure? 

And once I have the Object's old finalizer (if there is one), I guess I just 
run it, and register the new one with
"GC_REGISTER_FINALIZER_NO_ORDER(x, x, cd, 0, 0);"
from natReference's  "finalize_referred_to_object" ?  

Or am I way off track?
Comment 12 Rutger Ovidius 2005-07-19 15:06:22 UTC
I've spent a lot of time trying to make a testcase of this, but no luck yet.

I can basically create a testapp with 2 threads. When they both access a 
synchronized method, and are forced to wait long enough (natObject.cc:907 spins 
18 times before making a hard lock) you can drop the finalizer by inserting a 
WeakHashMap.put(this, null) call. (I force System.gc() regularly)  In gdb I can 
see that heavy_lock_obj_finalization_proc is no longer called once the 
Reference::create() call has been made.  I've done this a ton of times in a 
loop, but I just can't get the test app to crash.   Is dropping the finalizer 
enough to cause a crash (over time)?  I'm not sure what I'm missing, or what I 
can do to force this crash.  

My real app however does crash.  I've recompiled libgcj and do get all the 
information originally requested from gdb from an above comment in yet another 
but similar backtrace.  I don't know if it confirms that the problem is in 
dropping finalizers (or maybe that is a separate problem?), but thought I'd 
post it.

Program received signal SIGSEGV, Segmentation fault.
0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000,
    mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
724                 descr = *(word *)(type_descr
(gdb) bt
#0  0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000,
    mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
#1  0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
#2  0x4041ab83 in GC_finish_collection () at /home/gcc/gcc/boehm-gc/alloc.c:659
#3  0x4041a35b in GC_try_to_collect_inner (stop_func=0x40419c5c 
<GC_never_stop_func>)
    at /home/gcc/gcc/boehm-gc/alloc.c:376
#4  0x4041b3e8 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0)
    at /home/gcc/gcc/boehm-gc/alloc.c:996
#5  0x4041b683 in GC_allocobj (sz=4, kind=0) at /home/gcc/gcc/boehm-
gc/alloc.c:1071
#6  0x40420679 in GC_generic_malloc_inner (lb=16, k=0) at /home/gcc/gcc/boehm-
gc/malloc.c:136
#7  0x404217b3 in GC_generic_malloc_many (lb=16, k=0, result=0x4062b4a8)
    at /home/gcc/gcc/boehm-gc/mallocx.c:512
#8  0x4042b32d in GC_local_malloc_atomic (bytes=12) at /home/gcc/gcc/boehm-
gc/pthread_support.c:334
#9  0x401f2ec7 in _Jv_AllocPtrFreeObj (size=12, klass=0x8816688) at java-gc.h:57
#10 0x401f1674 in _Jv_NewPrimArray (eltype=0x87a3be0, count=1)
    at /home/gcc/gcc/libjava/prims.cc:559
#11 0x08287db9 in org.eclipse.swt.widgets.Table.textCellDataProc(int, int, int, 
int, int) (
    this=0x8940dc0, tree_column=146453640, cell=146453856, 
tree_model=206469928, iter=-1073753012,
    data=146439960) at Table.java:2704
#12 0x082b15b4 in org.eclipse.swt.widgets.Display.textCellDataProc(int, int, 
int, int, int) (
    this=0x884ed48, tree_column=146453640, cell=146453856, 
tree_model=206469928, iter=-1073753012,
    data=146439960) at Display.java:3305
#13 0x4040aceb in ffi_call_SYSV () at /home/gcc/gcc/libffi/src/x86/sysv.S:60
#14 0x4040a8d2 in ffi_call (cif=0xbfffd0b8,

    fn=0x82b1544 <org.eclipse.swt.widgets.Display.textCellDataProc(int, int, 
int, int, int)>,
    rvalue=0xbfffd0b0, avalue=0xbfffcfd0) 
at /home/gcc/gcc/libffi/src/x86/ffi.c:221
#15 0x4023e91e in _Jv_CallAnyMethodA (obj=0x884ed48, return_type=0x87a3be0, 
meth=0x87007c0,
    is_constructor=0 '\0', is_virtual_call=1 '\001', parameter_types=0xc7a5460, 
args=0xbfffd160,
    result=0xbfffd1d4, is_jni_call=1 '\001', iface=0x0)
    at /home/gcc/gcc/libjava/java/lang/reflect/natMethod.cc:495
#16 0x401fa956 in _Jv_JNI_CallAnyMethodV<jint, normal> (env=0x87b28f8, 
obj=0x884ed48, klass=0x0,
    id=0x87007c0, vargs=0xbfffd250 "\210ÎéÎ÷ÎáÎõÎù\b`ÎåÎéÎÝÎáÎõÎù\b
({N\fLÎùÎ÷ÎáÎñÎåÎáÎõÎý\030\177ÎáÎõÎù\bÎùÎé\200iKÎíÎíÎåÎíÎõ\227K")
    at /home/gcc/gcc/libjava/jni.cc:796
#17 0x401fa9ed in _Jv_JNI_CallMethodV<jint> (env=0x87b28f8, obj=0x884ed48, 
id=0x87007c0,
    args=0xbfffd250 "\210ÎéÎ÷ÎáÎõÎù\b`ÎåÎéÎÝÎáÎõÎù\b({N\fLÎùÎ÷ÎáÎñÎåÎáÎõÎý\030
\177ÎáÎõÎù\bÎùÎé\200iKÎíÎíÎåÎíÎõ\227K")
    at /home/gcc/gcc/libjava/jni.cc:967
#18 0x40fbcfac in callback () from ./lib/libswt-gtk-3138.so
#19 0x40faeb65 in fn16_5 () from ./lib/libswt-gtk-3138.so

(gdb) p descr
$1 = 4294967279
(gdb) p current_p
$2 = (word *) 0x93b10e0
(gdb) p type_descr
$3 = 0x2d02ca8a <Address 0x2d02ca8a out of bounds>
(gdb) p GC_gc_no
$4 = 1731
(gdb) p *mark_stack_top
$5 = {mse_start = 0x93b10e0, mse_descr = 4294967279}
(gdb) up
#1  0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
639                     GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
$6 = 0x93b10e0 "\212ÎõÎ÷\002-"
(gdb) p *curr_fo
$7 = {prolog = {hidden_key = 154865888, next = 0x96d54f8}, fo_fn = 0x40408c14 
<call_finalizer>, 
  fo_client_data = 
0x4023b092 "U\211ÎáÎéÎíVS\203ÎáÎíÎá`ÎáÎéÎý\026TÎáÎñÎÝÎáÎñÎå\201ÎõÎñÎáÎáÎá\225=",
 fo_object_size = 22, 
  fo_mark_proc = 0x4041e03e <GC_null_finalize_mark_proc>}
(gdb) x/8wx real_ptr -4
0x93b10dc:      0x00000000      0x2d02ca8a      0x00000000      0x00000000
0x93b10ec:      0x00000000      0x00000000      0x00000000      0x00000000
(gdb) p GC_find_header(real_ptr)
$8 = (hdr *) 0x92023b8
(gdb) p *GC_find_header(real_ptr)
$9 = {hb_sz = 22, hb_next = 0x8fa5000, hb_prev = 0x0, hb_descr = 4294967279, 
  hb_map = 0x8830008 "", hb_obj_kind = 4 '\004', hb_flags = 0 '\0', 
hb_last_reclaimed = 1730, 
  hb_marks = {0 <repeats 27 times>, 65536, 268435520, 262144, 0, 0}}
(gdb) p GC_base(real_ptr)
$10 = (void *) 0x93b10b0





Comment 13 Rutger Ovidius 2005-07-22 21:01:52 UTC
Created attachment 9333 [details]
Testcase

Attached is the first messy testcase that I've come up with. It fills up the
sync hash table so all future locks are heavy (natObject.cc:~ 990), and drops
the heavy_lock finalizer by Reference::create()'ing via a WeakHashMap.put()
call. 

It consistently crashes in under an hour.  I assume this is a similar crash to
the ones posted above.	This is less than the 2+ days of waiting I used to
endure, but I've not figured out how to speed up the testcase crash yet to a
more reasonable time period.

I experience the bug a lot in real apps but unfortunately my attempts haven't
been able to turn the conceptual solution posted above into tangible code.    

Here are some fun (and different) traces from crashes that this test app
causes.

=====================================

Program received signal SIGSEGV, Segmentation fault.
0x080cbaa4 in ?? ()
(gdb) bt
#0  0x080cbaa4 in ?? ()
#1  0x4024421b in finalize_referred_to_object (obj=0x81342d0)
    at /home/gcc/gcc/libjava/java/lang/ref/natReference.cc:303
#2  0x40411c31 in call_finalizer (obj=0x81342d0, client_data=0x40244092)
    at /home/gcc/gcc/libjava/boehm.cc:516
#3  0x40427f01 in GC_invoke_finalizers () at
/home/gcc/gcc/boehm-gc/finalize.c:787
#4  0x40411c8f in _Jv_RunFinalizers () at /home/gcc/gcc/libjava/boehm.cc:529
#5  0x4022eb92 in gnu::gcj::runtime::FinalizerThread::run (this=0x80b7d70)
    at /home/gcc/gcc/libjava/gnu/gcj/runtime/natFinalizerThread.cc:60
#6  0x40242f92 in _Jv_ThreadRun (thread=0x80b7d70)
    at /home/gcc/gcc/libjava/java/lang/natThread.cc:287
#7  0x40412ce1 in really_start (x=0x80c8ff8) at
/home/gcc/gcc/libjava/posix-threads.cc:394
#8  0x40435000 in GC_start_routine (arg=0x80aefa0)
    at /home/gcc/gcc/boehm-gc/pthread_support.c:1185
#9  0x435f979c in start_thread () from /lib/tls/libpthread.so.0
#10 0x433daf2a in clone () from /lib/tls/libc.so.6

=================================

Program received signal SIGSEGV, Segmentation fault.
0x40239747 in _Jv_CheckCast (c=0x4059d960, obj=0x80eb5c8)
    at /home/gcc/gcc/libjava/java/lang/natClass.cc:1100
1100	  if (__builtin_expect
Current language:  auto; currently c++
(gdb) bt
#0  0x40239747 in _Jv_CheckCast (c=0x4059d960, obj=0x80eb5c8)
    at /home/gcc/gcc/libjava/java/lang/natClass.cc:1100
#1  0x402ec21b in java.util.WeakHashMap.cleanQueue() (this=0x80b7d48)
    at /home/gcc/gcc/libjava/java/util/WeakHashMap.java:595
#2  0x402ec9eb in java.util.WeakHashMap.entrySet() (this=0x80b7d48)
    at /home/gcc/gcc/libjava/java/util/WeakHashMap.java:819
#3  0x402a4bcc in java.util.AbstractMap.clear() (this=0x80b7d48)
    at /home/gcc/gcc/libjava/java/util/AbstractMap.java:122
#4  0x402eca11 in java.util.WeakHashMap.clear() (this=0x80b7d48)
    at /home/gcc/gcc/libjava/java/util/WeakHashMap.java:828
#5  0x08049a62 in TestOb.doit(int) ()
#6  0x0804981a in TestOb$2.run() ()
#7  0x40242f92 in _Jv_ThreadRun (thread=0x80e9f90)
    at /home/gcc/gcc/libjava/java/lang/natThread.cc:287
#8  0x40412ce1 in really_start (x=0x80c8fe8) at
/home/gcc/gcc/libjava/posix-threads.cc:394
#9  0x40435000 in GC_start_routine (arg=0x80aedc0)
    at /home/gcc/gcc/boehm-gc/pthread_support.c:1185
#10 0x435f979c in start_thread () from /lib/tls/libpthread.so.0
#11 0x433daf2a in clone () from /lib/tls/libc.so.6

===================

0x4024420d in finalize_referred_to_object (obj=0x8115380)
    at /home/gcc/gcc/libjava/java/lang/ref/natReference.cc:303
303		    ref->enqueue ();
Current language:  auto; currently c++
(gdb) bt
#0  0x4024420d in finalize_referred_to_object (obj=0x8115380)
    at /home/gcc/gcc/libjava/java/lang/ref/natReference.cc:303
#1  0x40411c31 in call_finalizer (obj=0x8115380, client_data=0x40244092)
    at /home/gcc/gcc/libjava/boehm.cc:516
#2  0x40427f01 in GC_invoke_finalizers () at
/home/gcc/gcc/boehm-gc/finalize.c:787
#3  0x40411c8f in _Jv_RunFinalizers () at /home/gcc/gcc/libjava/boehm.cc:529
#4  0x4022eb92 in gnu::gcj::runtime::FinalizerThread::run (this=0x80b7d70)
    at /home/gcc/gcc/libjava/gnu/gcj/runtime/natFinalizerThread.cc:60
#5  0x40242f92 in _Jv_ThreadRun (thread=0x80b7d70) at
/home/gcc/gcc/libjava/java/lang/natThread.cc:287
#6  0x40412ce1 in really_start (x=0x80c8ff8) at
/home/gcc/gcc/libjava/posix-threads.cc:394
#7  0x40435000 in GC_start_routine (arg=0x80aefa0) at
/home/gcc/gcc/boehm-gc/pthread_support.c:1185
#8  0x435f979c in start_thread () from /lib/tls/libpthread.so.0
#9  0x433daf2a in clone () from /lib/tls/libc.so.6

===============================

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1089690544 (LWP 3229)]
0x4059d92e in vtable for java::util::WeakHashMap$WeakBucket () from
/home/gcc/build/nativegcc/lib/libgcj.so.6
(gdb) bt
#0  0x4059d92e in vtable for java::util::WeakHashMap$WeakBucket () from
/home/gcc/build/nativegcc/lib/libgcj.so.6
#1  0x40f35ac4 in ?? ()
#2  0x40f35bb0 in ?? ()
#3  0x40f35958 in ?? ()
#4  0x40f358ec in ?? ()
#5  0x4061d694 in __JCR_END__ () from /home/gcc/build/nativegcc/lib/libgcj.so.6

#6  0x00000001 in ?? ()
#7  0x08120680 in ?? ()
#8  0x08160780 in ?? ()
#9  0x4024421b in finalize_referred_to_object (obj=0x8120680)
    at /home/gcc/gcc/libjava/java/lang/ref/natReference.cc:303
#10 0x40411c31 in call_finalizer (obj=0x8120680, client_data=0x40244092) at
/home/gcc/gcc/libjava/boehm.cc:516
#11 0x40427f01 in GC_invoke_finalizers () at
/home/gcc/gcc/boehm-gc/finalize.c:787
#12 0x40411c8f in _Jv_RunFinalizers () at /home/gcc/gcc/libjava/boehm.cc:529
#13 0x4022eb92 in gnu::gcj::runtime::FinalizerThread::run (this=0x80b7d70)
    at /home/gcc/gcc/libjava/gnu/gcj/runtime/natFinalizerThread.cc:60
#14 0x40242f92 in _Jv_ThreadRun (thread=0x80b7d70) at
/home/gcc/gcc/libjava/java/lang/natThread.cc:287
#15 0x40412ce1 in really_start (x=0x80c8ff8) at
/home/gcc/gcc/libjava/posix-threads.cc:394
#16 0x40435000 in GC_start_routine (arg=0x80aefa0) at
/home/gcc/gcc/boehm-gc/pthread_support.c:1185
#17 0x435f979c in start_thread () from /lib/tls/libpthread.so.0
#18 0x433daf2a in clone () from /lib/tls/libc.so.6

===================================

0x4042b9f5 in GC_mark_from (mark_stack_top=0x807d000, mark_stack=0x807d000,
mark_stack_limit=0x8085000)
    at /home/gcc/gcc/boehm-gc/mark.c:724
724		    descr = *(word *)(type_descr
(gdb) bt
#0  0x4042b9f5 in GC_mark_from (mark_stack_top=0x807d000, mark_stack=0x807d000,
mark_stack_limit=0x8085000)
    at /home/gcc/gcc/boehm-gc/mark.c:724
#1  0x40427ab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
#2  0x40423b83 in GC_finish_collection () at /home/gcc/gcc/boehm-gc/alloc.c:659

#3  0x4042335b in GC_try_to_collect_inner (stop_func=0x40422c5c
<GC_never_stop_func>)
    at /home/gcc/gcc/boehm-gc/alloc.c:376
#4  0x404243e8 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at
/home/gcc/gcc/boehm-gc/alloc.c:996
#5  0x40424683 in GC_allocobj (sz=4, kind=4) at
/home/gcc/gcc/boehm-gc/alloc.c:1071
#6  0x40429679 in GC_generic_malloc_inner (lb=16, k=4) at
/home/gcc/gcc/boehm-gc/malloc.c:136
#7  0x4042a7b3 in GC_generic_malloc_many (lb=16, k=4, result=0x406346b0) at
/home/gcc/gcc/boehm-gc/mallocx.c:512
#8  0x40434476 in GC_local_gcj_malloc (bytes=12,
ptr_to_struct_containing_descr=0x405a1148)
    at /home/gcc/gcc/boehm-gc/pthread_support.c:389
#9  0x401fbea3 in _Jv_AllocObj (size=12, klass=0x804bb08) at java-gc.h:46
#10 0x401fa17f in _Jv_AllocObjectNoFinalizer (klass=0x804bb08) at
/home/gcc/gcc/libjava/prims.cc:426


==================================

Detaching after fork from child process 5480.
java.util.ConcurrentModificationException
   at java.util.WeakHashMap$WeakEntrySet$2.checkMod (WeakHashMap.java:243)
   at java.util.WeakHashMap$WeakEntrySet$2.remove (WeakHashMap.java:330)
   at java.util.AbstractCollection.clear (AbstractCollection.java:170)
   at java.util.AbstractMap.clear (AbstractMap.java:122)
   at java.util.WeakHashMap.clear (WeakHashMap.java:828)
   at TestOb.doit (TestOb)
   at TestOb.main (TestOb)
Comment 14 Tom Tromey 2006-03-08 19:27:33 UTC
I've been looking into this a bit.

The current problem I see is that the heavyweight lock stuff
relies on the GC.  This won't interact well with the current
code in natReference.cc, as those data structures are not scanned.

Also, I do think that both calls to _Jv_RegisterFinalizer in
Reference::create are problematic.  The first call registers a finalizer
for the Reference, the second for the referent.  But, there is nothing
preventing a subclass of Reference from having a finalizer; or from
user code acquiring a heavy lock on a Reference object.  So, all
cases have to be handled here.
Comment 15 Rutger Ovidius 2006-11-03 22:58:44 UTC
Updated summary.  (Happy 2 year birthday, stupid stubborn bug.)
Comment 16 David Daney 2008-09-22 20:08:34 UTC
This is biting me now.  Perhaps I may look at fixing it (or maybe not...).
Comment 17 David Daney 2008-09-22 23:56:18 UTC
For me the testcase always gets a ConcurrentModificationException in w.clear() very soon after starting.  This is on GCC trunk 140563 on x86_64-pc-linux-gnu.

If I synchronize(w) for accesses to w there is no ConcurrentModificationException, but I have not seen it crash yet either.
Comment 18 David Daney 2008-09-23 17:40:11 UTC
Created attachment 16396 [details]
Possible work-around

This is a patch against 3.4.3 that we are seriously considering using internally.  The theory is:  With code compiled from valid java source, it is impossible to be GCable while holding a lock, so adding a finalizer is redundant.

It is still not clear to me if the number of heavy_locks could grow without bound while using the patch.  We are analyzing this now...
Comment 19 Hans Boehm 2008-09-23 23:39:51 UTC
I looked at this a bit, trying to remind myself of the logic.  I'm not positive, but it looks plausible to me that doing without the finalizer might work for most applications.  Historically, the finalizer was the only cleanup mechanism for heavy locks.  But that turned out to be insuficient, and maybe_remove_all_heavy was added.  It tries to occasionally remove long lists of heavy locks, that were still observed to accumulate in some cases.  That's probably usually enough to keep the list from growing without bounds.

I would however worry about an application that holds one or more locks indefinitely.  At that point, the heavy_count for the corresponding hash entry never drops to zero, and maybe_remove_all_heavy won't do its job for that entry.  Thus I think the list of heavy locks for that entry will grow without (reasonable) bounds.

As a result, I don't think this is curently a fully general fix, though it may work for many applications.

I'm not positive, but it seems likely to me that one might be able to avoid this "poisoning" effect of a single-in-use lock by adding in-use bits to heavy locks, which are protected on the lock bit by the hash entry.  This may allow unused entries to be cleaned up even if one or two heavy locks in the chain are held.  That might turn this into a real fix, if it's not possible to fix this elsewhere.  It might also simplify the code.


Comment 20 David Daney 2008-09-26 15:58:20 UTC
1) Create an object.

2) Enter a synchronized block on the object and do Object.wait().

3) In a second thread, enter a synchronized block on the object.  There is lock contention, heavy lock escalation is guaranteed.  Hash synchronization code registers a finalizer recording the old finalizer.  The old finalizer is null as no finzlizers have been registered for the object.

4) Create a WeakReference to the Object.  This adds a finalizer, overwriting and thus losing the Hash synchronization's finalizer.

5) Both threads leave the synchronized block.  At this point the heavy_lock object in the hash synchronization code in no longer in use and is eligible for clean up.

6) Many other locks are acquired and released, some of these hash to the same value as the lock used in steps 1 and 2.  Eventually the hash row for these locks is cleaned up.  The clean up code restores the saved finalizer from step 3 which was null, thus overwriting the WeakReference's finalizer.

7) WeakReference's referent is GCed, but its finalizer has been removed in step 6, so it does not get cleaned up.

If you ever synchronize o the If the object itself is a WeakReference, the same thing happens, but ref->enqueue is called on the WeakReference.  If A different type of object has been allocated in the WeakReference's previous location, when  the referent is finalized ref->enqueue will be called through the vtable of the new object resulting sometimes in SIGSEGV or other types of bad behavior.  
Comment 21 Andrew Pinski 2016-09-30 22:52:07 UTC
Closing as won't fix as libgcj (and the java front-end) has been removed from the trunk.