Looking for ideas to fix X server crash running GCJ-compiled program

Scott Gilbertson scottg@mantatest.com
Fri Mar 7 23:42:00 GMT 2003


> Scott> Now the X server crashes (taking my application with it, of
> Scott> course) after a few minutes of high load.  According to a gdb
> Scott> backtrace, the problem is a SEGV in CopyGC.  I'm using Xfbdev
> Scott> (tiny-X with frame-buffer driver).
Tom> Is this "tiny" because they removed error checking?  That would
Tom> account for a crash.

It doesn't seem that way, looking at the XFree86 sources.  The Xfbdev (tiny,
with frame-buffer video) builds don't include security permission checks,
but I don't see how that would influence a SEGV.  I don't see anything
compile-switched out in the way of argument-checking code, so I believe
Xfbdev is the same as the full X server in that regard.

BTW, after many runs, I have found that the error is not always a SEGV.
Sometimes it's an illegal instruction at an invalid instruction address,
which I suppose indicates memory corruption.  Nasty business.

Tom> One thing to try would be running your application against an ordinary
Tom> full-featured desktop X server.  Maybe that one won't crash but will
Tom> instead report an error message.  That would help you track back to a
Tom> bug in the peers.

I set up my program running on the normal hardware (my input devices don't
work on a desktop PC), aimed at the X desktop of my RH8 machine.  It ran for
between half an hour and a couple of hours each time, then quit.  It didn't
take the server down, and it left no message anywhere that I can find.  I
tried one run using the desktop X with xmon capturing the activity.  It
shows a normal EOF at the end, so maybe this is an application crash (i.e.,
maybe the X server crash only happens on my target system and/or the Xfbdev
tiny server).  I tried another one, this time with "ulimit -c unlimited" on
the client side, to see if I get a core dump when it quits. No core dump was
generated and no exception info printed.  It acts like a normal application
exit.

It's possible that the problem doesn't happen when there's plenty of CPU
power on the server (my RH8 box is a 2 GHz P4), or that the required timing
coincidences happen very rarely unless the CPU is heavily loaded. Of course
it's also possible the problem only occurs with my target system's video
driver, or only with frame buffer video drivers generally.

Tom> I say a bug in the peers since in an ordinary environment, the X
Tom> client and the X server are very well separated.  I doubt anything in
Tom> libgcj, like the GC for instance, could affect the X server.

I can think of a few ways the GC could be involved (though not "at fault"):
 1. Something in xlib requires a specific finalization order.  The GC
wouldn't know how to respect that.
 2. Something holds on to resources until its finalizer runs, when it
should have released them with an explicit dispose instead.
 3. Something is affected by timing, so the GC "stop the world" pauses
affect it (by giving the X server time to catch its breath).
My gut feeling, like yours, is that there must be something wrong in the
xlib peers, possibly combined with a subtle bug or limitation in Xfbdev.
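
To illustrate point 2, here is a minimal sketch of how explicit dispose and
reliance on finalization differ in server-side resource use.  It does not
touch the real xlib peers: FakeServer and FakeGraphics are hypothetical
stand-ins invented for this example, and the sketch assumes each peer object
owns exactly one server-side resource.  No finalizer is modelled, so the
"without dispose" case shows the worst case, where the GC has not yet run.

```java
public class DisposeSketch {
    /** Hypothetical stand-in for the X server's table of client resources. */
    static final class FakeServer {
        int live;   // resources currently allocated
        int peak;   // highest value 'live' has reached

        void allocate() {
            live++;
            peak = Math.max(peak, live);
        }

        void free() {
            live--;
        }
    }

    /** Hypothetical stand-in for a peer owning one server-side resource. */
    static final class FakeGraphics {
        private final FakeServer server;
        private boolean disposed;

        FakeGraphics(FakeServer server) {
            this.server = server;
            server.allocate();
        }

        /** Releases the server-side resource; safe to call more than once. */
        void dispose() {
            if (!disposed) {
                disposed = true;
                server.free();
            }
        }
    }

    /** Each frame releases its resource as soon as drawing is finished. */
    static int peakWithDispose(int frames) {
        FakeServer server = new FakeServer();
        for (int i = 0; i < frames; i++) {
            FakeGraphics g = new FakeGraphics(server);
            try {
                // drawing calls would go here
            } finally {
                g.dispose();
            }
        }
        return server.peak;
    }

    /** Each frame drops its reference and leaves cleanup to the collector. */
    static int peakWithoutDispose(int frames) {
        FakeServer server = new FakeServer();
        for (int i = 0; i < frames; i++) {
            new FakeGraphics(server);   // never disposed
        }
        return server.peak;
    }

    public static void main(String[] args) {
        System.out.println("peak with dispose:    " + peakWithDispose(1000));
        System.out.println("peak without dispose: " + peakWithoutDispose(1000));
    }
}
```

With explicit dispose the peak stays at 1; without it the peak reaches the
number of frames.  In a real program the finalizers would eventually bring
the count back down, but only whenever the GC happens to run, which under
heavy load may be too late for a small server.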

I'm thinking of making a small application that generally does the same
things my big one does with regard to the screen, and seeing if it has the
same problem.




