libSegFault.so and gcj

Thu May 14 15:28:00 GMT 2009

Hello all,

We are running a gcj-compiled application on an embedded platform
(MPC852T). For reference, our versions are gcc-4.0.1, glibc-2.3.3 and
linux-2.4.24 -- I know these versions are ancient, but please don't stop
reading here.

We sometimes encounter segfaults in our application; that is to say that
it will terminate with 'Segmentation fault' on the console and return
139. These occur rather infrequently, and we have yet to find a reliable
way to reproduce them. To make things more difficult, we do not have
room for core dumps on our filesystem.
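
For anyone wondering where the 139 comes from: as far as I understand
it, the shell reports 128 plus the signal number when a child is killed
by a signal, and SIGSEGV is 11 on Linux. Here is a minimal C sketch of
that arithmetic. shell_status_for_signal is a hypothetical helper of
mine, not something from our application or from gcj.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* for callers that want to check the result */
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that dies from `sig` under the
   default disposition, then return the status a shell would report
   for it (128 + signal number), or -1 on error. */
static int shell_status_for_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: disable core files, reset the handler, and die. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);
        signal(sig, SIG_DFL);
        raise(sig);
        _exit(0);   /* only reached if the signal did not kill us */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Calling shell_status_for_signal(SIGSEGV) should give the same 139 that
we see from our application.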

I thought that we could get some information about these segfaults
by using the preload library libSegFault.so; I tested it and integrated
it with our init scripts and let it loose into our releases hoping that
a backtrace or two would come back to me. None did; there was no output
produced by libSegFault.so at all.

I think that since gcj registers its own segfault handler, which
translates SIGSEGV signals into NullPointerExceptions, the original
signals never make it to libSegFault.so's handler. Gcj registers its handler,
catch_segv (from prims.cc:146 in our version of gcj), in INIT_SEGV
(powerpc-signal.h:62) called from _Jv_CreateJavaVM (prims.cc:1211). Here
is a snippet of INIT_SEGV:

#define INIT_SEGV                                              \
do                                                             \
  {                                                            \
    struct kernel_old_sigaction kact;                          \
    kact.k_sa_handler = catch_segv;                            \
    kact.k_sa_mask = 0;                                        \
    kact.k_sa_flags = 0;                                       \
    if (syscall (SYS_sigaction, SIGSEGV, &kact, NULL) != 0)    \
      __asm__ __volatile__ (".long 0");                        \
  }                                                            \
while (0)

and of catch_segv:

SIGNAL_HANDLER (catch_segv)
{
  java::lang::NullPointerException *nullp
    = new java::lang::NullPointerException;
  unblock_signal (SIGSEGV);
  MAKE_THROW_FRAME (nullp);
  throw nullp;
}

I don't know a whole lot about signal handlers -- please correct me if
I'm wrong. I think that, since the syscall (SYS_sigaction,...) passes
NULL as its fourth argument (the pointer that would receive the old
action), gcj is disregarding any previously registered signal handler.
I also think that, since the flags are zero (no SA_ONSTACK), catch_segv
is executed on the stack of the thread that raised the signal instead
of on an alternate stack.
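
To make those two points concrete, here is a minimal sketch. It uses
the portable sigaction() wrapper rather than the raw syscall and struct
kernel_old_sigaction that gcj uses, and SIGUSR1 rather than SIGSEGV so
that it is safe to run. All of the names in it (first_handler,
second_handler, install_handler, setup_alt_stack,
handler_runs_on_alt_stack) are hypothetical. As I understand it, the
old-action argument reports whatever handler was installed before, and
a handler only runs on the alternate stack when SA_ONSTACK is set.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* for callers that want to check the results */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*handler_fn)(int);

static char *alt_base;      /* start of the alternate stack, if any */
static size_t alt_size;     /* its size in bytes */
static volatile sig_atomic_t ran_on_alt_stack = -1;

/* Stand-in for a handler registered first (think libSegFault.so). */
static void first_handler(int sig) { (void)sig; }

/* Stand-in for a handler registered later (think catch_segv). It
   records whether its own stack frame lies inside the alternate stack. */
static void second_handler(int sig)
{
    char probe;
    uintptr_t p = (uintptr_t)&probe;
    (void)sig;
    ran_on_alt_stack = (p >= (uintptr_t)alt_base &&
                        p < (uintptr_t)alt_base + alt_size);
}

/* Install fn for sig and return the handler that was there before.
   Passing NULL instead of &old would throw that information away. */
static handler_fn install_handler(int sig, handler_fn fn, int flags)
{
    struct sigaction act, old;
    memset(&act, 0, sizeof act);
    act.sa_handler = fn;
    sigemptyset(&act.sa_mask);
    act.sa_flags = flags;
    if (sigaction(sig, &act, &old) != 0)
        return SIG_ERR;
    return old.sa_handler;
}

/* Register a 64 KiB alternate signal stack for this thread. */
static int setup_alt_stack(void)
{
    stack_t ss;
    alt_size = 64 * 1024;
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    return sigaltstack(&ss, NULL);
}

/* Returns 1 if second_handler ran on the alternate stack, 0 if it ran
   on the normal stack, -1 if it could not be installed or did not run. */
static int handler_runs_on_alt_stack(int flags)
{
    ran_on_alt_stack = -1;
    if (install_handler(SIGUSR1, second_handler, flags) == SIG_ERR)
        return -1;
    raise(SIGUSR1);
    return ran_on_alt_stack;
}
```

If first_handler is installed and then second_handler, the second call
to install_handler returns first_handler. After setup_alt_stack(),
handler_runs_on_alt_stack(0) should report 0 and
handler_runs_on_alt_stack(SA_ONSTACK) should report 1.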

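I reason from this that the segfaults are likely stack overflows: if
the stack is already exhausted, catch_segv has no room to run on that
same stack, so the process would be killed before any
NullPointerException could be thrown. Could anyone confirm this?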

Could we patch INIT_SEGV somehow so that signals not handled by
catch_segv are passed on to the previously registered handler, so that
libSegFault.so can catch them? Is there another way to catch the cause
of these segfaults?
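
To show the kind of patch I have in mind, here is a rough sketch of a
chaining handler. It is not gcj code. It uses the portable sigaction()
wrapper and SIGUSR1 for safety, and every name in it is hypothetical,
including the handle_locally switch that stands in for "catch_segv
decides this fault is its own". It also assumes the earlier handler was
installed without SA_SIGINFO; otherwise it would have to be called
through sa_sigaction instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* for callers that want to check the results */
#include <signal.h>
#include <string.h>

static struct sigaction previous_action;  /* installed before us */
static volatile sig_atomic_t handle_locally;   /* hypothetical switch */
static volatile sig_atomic_t local_count;      /* handled by us */
static volatile sig_atomic_t forwarded_count;  /* seen by earlier handler */

/* Stand-in for the handler that was there first (think libSegFault.so). */
static void earlier_handler(int sig)
{
    (void)sig;
    forwarded_count++;
}

/* Stand-in for a patched catch_segv: handle the signal if it is ours,
   otherwise hand it to whatever was installed before. */
static void chaining_handler(int sig)
{
    if (handle_locally) {
        local_count++;
        return;
    }
    if (previous_action.sa_handler == SIG_DFL) {
        /* Restore the default action and re-raise, so the process
           still dies with the usual status. */
        sigaction(sig, &previous_action, NULL);
        raise(sig);
    } else if (previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(sig);
    }
}

/* Install earlier_handler for sig; returns 0 on success. */
static int install_earlier_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = earlier_handler;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, NULL);
}

/* Install chaining_handler for sig and remember the old action
   instead of passing NULL; returns 0 on success. */
static int install_chaining_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = chaining_handler;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, &previous_action);
}
```

With both handlers installed in that order, a signal raised while
handle_locally is set only bumps local_count, and one raised while it
is clear ends up in earlier_handler. Whether something like this can be
made to work with the raw syscall and the exception throw in catch_segv
is exactly what I am unsure about.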

Regards,

Ben Gardiner
Nanometrics Seismological Instruments
250 Herzberg Rd
Kanata ON CA
K2K 2A1
613 592 6776 x239