Under FreeBSD, gcjlib does not always forward SIGSEGV signals to the application code correctly. Specifically, if the application has specified a handler setting the SA_SIGINFO bit in the flags argument of sigaction, then gcjlib calls the handler with a small integer as the second argument. To corectly forward the signal gcjlib needs to call the handler with a pointer to a siginfo_t as the second argument. The following small program demonstrates the problem: gcjsigsegv.cc: #include <iostream> #include <signal.h> #include <gcj/cni.h> #include "AdvancedMath.h" void fault_handler(int sig, siginfo_t * info, void *uap); int main(int argc, char* argv[]) { int i; int j; int k; int *p; bool use_java = (argc > 1); struct sigaction sa; sa.sa_sigaction = &fault_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = SA_SIGINFO; sigaction(SIGSEGV, &sa, 0); if (use_java) { std::cout << "Using Java\n"; JvCreateJavaVM(NULL); JvAttachCurrentThread(NULL, NULL); } else std::cout << "Not using Java\n"; i = 2; j = 3; p = &k; if (use_java) k = AdvancedMath::add(i , j); else k = i + j; std::cout << "k = " << k << "\n"; std::cout << "*p = " << *p << "\n"; p = 0; std::cout << "*p = " << *p << "\n"; return 0; } void fault_handler(int sig, siginfo_t * info, void * uap) { std::cout << "Caught fault\n"; std::cout << "sig = " << sig << "\n"; std::cout << "info = " << info << "\n"; if (reinterpret_cast<int>(info) != 12) std::cout << "info->si_code = " << info->si_code << "\n"; std::cout << "Exiting\n"; exit(2); } AdvancedMath.java: public class AdvancedMath { public static int add(int a, int b) { return a + b; } } Build the program with the following commands: gcj -c -o AdvancedMath.o AdvancedMath.java gcj -C AdvancedMath.java gcjh AdvancedMath g++ -c -o gcjsigsegv.o gcjsigsegv.cc gcj -pthread -o gcjsigsegv gcjsigsegv.o AdvancedMath.o -lstdc++ Running the program without any arguments shows what happens when the application receives the SIGSEGV directly: $ ./gcjsigsegv Not using Java k = 5 *p = 5 Caught fault sig = 11 info = 0xbfbffa38 info->si_code = 12 Exiting Running the program with an argument demonstrates the problem: $ ./gcjsigsegv j Using Java Caught fault sig = 11 info = 0xc Exiting Note that the second argument to fault_handler is now "0xc" when it should be a pointer to a siginfo_t, as it was when Java was not used. Looking at the code for the GC_write_fault_handler function in the boehm-gc/os_dep.c file, it appears that, for FreeBSD, the code does not take into account the POSIX SA_SIGINFO handler prototype that is supported by FreeBDS. I am planning to code a fix for this problem.
Created attachment 6014 [details] Patch to os_dep.c and gcconfig.h to fix this bug This patch fixes the problem reported here. The patch is against the gcc-3.4-20040310 snapshot. After applying the patch the test program produces the following: $ ./gcjsigsegv j Using Java Caught fault sig = 11 info = 0xbfbff7dc info->si_code = 12 Exiting However, the changes in the patch do not appear to completely deal with supporting FreeBSD's signal handling. It appears that the reported bug was due to GC_fault_reset_handler not reinstating a SA_SIGINFO handler correctly. This patch fixes that problem. However, the code in GC_write_fault_handler should determine the style to the old handler and use that style when calling it.
Andrew, There are a number of known failing test cases in libjava related to signal handling on FreeBSD. I thought we had something wrong (or undone) there but I didn't know that the GC configuration itself might also be bad... Thanks for the PR and patch. Are you willing to work on this some more? Or, can you explain what is left to be done (keying off the final However...)? Loren
Loren, I am not planning to do more work on the apparent problem in GC_write_fault_handler (unless it turns out to be a problem for the project am working on, like this bug is). However, if there is more work I could do that would increase the chances of some form of my patch being accepted into GCC please let me know. So, to explain what is left to be done: The apparent problem in GC_write_fault_handler stems from the fact that FreeBSD (in versions >= 4) supports two protypes for signal handling: the traditional BSD style: void handler(int, int code, struct sigcontext *scp); and the POSIX SA_SIGINFO style: void handler(int, siginfo_t *info, ucontext_t *uap); See the sigaction man page for more information: http://www.freebsd.org/cgi/man.cgi?query=sigaction&sektion=2&apropos=0&manpath=FreeBSD+5.2-RELEASE+and+Ports When specifying a signal handler, the caller sets the SA_SIGINFO bit in the sa_flags field if their handler is the POSIX SA_SIGINFO style. The code in GC_write_fault_handler appears to forward signals it is handling to the handler recorded in GC_old_segv_handler (or GC_old_bus_handler) when the fault is not in memory allocated by the GC code. Under FreeBSD the current code always calls the old handler using the tradational BSD style. If the old handler actually uses the POSIX SA_SIGINFO style then it will not receive the arguments it expects. With the my patch it will always use the POSIX SA_SIGINO style. In this case, if the old handler actually uses the traditional BSD style then it will not receive the arguments it expects. I think the right thing for GC_write_fault_handler to do would be to use the style that the old handler was specified with. So the code that currently reads: # if defined (SUNOS5SIGS) (*(REAL_SIG_PF)old_handler) (sig, scp, context); return; # endif would look something like this (assuming my patch is applied): # if defined (SUNOS5SIGS) if (old_handler_flags & SA_SIGINFO) { (*(REAL_SIG_PF)old_handler) (sig, scp, context); } else { (*old_handler) (sig, scp->si_code, (struct sigcontext *)context); return; # endif This code raises at least two questions: Firstly, were does old_handler_flags get set? This would involve extra code in GC_dirty_init, maybe setting a new GC_old_segv_handler_flags variable. Secondly, can the third argument received by GC_write_fault_handler, which is a "ucontext_t *" (assuming my patch is applied) simply be cast to "struct sigcontext *"? The following comment in /usr/include/ucontext.h in the definition of struct ucontext_t suggests this might be OK: * Keep the order of the first two fields. Also, * keep them the first two fields in the structure. * This way we can have a union with struct * sigcontext and ucontext_t. This allows us to * support them both at the same time. * note: the union is not defined, though. In terms of what needs to be done to address this apparent problem: - Confirm that it actually is a bug. The code does not look right to me, but the next step would be to write a test case that demonstrates a problem. I do not know enough about the code to know when the GC_write_fault_handler code comes into play, so I'm not sure what the test would look like. - If a bug is confirmed, code a fix. What I have outlined above should be a starting point. - Test the fix. Andrew
I would generally like to see this go in roughly as is. Thanks for tracking this down. I think the GC_reset_fault_handler problem is much more serious than the remaining one. You really only need GC_write_fault_handler to pass through SIGSEGV if both: - The collector is running in incremental mode. - The client has a SIGSEGV handler for other reasons. I don't think either libgcj or Mono officially support incremental mode. (This has implications on other library code.) For now, a FIXME comment for the second bug is probably sufficient. (Of course, a proper patch would be gratefully accepted, too. A general clean-up patch for this code wouldn't hurt either :-) ) I have one question about the patch: For the hunk at line 121, is there really a reason to exclude FREEBSD? This worries me, since the upstream source replaces this test with one for a "UNIX- LIKE" OS. Can a plain setjmp be used to jump out of a signal handler on FreeBSD? Is sigsetjmp( ..., 1) appreciably slower than plain setjmp?
Hans, Thanks for your comments, in particular explaining under what circumstances GC_write_fault_hanlder would be forwarding SIGSEGV. Regarding your question about the patch: I have no reason to think FreeBSD needs to be excluded from the #if at line 124 of os_dep.c (which controls #defining setjmp to be sigsetjmp and #defining longjmp similarly). The reason I excluded it in the patch was to restrict the changes to the signal handling problem. The original code (from the gcc-3.4-20040310 snapshot, based on gc6.3alpha1) did not define SUNOS5SIGS for FreeBSD. So the original code did not include the #define for setjmp and longjmp under FreeBSD. My patch to gcconfig.h does define SUNOS5SIGS for FreeBSD version 4 or later. So to keep the behaviour of the original code my patch changes "defined (SUNOS5SIGS)" to "(defined SUNOS5SIGS) && !defined(FREEBSD)" at line 124. Sorry, I do not know if a plain setjmp can be used to jump out of a signal handler on FreeBSD. Also, I do not know if sigsetjmp is appreciably slower than plain setjmp. In quickly looking at the code, the most information about whether or not FreeBSD should use sigsetjmp that I found was your comment in the README.changes for gc6.3alpha4: - Made os_dep.c use sigsetjmp and SA_NODEFER for NetBSD. (Thanks to Christian Limpach.) (I generalized the patch to use sigsetjmp on all UNIX_LIKE platforms, admittedly a slightly risky move. But it may avoid similar problems on some other platforms. I also cleaned up the defn of UNIX_LIKE a bit. - Hans) Not knowing anything about how the GCC/GCJ people generally manage syncing the Boehm GC code in GCC with your upstream source, if my patch was going to be applied to GCC I would be inclined to leave it as is. That way the change to using sigsetjmp would occur when gc6.3alpha4 was imported into GCC. What do you think? Andrew
Having read the boehm-gc point here: http://gcc.gnu.org/codingconventions.html#upstream I realize that the normal procedure is to send patches to Hans Boehm. I have sent a patch to the gc mailing list: http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2004-April/000396.html Andrew
(In reply to Andrew Gray from comment #6) > Having read the boehm-gc point here: > > http://gcc.gnu.org/codingconventions.html#upstream > > I realize that the normal procedure is to send patches to Hans Boehm. > > I have sent a patch to the gc mailing list: > > http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2004-April/000396.html > > Andrew In that case that's where we can consider this bug as having MOVED to; closing accordingly.