Newer linux kernels (2.6.11 in this case) check the executable for a PT_GNU_STACK program header, and if it exists default to provide non-executable memory (for stack _and_ malloced memory) on CPUs which support this (all x86-64 CPUs and newer x86 ones). It seems that gij is not prepared to handle this. This can be seen in some testresults, e.g. here: http://gcc.gnu.org/ml/gcc-testresults/2005-02/msg00223.html This is the autovect branch, but it's the same on all GCC versions (3.3 through 4.0). This is with such a new kernel, with an older kernel which didn't do this yet all those testcases work. Currently Andi Kleen proposed again to switch this off on the kernel side as a hot fix, because too much software currently breaks. But somewhen it will be activated for sure, and then GCC should be able to cope with this. My guess is, that there only are missing some mprotect calls at the right places.
I think this is a libffi problem in that it creates a closure but does not make it exutable.
I don't think this is a libffi problem. gcj allocates trampolines on the heap, not the stack. I think this is a multilib problem.
Note that TestProxy is failing too, which does seem to indicate a problem with ffi closures. boehm-gc does have code to call mprotect() on its heap (see PROTECT/UNPROTECT in os_dep.c), but it isn't used because we don't build the GC with MPROTECT_VDB. It sounds like, if the heap is now not exec-enabled by default, that the GC will have to unconditionally call mprotect() in order to set PROT_EXEC when initializing its heap? cc'ing Hans for his comments.
I believe that the GC alters permissions on the heap only if either - It is running in incremental mode, or - It is built with USE_MMAP, and thus uses mmap to allocate the heap. I think we talked about always doing the latter, but we don't. Thus I think the heap is allocated with sbrk. If the kernel no longer gives execute permission for sbrk memory, then that would propagate through to the Java heap. That would presunably break libffi. It should be possible to confirm a lot of this with strace. I think the collector is configured to disable execute permission whenever it resets permissions. Thus if we used mmap, libffi would presumably break consistently everywhere it tries to put trampolines in the heap. (It doesn't need trampolines on all platforms.) If all of this is correct, then building the collector with USE_MMAP, and telling it to enable execute permission, might be a work-around.
I changed the gc settings to enable USE_MMAP on Linux. I had to do this because at least one Linux kernel didn't give exec permission on the heap. That change did work at the time. 2004-01-20 Andrew Haley <aph@redhat.com> * include/private/gcconfig.h (USE_MMAP): Define for all Linux.
I looked at my build dir (4.0 CVS) and everything looks ok. I guess I should run some strace - could somebody tell me how to invoke gij from the commandline in a simple case so that I can check this?
I'm not so sure that the patch to always define USE_MMAP really has that effect. By my reading this patch is revision 1.36, but if you diff against 1.35 and then go look at include/private/gcconfig.h, it looks like this addition takes place in the middle of a big "#ifdef M68K". Am I misreading this? I have no explanation of why things work on my system.
I agree with Tom's interpretation. The define for USE_MMAP only affects M68K/LINUX. Confirmed with strace on IA/64. I'd prefer to see the USE_MMAP definition in a gcj-specific configuration file. I'd like to move to using sbrk and then mmap on Linux. If we do that, presumably we want USE_MMAP to still mean consistent use of mmap. And we presumably then want to do that only on platforms on which libffi requires it.
I agree, the USE_MMAP is not global. I'm trying now the following patch to see whether we're really on the right track: ============================================================ Index: boehm-gc/include/private/gcconfig.h --- boehm-gc/include/private/gcconfig.h 13 Oct 2004 10:34:21 -0000 1.39 +++ boehm-gc/include/private/gcconfig.h 10 Feb 2005 06:08:42 -0000 @@ -1888,6 +1888,10 @@ # endif # endif +#if defined(LINUX) +# define USE_MMAP +#endif + #if defined(LINUX) && defined(USE_MMAP) /* The kernel may do a somewhat better job merging mappings etc. */ /* with anonymous mappings. */ If you have a patch for sbrk usage, I can try that one also.
With my patch, the results look good again (this is on x86-64 with multilibs): === libjava Summary for unix === # of expected passes 3726 # of expected failures 14 # of untested testcases 16 So, we are on the right track.
It looks like the patch was applied to the wrong place in the file: it certainly was my intention to apply it to all Linux. And indeed, my testing was not on m68k, but on x86-64. The obvious question is whether gc allocated memory should have exec permission by default.
We need to make this change soon. I'd like to do something that Hans would approve of, but I don't know exactly what that might be. Andreas Jaeger says his patch works. Unless someone comes up with something better, we should go with that.
Is it desirable to have all memory allocated by the GC executable though? I think libgcj always knows in advance what memory will it need executable, so wouldn't it be better to provide separate allocation entrypoint (I think void *GC_malloc_atomic_executable (size_t) would be enough) that would be giving executable memory and otherwise you'd get non-executable one (if OS allows that)?
This seems to me like creeping featurism. We need to distinguish between fixing this bug in a simple way and adding "nice" new properties that would require a change to the garbage collector's API. At this stage in gcc 4.0 development I don't think that is appropriate.
I'd like to see USE_MMAP set in boehm-gc/configure.ac instead of gcconfig.h. There's already a comment there that we're not setting NO_EXECUTE_PERMISSION for this reason, where the upstream version does set it. I also think we've been trying to keep gcconfig.h shared, with gcj-specific adjustments in configure.ac. It seems to me that, with this minor adjustment, this is clearly the right short term fix. I'm not sure that we should be turning on execute permission on all platforms, though. I'm pretty sure it's unnecessary on IA64. I think it should be unnecessary on all platforms on which function pointers point to an (ip, GOT) pair. IIRC, that includes PowerPC? At least IA64 does not put trampoline code in the closure. Having said that, I'm not sure that having the Java heap be executable is a major security risk. There can occasionally be a large performance down side to making the heap executable on some platforms, at least in the presence of incremental GC. I know that some old Un*x versions ensured d-cache and i-cache coherency whenever you made an mprotect call with the execute bit set. And this could take a substantial amount of time. Having the collector allocate both executable and non-executable memory sounds non-trivial. Probably it would be easiest to add a new "kind" of pointer-free, executable memory, and have the GC adjust permissions when a new block was moced between kinds. But that may not be cheap (see above). And there are issues if the physical page size is larger than the heap block size. The alternative is to essentially keep a separate heap, which would require a major code reorganization, and introduces a new and different kind of fragmentation. I suspect it would be far easier to have the interpreter allocate closures outside the Java heap.
Subject: Bug 19823 CVSROOT: /cvs/gcc Module name: gcc Changes by: bryce@gcc.gnu.org 2005-02-16 04:10:42 Modified files: boehm-gc : ChangeLog configure.ac configure.host configure boehm-gc/include: gc_config.h.in boehm-gc/include/private: gc_priv.h Log message: 2005-02-15 Bryce McKinlay <mckinlay@redhat.com> PR libgcj/19823 * configure.host: Set gc_use_mmap on *-linux*. * configure.ac: Define USE_MMAP if gc_use_mmap is set. * include/private/gc_priv.h: Include gc_config.h. * configure, include/gc_config.h.in: Rebuilt. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/ChangeLog.diff?cvsroot=gcc&r1=1.218&r2=1.219 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.ac.diff?cvsroot=gcc&r1=1.11&r2=1.12 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.host.diff?cvsroot=gcc&r1=1.7&r2=1.8 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.diff?cvsroot=gcc&r1=1.94&r2=1.95 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/include/gc_config.h.in.diff?cvsroot=gcc&r1=1.5&r2=1.6 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/include/private/gc_priv.h.diff?cvsroot=gcc&r1=1.16&r2=1.17
Michael (or someone else who has seen this bug), Could you confirm that this patch fixes it? Thanks
Subject: Bug 19823 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: bryce@gcc.gnu.org 2005-02-22 19:00:40 Modified files: boehm-gc : ChangeLog configure.host configure.in configure Log message: 2005-02-22 Bryce McKinlay <mckinlay@redhat.com> PR libgcj/19823 * configure.host: Set gc_use_mmap on *-linux*. * configure.in: Define USE_MMAP if gc_use_mmap is set. * configure: Rebuilt. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.170.2.7&r2=1.170.2.8 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.host.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.7&r2=1.7.10.1 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.in.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.53.2.1&r2=1.53.2.2 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/configure.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.66.2.1&r2=1.66.2.2
Fixed in 3.4.4.