Created attachment 32659 [details] Bump page size to 64kB We are seeing random failures with go programs on a 64kB page size ppc64 box. It looks like garbage collection issues - sometimes we SEGV in timer code, sometimes we SEGV in the code that wraps a kernel read syscall. If I prevent the garbage collector from running, the programs work. The libgo malloc hard codes the page size so I wrote a quick hack to bump this (and a few other dependent variables). This makes the problem go away, but we will need to come up with a better way to do this at runtime.
This is going to be true on AARCH64 also where most distros are going to be using 64k pages (some might use 4k pages if they also support AARCH32). MIPS has many different page sizes too (4k, 8k, 16k, 32k, and 64k). So hard coding the page size seems wrong, maybe you should call getpagesize instead.
I agree, but when I tried this I found a few places that expect PageSize to be a compile time constant so it is not as trivial as I had hoped.
It would be extremely helpful if you could find a test case that can recreate this problem with some reliability. There is no obvious dependency on the system page size in libgo. The PageSize constant is the unit that the memory allocator deals in, and should have no direct relationship to the system page size. I believe that there is a bug, but we need to track it down. If you set the environment variable GOGC=1 the garbage collector will run much more frequently; perhaps that will help get a reproducible test case.
Created attachment 32669 [details] Don't use madvise(DONT_NEED) on sub pages
I think I see it: 19112 madvise(0xc211030000, 4096, MADV_DONTNEED) = 0 That 4kB madvise(MADV_DONTNEED) gets rounded up to the system page size of 64kB and we end up covering still in use memory. The following patch fixes it for me, but it just ignores any sub pages. We should keep them around so later calls have a chance at consolidating regions up to a system page size.
Perhaps it would be better instead of not doing the madvise at all if start or length isn't page aligned round the start to the next page boundary and end to the previous page boundary and madvise if the rounded end is above the rounded start.
Created attachment 32679 [details] runtime: Fix garbage collector issue with non 4kB system page size The go garbage collector tracks memory in terms of 4kB pages. Most of the code checks getpagesize() at runtime and does the right thing. On a 64kB ppc64 box I see SEGVs in long running processes which has been diagnosed as a bug in scavengelist. scavengelist does a madvise(MADV_DONTNEED) without rounding the arguments to the system page size. A strace of one of the failures shows the problem: madvise(0xc211030000, 4096, MADV_DONTNEED) = 0 The kernel rounds the length up to 64kB and we mark 60kB of valid data as no longer needed. Round start up to a system page and end down before calling madvise.
Author: ian Date: Fri Apr 25 04:28:48 2014 New Revision: 209776 URL: http://gcc.gnu.org/viewcvs?rev=209776&root=gcc&view=rev Log: PR go/60931 runtime: Fix garbage collector issue with non 4kB system page size The go garbage collector tracks memory in terms of 4kB pages. Most of the code checks getpagesize() at runtime and does the right thing. On a 64kB ppc64 box I see SEGVs in long running processes which has been diagnosed as a bug in scavengelist. scavengelist does a madvise(MADV_DONTNEED) without rounding the arguments to the system page size. A strace of one of the failures shows the problem: madvise(0xc211030000, 4096, MADV_DONTNEED) = 0 The kernel rounds the length up to 64kB and we mark 60kB of valid data as no longer needed. Round start up to a system page and end down before calling madvise. Modified: branches/gcc-4_9-branch/libgo/runtime/mheap.c
Author: ian Date: Fri Apr 25 04:29:07 2014 New Revision: 209777 URL: http://gcc.gnu.org/viewcvs?rev=209777&root=gcc&view=rev Log: PR go/60931 runtime: Fix garbage collector issue with non 4kB system page size The go garbage collector tracks memory in terms of 4kB pages. Most of the code checks getpagesize() at runtime and does the right thing. On a 64kB ppc64 box I see SEGVs in long running processes which has been diagnosed as a bug in scavengelist. scavengelist does a madvise(MADV_DONTNEED) without rounding the arguments to the system page size. A strace of one of the failures shows the problem: madvise(0xc211030000, 4096, MADV_DONTNEED) = 0 The kernel rounds the length up to 64kB and we mark 60kB of valid data as no longer needed. Round start up to a system page and end down before calling madvise. Modified: trunk/libgo/runtime/mheap.c
Thanks for the patch. I committed a version of it to mainline and 4.9 branch.