Bug 60931 - libgo has issues when page size is not 4k
Summary: libgo has issues when page size is not 4k
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: go (show other bugs)
Version: 5.0
: P3 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-23 07:06 UTC by Anton Blanchard
Modified: 2014-04-25 04:33 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Bump page size to 64kB (520 bytes, patch)
2014-04-23 07:06 UTC, Anton Blanchard
Details | Diff
Don't use madvise(DONT_NEED) on sub pages (418 bytes, text/plain)
2014-04-24 00:17 UTC, Anton Blanchard
Details
runtime: Fix garbage collector issue with non 4kB system page size (469 bytes, patch)
2014-04-25 02:00 UTC, Anton Blanchard
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Anton Blanchard 2014-04-23 07:06:09 UTC
Created attachment 32659 [details]
Bump page size to 64kB

We are seeing random failures with go programs on a 64kB page size ppc64 box. It looks like garbage collection issues - sometimes we SEGV in timer code, sometimes we SEGV in the code that wraps a kernel read syscall. If I prevent the garbage collector from running, the programs work.

The libgo malloc hard codes the page size so I wrote a quick hack to bump this (and a few other dependent variables). This makes the problem go away, but we will need to come up with a better way to do this at runtime.
Comment 1 Andrew Pinski 2014-04-23 07:11:55 UTC
This is going to be true on AARCH64 also where most distros are going to be using 64k pages (some might use 4k pages if they also support AARCH32).  MIPS has many different page sizes too (4k, 8k, 16k, 32k, and 64k).  So hard coding the page size seems wrong, maybe you should call getpagesize instead.
Comment 2 Anton Blanchard 2014-04-23 07:26:53 UTC
I agree, but when I tried this I found a few places that expect PageSize to be a compile time constant so it is not as trivial as I had hoped.
Comment 3 Ian Lance Taylor 2014-04-23 16:42:11 UTC
It would be extremely helpful if you could find a test case that can recreate this problem with some reliability.  There is no obvious dependency on the system page size in libgo.  The PageSize constant is the unit that the memory allocator deals in, and should have no direct relationship to the system page size.  I believe that there is a bug, but we need to track it down.

If you set the environment variable GOGC=1 the garbage collector will run much more frequently; perhaps that will help get a reproducible test case.
Comment 4 Anton Blanchard 2014-04-24 00:17:06 UTC
Created attachment 32669 [details]
Don't use madvise(DONT_NEED) on sub pages
Comment 5 Anton Blanchard 2014-04-24 00:18:11 UTC
I think I see it:

19112 madvise(0xc211030000, 4096, MADV_DONTNEED) = 0

That 4kB madvise(MADV_DONTNEED) gets rounded up to the system page size of 64kB and we end up covering still in use memory.

The following patch fixes it for me, but it just ignores any sub pages. We should keep them around so later calls have a chance at consolidating regions up to a system page size.
Comment 6 Jakub Jelinek 2014-04-24 05:38:26 UTC
Perhaps it would be better instead of not doing the madvise at all if start or length isn't page aligned round the start to the next page boundary and end to the previous page boundary and madvise if the rounded end is above the rounded start.
Comment 7 Anton Blanchard 2014-04-25 02:00:47 UTC
Created attachment 32679 [details]
runtime: Fix garbage collector issue with non 4kB system page size

The go garbage collector tracks memory in terms of 4kB pages. Most of
the code checks getpagesize() at runtime and does the right thing.

On a 64kB ppc64 box I see SEGVs in long running processes which has
been diagnosed as a bug in scavengelist. scavengelist does a
madvise(MADV_DONTNEED) without rounding the arguments to the system
page size. A strace of one of the failures shows the problem:

madvise(0xc211030000, 4096, MADV_DONTNEED) = 0

The kernel rounds the length up to 64kB and we mark 60kB of valid data
as no longer needed.

Round start up to a system page and end down before calling madvise.
Comment 8 ian@gcc.gnu.org 2014-04-25 04:29:21 UTC
Author: ian
Date: Fri Apr 25 04:28:48 2014
New Revision: 209776

URL: http://gcc.gnu.org/viewcvs?rev=209776&root=gcc&view=rev
Log:
	PR go/60931

runtime: Fix garbage collector issue with non 4kB system page size

The go garbage collector tracks memory in terms of 4kB pages.
Most of the code checks getpagesize() at runtime and does the
right thing.

On a 64kB ppc64 box I see SEGVs in long running processes
which has been diagnosed as a bug in scavengelist.
scavengelist does a madvise(MADV_DONTNEED) without rounding
the arguments to the system page size.  A strace of one of the
failures shows the problem:

madvise(0xc211030000, 4096, MADV_DONTNEED) = 0

The kernel rounds the length up to 64kB and we mark 60kB of
valid data as no longer needed.

Round start up to a system page and end down before calling
madvise.

Modified:
    branches/gcc-4_9-branch/libgo/runtime/mheap.c
Comment 9 ian@gcc.gnu.org 2014-04-25 04:29:39 UTC
Author: ian
Date: Fri Apr 25 04:29:07 2014
New Revision: 209777

URL: http://gcc.gnu.org/viewcvs?rev=209777&root=gcc&view=rev
Log:
	PR go/60931

runtime: Fix garbage collector issue with non 4kB system page size

The go garbage collector tracks memory in terms of 4kB pages.
Most of the code checks getpagesize() at runtime and does the
right thing.

On a 64kB ppc64 box I see SEGVs in long running processes
which has been diagnosed as a bug in scavengelist.
scavengelist does a madvise(MADV_DONTNEED) without rounding
the arguments to the system page size.  A strace of one of the
failures shows the problem:

madvise(0xc211030000, 4096, MADV_DONTNEED) = 0

The kernel rounds the length up to 64kB and we mark 60kB of
valid data as no longer needed.

Round start up to a system page and end down before calling
madvise.

Modified:
    trunk/libgo/runtime/mheap.c
Comment 10 Ian Lance Taylor 2014-04-25 04:33:22 UTC
Thanks for the patch.  I committed a version of it to mainline and 4.9 branch.