[PATCH] more conservative heuristic for ggc-min-heapsize

Nathan Froyd froydnj@codesourcery.com
Mon Oct 9 22:05:00 GMT 2006


On Fri, 2006-10-06 at 13:59 -0700, Mike Stump wrote:
> On Oct 6, 2006, at 1:26 PM, Nathan Froyd wrote:
> > The larger static value (20Mb vs. 16Mb) is intended to accommodate  
> > larger programs
> 
> I'd phrase it this way: there is memory in use behind our backs that  
> we don't account for and don't control, so give them just a little  
> more room to play in.  This part I think is reasonable.

OK.  Thanks for bringing out what I should have been saying.

> > limit/4 value is adding an extra measure of conservative-ness.
> 
> I don't think a quarter of memory should be reserved for slop during  
> compilations, maybe 1/10 - 1/40.  I'd rather measure the actual slop  
> in use and what it is used for and go from there.  I suspect it is  
> the optimizer, maybe the optimizer folks can chime in and tell us how  
> much memory they want to malloc.

I don't think it matters too much if you want to peg slop between 1/10
and 1/40.  If you do reserve 1/10 (1/40) of memory for slop, then, with
my patch, you will need an rlimit of ~ >200MB (>800MB) to start having
the "dynamic" limit/10 (limit/40) be greater than the "static" 20MB.
But if your rlimit is (say) 250MB, ggc-min-heapsize will wind up being
>140MB ((250MB - 25MB)/(1.10 + 0.48); the 0.48 is roughly what
ggc_min_expand would calculate, and the formula comes from
ggc_min_heapsize_heuristic).  At that point, the clamping of
ggc-min-heapsize to between 4MB and 128MB comes into play, and
ggc-min-heapsize is kept well away from rlimit almost automatically.

In other words, when rlimit/10 becomes greater than 20MB, it doesn't
make much difference because of the constraint on ggc-min-heapsize.
Therefore you need a larger fraction of rlimit devoted to slop for it to
actually make a difference in the "dynamic" case.  So the options seem
to be:

1) Go with 1/4 rlimit for slop;
2) Go with 1/10 rlimit < slop < 1/4 rlimit;
3) Go with a static 20MB figure for slop.

Favoring the conservative approach, I prefer 1 or 2 but would be OK with
2 or 3.

(Part of the problem, as noted by Paul Brook, is that rlimit doesn't
have anything to do with how much memory a compilation run is going to
take.  So I guess '4) Come up with a better heuristic' would be an
acceptable option...)

-Nathan

P.S. FWIW, some statistics on memory usage.

On the testcase from the previous mail, the top 10 consumers of heap
(non-GC'd) memory in 4.1.1 are:

expand                 : 8717 kB
tree PHI insertion     : 6980 kB
CSE                    : 5705 kB
CSE2                   : 5602 kB
tree SSA to normal     : 4464 kB
tree iv optimization   : 4366 kB
tree SSA incremental   : 4188 kB
loop analysis          : 3241 kB
life analysis          : 3219 kB
tree store copyprop    : 3035 kB (tie)
tree copy propagation  : 3035 kB (tie)

The equivalent chart for mainline is:

loop analysis          : 15840 kB
expand                 :  9093 kB
CSE                    :  5857 kB
CSE2                   :  5711 kB
tree PRE               :  4733 kB
tree PHI insertion     :  3610 kB
life analysis          :  3090 kB
tree PTA               :  2657 kB
tree SSA incremental   :  1937 kB
tree FRE               :  1245 kB

The compilation options for the (preprocessed) file were '-O2
-finline-functions' and the runs were on an x86 machine.

The method for gathering these numbers was to modify
x{malloc,calloc,realloc} to record the number of bytes allocated and to
make this value available from libiberty.  Then, when pushing and
popping timevars, the number of bytes allocated was recorded; the
difference between the counts at consecutive events gives a (rough)
idea of how much memory a pass allocated.  The numbers above are
actually the *maximum* amount of memory a pass takes, rather than the
cumulative total over an entire compilation run.  Maximum memory seemed
to make more sense for this comparison, since we're concerned with the
amount of memory outside ggc's purview.

(I realize there are problems with nested timevars, realloc'd memory
getting counted twice, etc.  I do not claim exactness, but I would argue
that the above figures are "close enough.")
